1.东北石油大学计算机与信息技术学院,黑龙江大庆 163318
2.山东理工大学计算机科学与技术学院,山东淄博 255000
[ "高俊涛 男,1979年出生于黑龙江哈尔滨. 现为东北石油大学副教授、硕士生导师. 研究方向包括软件工程、过程建模、自动机学习等.E‑mail:gjt@nepu.edu.cn" ]
[ "王 梅 女,1976年出生于河北安国.现为东北石油大学教授,研究方向包括机器学习、模型选择和核方法.E‑mail:wangmei@nepu.edu.cn" ]
[ "徐光会 女,1995年出生于山东济宁.硕士研究生. 主要研究方向为自动机学习.E‑mail:xuguanghui6688@163.com" ]
[ "刘 聪 男,1990年出生于山东淄博.现为山东理工大学教授,研究方向包括业务过程管理、过程挖掘、数据分析、Petri网理论与应用.E‑mail:liucongchina@sdust.edu.cn" ]
收稿:2020-04-27,
修回:2021-01-06,
纸质出版:2021-12-25
移动端阅览
高俊涛,王梅,徐光会等.正则语言推断综述[J].电子学报,2021,49(12):2479-2489.
GAO Jun-tao,WANG Mei,XU Guang-hui,et al.A Survey on Regular Language Inference[J].ACTA ELECTRONICA SINICA,2021,49(12):2479-2489.
高俊涛,王梅,徐光会等.正则语言推断综述[J].电子学报,2021,49(12):2479-2489. DOI: 10.12263/DZXB.20200404.
GAO Jun-tao,WANG Mei,XU Guang-hui,et al.A Survey on Regular Language Inference[J].ACTA ELECTRONICA SINICA,2021,49(12):2479-2489. DOI: 10.12263/DZXB.20200404.
正则语言推断研究从语言的有限信息出发,通过归纳和推理得出正则语言模型.该技术在信息抽取、软件工程、模式识别等领域应用广泛.本文首先阐明了语言的可学习性概念和推断结果的评价准则.然后从推理策略、数据结构、算法复杂性等方面,对被动、主动和基于神经网络的学习算法进行分类归纳与对比,梳理各流派的技术发展脉络.接着分析推断产生的三种泛化效应.最后指出当前研究中不足,对未来研究方向进行展望.
The regular language inference is devoted to the study of how to derive the definition of the language from the limited information. Regular language inference is popular in information extraction
software engineering
and schema learning
etc. The learnability models of regular languages and evaluation criteria are clarified. The passive
active
and neural network‑based learning algorithms are respectively surveyed and compared from the aspects of reasoning strategies
data structures
algorithm complexity
etc
and the development trend is pointed out. Three kinds of generalization effects are analyzed. This paper points out the defects of the current researches and looks forward to future research directions.
BEX G J , NEVEN F , SCHWENTICK T , et al . Inference of concise regular expressions and DTDs [J]. ACM Trans Database Syst, 2010 , 35 ( 2 ): 1 - 47 .
XIE Y , YU F , ACHAN K , et al . Spamming botnets: signatures and characteristics [J]. SIGCOMM Comput Commun Rev , 2008 , 38 ( 4 ): 171 - 182 .
CHIVILIKHIN D , PATIL S , CHUKHAREV K , et al . Automatic state machine reconstruction from legacy programmable logic controller using data collection and sat solver [J]. IEEE Transactions on Industrial Informatics , 2020 , 16 ( 12 ): 7821 - 7831 .
CHEN Y‑F , HSIEH C , LENG L O R , et al . PAC learning‑ based verification and model synthesis [A]. DILLON L. Proceedings of the 38th International Conference on Software Engineering [C]. New York, United States : ACM , 2016 . 714 - 724 .
GOLD E M . Language identification in the limit [J]. Information and Control , 1967 , 10 ( 5 ): 447 - 474 .
ANGLUIN D . Inductive inference of formal languages from positive data [J]. Information and Control , 1980 , 45 ( 2 ): 117 - 135 .
VALIANT L G , VALIANT L . A theory of the learnable [J]. Communications of ACM , 1984 , 27 ( 11 ): 1134 - 1142 .
ISHIGAMI Y , TANI S I . VC‑dimensions of finite automata and commutative finite automata with k letters and n states [J]. Discrete Applied Mathematics , 1997 , 74 ( 3 ): 229 - 240 .
GOLD E M . Complexity of automation identification and control [J]. Information and Control , 1978 , 37 ( 4 ): 302 - 320 .
ANGLUIN D . On the complexity of minimum inference of regular sets [J]. Information and Control , 1978 , 39 ( 3 ): 337 - 350 .
ANGLUIN D . Queries and concept learning [J]. Machine Learning , 1988 , 2 ( 3 ): 319 - 342 .
FREYDENBERGER D , REIDENBACH D . Inferring descriptive generalizations of formal languages [J]. Journal of Computer and System Sciences , 2012 , 79 ( 5 ): 622 - 638 .
WALKINSHAW N , BOGDANOV K , DAMAS C , et al . A framework for the competitive evaluation of model inference techniques [A]. GROZ R, LI K. Proceedings of the First International Workshop on Model Inference In Testing [C]. New York, United States : ACM , 2010 . 1 - 9 .
GRACHEV P , BEZBORODOV R , SMETANNIKOV I , et al . Exploring the relationship between the structural and the actual similarities of automata [A]. Proceedings of the 3rd International Conference on Machine Learning and Soft Computing [C]. New York, United States : ACM , 2019 . 81 - 86 .
CHAN M GC , AND R. RASTOGI . RE‑tree: an efficient index structure for regular expressions [J]. the VLDB Journal , 2003 , 12 ( 2 ): 102 - 118 .
GRUNWALD P D . The Minimum Description Length Principle [M]. Dordrecht, Netherlands : Springer , 2007 .
RISSANEN J . Modeling by shortest data description [J]. Automatica , 1978 , 14 ( 5 ): 465 - 471 .
GOLD E M . Complexity of automaton identification from given data [J]. Information and Control , 1978 , 37 ( 3 ): 302 - 320 .
TRAKHTENBROT B A , BARZDIN Y M . Finite Automata: Behavior and Synthesis [M]. North‑Holland, Netherlands : Elsevier , 1973 .
ZQUEZ DE PARGA MV , GARC A P , L PEZ D . Minimal consistent DFA revisited [J]. Theoretical Computer Science , 2016 , 647 : 43 - 49 .
ZHANG C . Minimal consistent DFA from sample strings [J]. Acta Informatica , 2020 , 57 ( 3 ): 657 - 670 .
ONCINA J , GARCIA P . Inferring regular languages in polynomial update time [A]. SANFELIU A, BLANCA N P D L, VIDAL E. Pattern Recognition and Image Analysis [C]. Singapore : World Scientific , 1992 . 49 - 61 .
LANG K J . Random DFA’s can be approximately learned from sparse uniform examples [A]. HAUSSLER D. Proceedings of the Fifth Annual Workshop on Computational Learning Theory [C]. New York, United States : ACM , 1992 . 45 - 52 .
LANG K J , PEARLMUTTER B A , PRICE R A . Results of the abbadingo one DFA learning competition and a new evidence‑driven state merging algorithm [A]. HONAVAR V G, SLUTZKI G. Proceedings of the 4th International Colloquium on Grammatical Inference [C]. Berlin, German : Springer , 1998 . 1 - 12 .
LANG K . Faster Algorithms for Finding Minimal Consistent DFAs [R]. Kovilpatti, India : National Engineering College , 1999 .
OLIVEIRA A L , SILVA J P M . Efficient algorithms for the inference of minimum size DFAs [J]. Machine Learning , 2001 , 44 ( 1‑2 ): 93 - 119 .
ZAKIRZYANOV I , MORGADO A , IGNATIEV A , et al . Efficient symmetry breaking for SAT‑based minimum DFA inference [A]. MART N‑VIDE C, OKHOTIN A, SHAPIRA D. Language and Automata Theory and Applications [C]. Cham,Switzerland : Springer , 2019 . 159 - 173 .
HEULE M J H , VERWER S . Exact DFA identification using SAT solvers [A]. SEMPERE J M, GARC A P. Grammatical Inference: Theoretical Results and Applications [C]. Berlin, German : Springer , 2010 . 66 - 79 .
SMETSERS R , FITERĂU‑BROŞTEAN P , VAANDRAGER F . Model learning as a satisfiability modulo theories problem [A]. KLEIN S T, MART N‑VIDE C, SHAPIRA D. Language and Automata Theory and Applications [C]. Cham,Switzerland : Springer , 2018 . 182 - 194 .
DUPONT P , LAMBEAU B , DAMAS C , et al . The QSM algorithm and its application to software behavior model induction [J]. Applied Artificial Intelligence , 2008 , 22 ( 1‑2 ): 77 - 115 .
DENIS F , LEMAY A , TERLUTTE A . Learning regular languages using non deterministic finite automata [A]. Proceedings of the 5th International Colloquium on Grammatical Inference: Algorithms and Applications [C]. Berlin, Germany : Springer , 39 - 50 .
GRINCHTEIN O , LEUCKER M , PITERMAN N . Inferring network invariants automatically [A]. FURBACH U, SHANKAR N. Automated Reasoning [C]. Berlin, Germany : Springer , 2006 . 483 - 497 .
ZAKIRZYANOV I , SHALYTO A , ULYANTSEV V . Finding all minimum‑size dfa consistent with given examples: SAT‑based approach [A]. CERONE A, ROVERI M. Software Engineering and Formal Methods [C]. Cham, Switzerland : Springer , 2018 . 117 - 131 .
GRACHEV P . Reputational genetic model for regular inference [A]. Proceedings of the 3rd International Conference on Advances in Image Processing [C]. New York, United States : ACM , 2019 . 185 - 189 .
GRACHEV P . Grammar inference with multiparameter genetic model [A]. Proceedings of the 3rd International Conference on Advances in Image Processing [C]. New York, United States : ACM , 2019 . 160 - 164 .
LIU J , BAI R , LU Z , et al . Data‑driven regular expressions evolution for medical text classification using genetic programming [A]. 2020 IEEE Congress on Evolutionary Computation [C]. Piscataway, United States : IEEE , 2020 . 1 - 8 .
RADHAKRISHNAN V , NAGARAJA G . Inference of regular grammars via skeletons [J]. IEEE Transactions on Systems,Man, and Cybernetics , 1987 , 17 ( 6 ): 982 - 992 .
GARCIA P , VIDAL E . Inference of k‑testable languges in the strict sense and application to syntactic pattern recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence , 1990 , 12 ( 9 ): 920 - 925 .
GARCIA P , VIDAL E , ONCINA J . Learning locally testable languages in the strict sense [A]. ARIKAWA S, GOTO S, OHSUGA S, et al. Proceedings of ALT’90 [C]. Tokyo, Japan : JSAI , 1990 . 325 - 338 .
AHONEN H . Generating Grammars for Structured Documents Using Grammatical Inference Methods [D]. Helsinki, Finland : Univeristy of Helsinki , 1996 .
TANIDA N , T.YOKOMORI . Polynomial time identification of strictly regular languages in the limit [J]. IEICE Transaction on Information and Systems , 1992 , E75-D( 6 ): 125 - 132 .
EMERALD J D , SUBRAMANIAN K G , THOMAS D G . Learning code regular and code linear languages [A]. MICLET L, HIGUERA C D L. Grammatical Interference: Learning Syntax from Sentences [C]. Berlin, Germany : Springer , 1996 . 211 - 221 .
MAKINEN E . Inferring uniquely terminating regular languages from positive data [J]. Information Processing Letters , 1997 , 62 ( 2 ): 57 - 60 .
FERNAU H . Algorithms for learning regular expressions from positive data [J]. Information and Computation , 2009 , 207 ( 4 ): 521 - 541 .
BEX G J , GELADE W , NEVEN F , et al . Learning deterministic regular expressions for the inference of schemas from XML data [J]. ACM Transactions on The Web , 2010 , 4 ( 4 ): 1 - 32 .
冯晓强 , 郑黎晓 , 陈海明 . 一类受限正则表达式的推断算法 [J]. 计算机科学 , 2014 , 41 ( 4 ): 178 - 183 .
FENG Xiao‑qiang , ZHENG Li‑xiao , CHEN Hai‑ming . Inferring algorithm for a subclass of restricted regular expressions [J]. Computer Science , 2014 , 41 ( 4 ): 178 - 183 . (in Chinese)
BIERMANN A W , FELDMAN J A . On the synthesis of finite‑state machines from samples of their behavior [J]. IEEE Transactions on Computers , 1972 , C- 21 ( 6 ): 592 - 597 .
ZAANEN M M V . Bootstrapping Structure into Language:Alignment‑Based Learning [D]. Mahikeng, South Africa : North‑West University , 2001 .
MIN J‑K , J‑YAHN , C‑WCHUNG . Efficient extraction of schemas for XML documents [J]. Information Processing Letters , 2003 , 85 ( 1 ): 7 - 12 .
GALASSI U , GIORDANA A . Learning regular expressions from noisy sequences [A]. ZUCKER J‑D, SAITTA L. Abstraction , Reformulation and Approximation[C] . Berlin, Germany : Springer , 2005 . 92 - 106 .
GAROFALAKIS M , GIONIS A , RASTOGI R , et al . XTRACT: learning document type descriptors from XML document collections [J]. Data Mining and Knowledge Discovery , 2003 , 7 ( 1 ): 23 - 56 .
GAO J , ZHANG Y . Regular expression learning from positive examples based on integer programming [J]. International Journal of Software Engineering and Knowledge Engineering , 2020 , 30 ( 10 ): 1 - 37 .
ANGLUIN D . Learning regular sets from queries and counterexamples [J]. Information and Control , 1988 , 75 ( 2 ): 87 - 106 .
KEARNS M J , VAZIRANI U . An Introduction to Computational Learning Theory [M]. Cambridge, United States : MIT , 1994 .
ISBERNER M , HOWAR F , STEFFEN B . The TTT algorithm: a redundancy‑free approach to active automata learning [A]. BONAKDARPOUR B, SMOLKA S A. Runtime Verification [C]. Cham, Switzerland : Springer , 2014 . 307 - 322 .
DENIS F , LEMAY A , TERLUTTE A . Residual finite state automata [A]. FERREIRA A, REICHEL H. Proceedings of the 18th Annual Symposium on Theoretical Aspects of Computer Science [C]. Berlin, Germany : Springer , 2001 . 144 - 157 .
BOLLIG B , HABERMEHL P , KERN C , et al . Angluin‑style learning of NFA [A]. KITANO H. Proceedings of the 21st International Joint Conference on Artifical intelligence [C]. San Francisco, United States : Morgan Kaufmann . 1004 - 1009 .
ANGLUIN D , EISENSTAT S , FISMAN D . Learning regular languages via alternating automata [A]. YANG Q, WOOLDRIDGE M. Proceedings of the 24th International Conference on Artificial Intelligence [C]. Palo Alto, United States : AAAI Press , 2015 . 3308 - 3314 .
BERNDT S , LIŚKIEWICZ M , LUTTER M , et al . Learning residual alternating automata [A]. Proceedings of the 31st AAAI Conference on Artificial Intelligence [C]. Palo Alto, United States : AAAI Press , 2017 . 1749 - 1755 .
RIVEST R L , SCHAPIRE R E . Inference of finite futomata using homing sequences [A]. JOHNSON D S. Proceedings of the 21st Annual ACM Symposium on Theory of Computing [C]. New York, United States : ACM , 1989 . 411 - 420 .
ELMAN J L . Finding structure in time [J]. Cognitive Science , 1990 , 14 ( 2 ): 179 - 211 .
WATROUS R L , KUHN G M . Induction of finite‑state automata using second‑order recurrent networks [J]. NIPS , 1991 , 4 ( 3 ): 309 - 316 .
GRACHEV P , LOBANOV I , SMETANNIKOV I , et al . Neural network for synthesizing deterministic finite automata [A]. KLIMOVA A, BILYATDINOVA A, KORTELAINEN J, et al. Proceedings of the 6th International Young Scientist Conference on Computational Science [C]. North‑Holland, Netherlands : Elsevier , 2017 . 73 - 82 .
MICHALENKO J J , SHAH A , VERMA A , et al . Representing formal languages: a comparison between finite automata and recurrent neural networks [EB/OL]. https://openreview.net/pdf?id=H1zeHnA9KX https://openreview.net/pdf?id=H1zeHnA9KX , 2019‑05‑06 / 2020‑04‑27 .
OMLIN C W A GC . L. Extraction of rules from discrete‑ time recurrent neural networks [J]. Neural Networks , 1996 , 9 ( 1 ): 41 - 52 .
CECHIN A L , SIMON D . R. P. , AND STERTZ , K . State automata extraction from recurrent neural nets using k‑means and fuzzy clustering [A]. Proceedings of the XXIII International Conference of the Chilean Computer Science Society [C]. Washington, United States : IEEE Computer Society , 2003 . 73 - 78 .
AYACHE S , EYRAUD R , GOUDIAN N . Explaining black boxes on sequential data using weighted automata [A]. UNOLD O, DYRKA W, WIECZOREK W. Proceedings of Machine Learning Research 93 [C]. Cambridge, United States : MIT , 2018 . 81 - 103 .
WEISS G , GOLDBERG Y , YAHAV E . Extracting automata from recurrent neural networks using queries and counterexamples [A]. JENNIFER D, ANDREAS K. Proceedings of the 35th International Conference on Machine Learning [C]. Cambridge, United States : JMLR , 2018 . 5247 - 5256 .
AVELLANEDA F , PETRENKO A . FSM inference from long traces [A]. HAVELUND K, PELESKA J, ROSCOE B, et al. Formal Methods [C]. Cham, Switzerland : Springer , 2018 . 93 - 109 .
WIEMAN R , ANICHE M , LOBBEZOO W , et al . An experience report on applying passive learning in a large‑scale payment company [A]. Proceedings of IEEE International Conference on Software Maintenance and Evolution [C]. Piscataway, United States : IEEE , 2017 . 564 - 573 .
BRĀZMA A . Efficient identification of regular expressions from representative examples [A]. PITT L. Proceedings of the 6th Annual Conference on Computational Learning Theory [C]. New York, United States : ACM , 1993 . 236 - 242 .
FELDMAN J A , GIPS J , HORNING J J , et al . Grammatical Complexity and Inference [R]. Stanford, United States : Stanford University , 1969 .
ITOGA S Y . A new heuristic for inferring regular grammars [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence , 1981 , PAMI‑ 3 ( 2 ): 191 - 197 .
FU K‑S , BOOTH T L . Grammatical inference: introduction and survey‑part I [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence , 1986 , PAMI‑ 8 ( 3 ): 343 - 359 .
GGEMANN-KLEIN ABR . Regular expressions into finite automata [J]. Theoretical Computer Science , 1993 , 120 ( 2 ): 197 - 213 .
RULOT H , VIDAL E . Modelling (Sub) String Length based Constraints Through a Grammatical Inference Method [A]. DEVIJVER P A, KITTLER J. Pattern Recognition Theory and Applications [C]. Berlin, Heidelberg : Springer , 1987 . 451 - 459 .
0
浏览量
16
下载量
0
CSCD
关联资源
相关文章
相关作者
相关机构
京公网安备11010802024621