Research Initiation Fund of Huaqiao University (No. 12BS215); National Natural Science Foundation of China (No. 51305142, No. 61502184); Natural Science Foundation of Fujian Province (No. 2015J01259)
This paper surveys the state of the art in schema inference from XML data. First, formal models based on regular tree grammars are presented for commonly used XML schema languages. Then, existing work on XML schema inference is summarized and compared along several dimensions, including inference methods, target schema languages, supported expressiveness, and the types of regular expressions used for content models. In addition, the inference of some basic integrity constraints from XML data is introduced. Finally, the paper points out the shortcomings of current research and discusses potential directions for future research. The aim is to offer a detailed overview, comparison, and analysis of the mainstream methods and recent progress in this field, in the hope of benefiting subsequent research.