A Novel BOM Similarity Metric Method Based on Ensemble Model
WU Wen-li1, FAN Xiao-peng2, ZHOU Geng-shen1, HUANG Yi1, CAO Yang1, LIN Gui-chan1
1. China Greatwall Technology Group Co., Ltd, Shenzhen, Guangdong 518000, China;
2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518000, China
Abstract: To meet the need for effective product-family partitioning under high-mix, low-volume production and mass customization, the features contained in a BOM (Bill of Materials) are comprehensively analyzed. Existing structure-based similarity methods are summarized, a content-based similarity metric model is proposed, and an ensemble model combining the two is built on this basis. In the structure-based model, each BOM is represented by an adjacency matrix that encodes the BOM hierarchy and the material quantities, and Orthogonal Procrustes Analysis is used to measure the similarity between BOMs. In the content-based model, effective features are extracted from the BOM text, converted to vectors by TF-IDF (Term Frequency-Inverse Document Frequency), and compared using cosine similarity. For the ensemble model, a weight-assignment method based on the Gini coefficient is proposed to combine the structure-based and content-based models. Finally, a test framework is provided, and experiments evaluate the ensemble model against existing methods in terms of model performance and training time.
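The content-based step described in the abstract (text features, TF-IDF vectors, cosine similarity) can be illustrated with a minimal sketch. The abstract does not specify the feature-extraction or weighting details, so the tokenized BOM descriptions, the smoothed IDF formula, and the function names `tfidf_vectors` and `cosine_similarity` below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption): terms found in every document keep a nonzero weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: (count / len(doc)) * idf[term] for term, count in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical BOM text features, already tokenized (illustrative data only).
boms = [
    ["steel", "frame", "bolt", "m6", "wheel"],
    ["steel", "frame", "bolt", "m8", "wheel"],
    ["plastic", "housing", "screw", "label"],
]

vecs = tfidf_vectors(boms)
sim_close = cosine_similarity(vecs[0], vecs[1])  # BOMs sharing most terms
sim_far = cosine_similarity(vecs[0], vecs[2])    # BOMs sharing no terms
```

In this toy data the first two BOMs differ in a single term and score higher than the unrelated pair, which shares no terms and therefore scores zero. A structure-based score and a Gini-based weighting would be needed to form the ensemble the abstract describes; neither is attempted here.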