Acta Electronica Sinica, 2022, Vol. 50, Issue (9): 2079-2089. DOI: 10.12263/DZXB.20210454
LI Xue-ying, WANG Tian-lu, LIANG Peng, WANG Chong
Received: 2021-04-09
Revised: 2021-12-20
Online: 2022-09-25
Published: 2022-10-26
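Abstract:
User reviews of mobile apps are an important source of user requirements. Requirements extracted from user reviews help developers maintain existing systems, and they also help developers identify new user requirements quickly and accurately. This paper focuses on non-functional requirements in mobile app user reviews. Based on a system model, it uses machine learning and deep learning algorithms to classify these requirements automatically as either behavioral requirements or representational requirements. For classification with machine learning, two feature extraction techniques are combined with five machine learning algorithms. For classification with deep learning, two word-embedding-based algorithms (TextCNN and RCNN) and one character-embedding-based algorithm (CharCNN) are used. The machine learning and deep learning models are compared along two dimensions: classification performance and time consumption. The results show that the machine learning models perform better than the deep learning models. The combination of Support Vector Machine (SVM) and Term Frequency-Inverse Document Frequency (TF-IDF) achieves the best classification performance, with a precision of 0.941, a recall of 0.990, and an F1-score of 0.965.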
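The best-performing combination above feeds TF-IDF features to an SVM. A from-scratch Python sketch of that kind of pipeline follows. It is an illustration under stated assumptions and not the authors' implementation. The whitespace tokenizer, the tf × ln(N/df) weighting with L2 normalisation, the Pegasos-style linear SVM without a bias term, the values of `lam` and `epochs`, and the four toy reviews are all hypothetical. Labels use +1 for behavioral and -1 for representational requirements. Calling `bow` on its own gives the raw-count bag-of-words (BoW) features that Table 5 compares against TF-IDF.

```python
import math


def tokenize(text):
    # Hypothetical pre-processing: lower-case and split on whitespace only.
    return text.lower().split()


def fit_vocabulary(docs):
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def bow(doc, index):
    """Bag-of-words: raw term counts; out-of-vocabulary tokens are ignored."""
    vec = [0.0] * len(index)
    for tok in tokenize(doc):
        if tok in index:
            vec[index[tok]] += 1.0
    return vec


def fit_idf(docs, index):
    """idf(t) = ln(N / df(t)), one common TF-IDF variant (an assumption here)."""
    df = [0] * len(index)
    for doc in docs:
        for tok in set(tokenize(doc)):
            df[index[tok]] += 1
    return [math.log(len(docs) / d) for d in df]


def tfidf(doc, index, idf):
    """TF-IDF: term counts weighted by idf, then L2-normalised."""
    weighted = [c * w for c, w in zip(bow(doc, index), idf)]
    norm = math.sqrt(sum(w * w for w in weighted)) or 1.0
    return [w / norm for w in weighted]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(xs, ys, lam=0.1, epochs=20):
    """Pegasos-style sub-gradient descent on the regularised hinge loss (no bias)."""
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # margin checked with the current w
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


# Hypothetical toy reviews: +1 = behavioral, -1 = representational.
TRAIN = [
    ("app crashes when sending pictures", 1),
    ("app freezes after every update", 1),
    ("please restore the wooden bookshelf ui", -1),
    ("the new ui layout looks cluttered", -1),
]
docs = [text for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
index = fit_vocabulary(docs)
idf = fit_idf(docs, index)
w = train_linear_svm([tfidf(d, index, idf) for d in docs], labels)

for review in ["app crashes again", "ui looks outdated"]:
    print(review, "->", predict(w, tfidf(review, index, idf)))
```

The two classes in the toy data share no vocabulary, so the sketch separates them easily. Real review data is far noisier. A production pipeline would more likely rely on an established library than on hand-written sub-gradient descent.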
LI Xue-ying, WANG Tian-lu, LIANG Peng, WANG Chong. Automatic Classification of Non-Functional Requirements in App User Reviews Based on System Model[J]. Acta Electronica Sinica, 2022, 50(9): 2079-2089.
Requirement type | Description | Example |
---|---|---|
Behavioral requirement | Requirements that describe behavioral properties of the system, including the behavior of its interface, architecture, and state. | App is not responding while sending the pictures. |
Representational requirement | Requirements that describe the representation, description, construction, implementation, and execution of the system (i.e., the way that a system is syntactically and technically represented). | If there is another version released called ibooks classic with the wooden bookshelf UI, we will be thankful. |
Table 1 Examples of behavioral and representational requirements in user reviews
Hyper-Parameter | Value |
---|---|
Dimension of word vector | 200 |
Length of word sequence | 200 |
Number of convolution kernel | 128 |
Size of convolution kernel | [ |
Learning rate | 0.001 |
Batch size | 64 |
Epoch | 10 |
Table 2 Main hyperparameters of the TextCNN model
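TextCNN's core operation slides a bank of convolution filters over the word-embedding matrix. Max-over-time pooling follows, so each filter contributes exactly one feature regardless of review length. The pure-Python sketch below illustrates one filter and the resulting feature size. The kernel-size entry in Table 2 is truncated. The sizes (3, 4, 5) used here follow Kim's (2014) original TextCNN setup and are only a hypothetical stand-in. The ReLU activation, the zero bias, and the toy matrices are also assumptions.

```python
def conv_max_over_time(embeddings, kernel, bias=0.0):
    """Slide a k x d kernel over an L x d embedding matrix (stride 1, no padding),
    apply ReLU to each of the L - k + 1 responses, and keep the maximum."""
    k = len(kernel)
    responses = []
    for start in range(len(embeddings) - k + 1):
        s = bias
        for offset in range(k):
            s += sum(e * w for e, w in zip(embeddings[start + offset], kernel[offset]))
        responses.append(max(0.0, s))
    return max(responses)


def textcnn_features(embeddings, kernels):
    """Concatenate the pooled output of every filter into one feature vector."""
    return [conv_max_over_time(embeddings, kernel) for kernel in kernels]


def pooled_feature_size(num_filters, kernel_sizes):
    # Each filter yields exactly one value after max-over-time pooling.
    return num_filters * len(kernel_sizes)


# Toy example: a 4-word "review" with 2-dimensional embeddings.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],  # width-2 filter
    [[-1.0, -1.0]],            # width-1 filter that never fires after ReLU
]
print(textcnn_features(emb, kernels))       # [2.0, 0.0]
print(pooled_feature_size(128, [3, 4, 5]))  # 384 with the hypothetical kernel sizes
```

With a word sequence length of 200, a kernel of width k has 200 - k + 1 positions. Pooling collapses them to a single value, so the classifier input size depends only on the number of filters and kernel sizes.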
Hyper-parameter | Value |
---|---|
Dimension of word vector | 200 |
Length of word sequence | 200 |
Size of hidden layer | 128 |
Learning rate | 0.001 |
Batch size | 64 |
Epoch | 10 |
Table 3 Main hyperparameters of the RCNN model
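RCNN, in the style of Lai et al. (2015), represents each word by concatenating three parts: a recurrently computed left context, the word's own embedding, and a recurrently computed right context. A tanh projection and element-wise max pooling over the sequence follow. The toy pure-Python sketch below illustrates that structure and is not the authors' code. The zero initial contexts, the tanh recurrent activation, and all weights are hypothetical. Whether the "Size of hidden layer" of 128 in Table 3 refers to the context vectors or to the projection layer is left as an open assumption. If it were the context size, each word representation would have 128 + 200 + 128 = 456 dimensions.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def tanh_sum(terms):
    # Element-wise tanh of the sum of several equally sized vectors.
    return [math.tanh(sum(parts)) for parts in zip(*terms)]


def rcnn_encode(embs, W_l, W_sl, W_r, W_sr, W2, b2):
    """Left/right recurrent contexts, [c_l; e; c_r] per word, tanh projection, max pooling."""
    n, c_dim = len(embs), len(W_l)
    left = [[0.0] * c_dim]  # hypothetical: zero context before the first word
    for i in range(1, n):
        left.append(tanh_sum([matvec(W_l, left[i - 1]), matvec(W_sl, embs[i - 1])]))
    right = [[0.0] * c_dim for _ in range(n)]  # zero context after the last word
    for i in range(n - 2, -1, -1):
        right[i] = tanh_sum([matvec(W_r, right[i + 1]), matvec(W_sr, embs[i + 1])])
    hidden = [tanh_sum([matvec(W2, left[i] + embs[i] + right[i]), b2]) for i in range(n)]
    return [max(column) for column in zip(*hidden)]  # max pooling over the sequence


embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 words, embedding size 2
I2 = [[0.5, 0.0], [0.0, 0.5]]                # toy context-to-context weights
E2 = [[1.0, 0.0], [0.0, 1.0]]                # toy embedding-to-context weights
W2 = [[0.1] * 6 for _ in range(3)]           # 3 hidden units over [c_l; e; c_r] = 2 + 2 + 2
b2 = [0.0, 0.0, 0.0]
print(rcnn_encode(embs, I2, E2, I2, E2, W2, b2))
```

A softmax output layer, omitted here, would map the pooled vector to the two requirement classes.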
Hyper-Parameter | Value |
---|---|
Dimension of character vector | 70 |
Length of character sequence | 1014 |
Convolution and pooling layers ([feature maps, kernel size, pool size]) | ([256,7,3], [256,7,3], [256,3,None], [256,3,None], [256,3,None], [256,3,3]) |
Size of fully-connected layers | [1024,1024,2] |
Learning rate | 0.001 |
Batch size | 64 |
Epoch | 10 |
Table 4 Main hyperparameters of the CharCNN model
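Each triple in the convolution-and-pooling row can be read as (number of feature maps, kernel width, max-pooling size), as in Zhang et al.'s character-level CNN. Under that reading, the length of the feature sequence that reaches the fully-connected layers can be worked out as follows. The stride-1 "valid" convolutions and non-overlapping pooling are assumptions carried over from that design. Table 4 does not state them.

```python
# (feature maps, kernel width, pool size) per layer, copied from Table 4.
CHARCNN_CONV = [(256, 7, 3), (256, 7, 3), (256, 3, None),
                (256, 3, None), (256, 3, None), (256, 3, 3)]


def output_length(seq_len, layers):
    """Frame length after each stride-1 valid convolution and optional pooling."""
    length = seq_len
    for _, kernel, pool in layers:
        length = length - kernel + 1  # valid convolution, stride 1
        if pool is not None:
            length //= pool           # non-overlapping max pooling (stride = pool size)
    return length


final_len = output_length(1014, CHARCNN_CONV)
flat_dim = final_len * CHARCNN_CONV[-1][0]  # frames x feature maps
print(final_len, flat_dim)  # 34 8704
```

Under these assumptions the flattened 8704-dimensional vector feeds the fully-connected stack 1024 → 1024 → 2, whose last layer covers the two requirement types.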
Algorithm | TF-IDF Precision | TF-IDF Recall | TF-IDF F1-score | BoW Precision | BoW Recall | BoW F1-score |
---|---|---|---|---|---|---|
NB | 0.914 | 0.980 | 0.946 | 0.989 | 0.908 | 0.947 |
LR | 0.931 | 0.969 | 0.950 | 0.941 | 0.979 | 0.960 |
DT | 0.934 | 0.867 | 0.899 | 0.939 | 0.949 | 0.944 |
RF | 0.922 | 0.969 | 0.945 | 0.922 | 0.969 | 0.945 |
SVM | 0.941 | 0.990 | 0.965 | 0.941 | 0.980 | 0.960 |
Table 5 Results of classifying non-functional requirements in user reviews with machine learning models
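The F1-scores in Table 5 are the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes them from the published three-decimal precision and recall values and then selects the best combination. The 0.001 tolerance only absorbs rounding in the published figures.

```python
# (precision, recall, reported F1) per feature set, copied from Table 5.
TABLE5 = {
    "NB":  {"TF-IDF": (0.914, 0.980, 0.946), "BoW": (0.989, 0.908, 0.947)},
    "LR":  {"TF-IDF": (0.931, 0.969, 0.950), "BoW": (0.941, 0.979, 0.960)},
    "DT":  {"TF-IDF": (0.934, 0.867, 0.899), "BoW": (0.939, 0.949, 0.944)},
    "RF":  {"TF-IDF": (0.922, 0.969, 0.945), "BoW": (0.922, 0.969, 0.945)},
    "SVM": {"TF-IDF": (0.941, 0.990, 0.965), "BoW": (0.941, 0.980, 0.960)},
}


def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)


for algo, by_feature in TABLE5.items():
    for feature, (p, r, reported) in by_feature.items():
        assert abs(f1(p, r) - reported) <= 1e-3, (algo, feature)

best = max(((a, f) for a in TABLE5 for f in TABLE5[a]),
           key=lambda af: TABLE5[af[0]][af[1]][2])
print(best)  # ('SVM', 'TF-IDF')
```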
Model | Precision | Recall | F1-score |
---|---|---|---|
TextCNN + Word2Vec | 0.950 | 0.969 | 0.959 |
TextCNN + FastText | 0.989 | 0.898 | 0.941 |
RCNN + Word2Vec | 0.931 | 0.959 | 0.945 |
RCNN + FastText | 0.766 | 1.000 | 0.867 |
CharCNN | 0.876 | 0.867 | 0.871 |
Table 6 Results of classifying non-functional requirements in user reviews with deep learning models
Algorithm | TF-IDF Bt | TF-IDF Pt | BoW Bt | BoW Pt |
---|---|---|---|---|
NB | 548 | 4 | 491 | 4 |
LR | 853 | 4 | 848 | 6 |
DT | 585 | 3 | 514 | 3 |
RF | 880 | 17 | 994 | 18 |
SVM | 604 | 9 | 540 | 7 |
Table 7 Model-building time (Bt) and prediction time (Pt) of the machine learning algorithms (ms)
Model | Word2Vec Bt | Word2Vec Pt | FastText Bt | FastText Pt |
---|---|---|---|---|
TextCNN | 94575 | 27817 | 135659 | 27694 |
RCNN | 224776 | 82282 | 277075 | 72534 |
Table 8 Model-building time (Bt) and prediction time (Pt) of the deep learning algorithms (TextCNN, RCNN) (ms)
Model | Character embedding Bt | Character embedding Pt |
---|---|---|
CharCNN | 514391 | 85078 |
Table 9 Model-building time (Bt) and prediction time (Pt) of the deep learning algorithm (CharCNN) (ms)
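Tables 7 to 9 underpin the time-consumption comparison. The arithmetic below uses only the published millisecond values. It expresses how many times longer each deep learning configuration takes than SVM with TF-IDF, the best classifier in Table 5. The choice of that baseline is made here for illustration only.

```python
# Model-building (Bt) and prediction (Pt) times in ms, copied from Tables 7-9.
TIMES_MS = {
    "SVM + TF-IDF":       (604, 9),
    "TextCNN + Word2Vec": (94575, 27817),
    "TextCNN + FastText": (135659, 27694),
    "RCNN + Word2Vec":    (224776, 82282),
    "RCNN + FastText":    (277075, 72534),
    "CharCNN":            (514391, 85078),
}


def slowdown(model, baseline="SVM + TF-IDF"):
    """Ratio of a model's (Bt, Pt) to the baseline's (Bt, Pt)."""
    bt, pt = TIMES_MS[model]
    base_bt, base_pt = TIMES_MS[baseline]
    return bt / base_bt, pt / base_pt


for model in TIMES_MS:
    bt_ratio, pt_ratio = slowdown(model)
    print(f"{model:20s} build x{bt_ratio:7.1f}  predict x{pt_ratio:7.1f}  "
          f"(build {TIMES_MS[model][0] / 1000:.1f} s)")
```

By these figures, the deep learning configurations take roughly 157 to 852 times longer to build than SVM with TF-IDF. They take roughly 3,077 to 9,453 times longer to predict.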