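1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China
2. School of Information Science and Technology, Nantong University, Nantong, Jiangsu 226019, China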
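YANG Jun, male, born in October 2000 in Beijing. He is currently a master's student at Beijing Information Science and Technology University. His main research interest is automatic issue title generation. E-mail: yangjun1026@bistu.edu.cn
LIU Shi-fan, male, born in September 1997 in Hanzhong, Shaanxi. He received his master's degree from Beijing Information Science and Technology University and is currently a Ph.D. student at University of Science and Technology Beijing. His main research interests are code comment generation and software testing. E-mail: pawn2017@bistu.edu.cn
CHEN Xiang, male, born in March 1980 in Nantong, Jiangsu. He is currently an associate professor and master's supervisor at Nantong University. His main research interests are software defect prediction, software fault localization, regression testing, and combinatorial testing. E-mail: xchencs@ntu.edu.cn
CUI Zhan-qi, male, born in February 1984 in Jinsha, Guizhou. He is currently a professor and doctoral supervisor at Beijing Information Science and Technology University. His main research interests are software analysis and software testing techniques. E-mail: czq@bistu.edu.cn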
Received: 2024-05-10,
Revised: 2025-02-21,
Published in print: 2025-05-25
YANG Jun, LIU Shi-fan, CHEN Xiang, et al. GITG: Automatic Issue Title Generation Method for Gitee Platform[J]. Acta Electronica Sinica, 2025, 53(5): 1559-1570. DOI:10.12263/DZXB.20240434
In open-source software and on open-source platforms, developers submit issues to report software bugs or request new features. Owing to inexperience or limited expertise, users may fail to summarize an issue accurately and effectively, which yields low-quality issue titles and in turn slows issue resolution. Existing automatic issue title generation methods mainly target English-language open-source platforms such as GitHub, and their performance degrades when they are applied to Chinese open-source platforms such as Gitee. They also take mainly the issue body description as input and ignore other important information in the issue, such as code snippets. To address these problems, this paper proposes GITG (Gitee Issue Title Generation), an automatic issue title generation method for the Gitee platform. For issues that contain both Chinese and English text, GITG fine-tunes Chinese BART (Bidirectional and Auto-Regressive Transformers), a pre-trained model that supports Chinese, on a constructed Gitee issue dataset, and uses the bi-modal information of the issue body description and code snippets to generate issue titles. To validate GITG, a dataset of 18,242 Gitee issue samples was constructed. Experimental results show that, compared with iTAPE and iTiger, GITG improves ROUGE-1, ROUGE-2, and ROUGE-L by at least 13.09%, 10.18%, and 12.84%, respectively, and also achieves gains on BLEU and METEOR. In a human evaluation, the average scores of GITG-generated titles exceed those of iTAPE and iTiger by at least 26.7%, 20.8%, 24.2%, and 20.0% on overall score, fluency, informativeness, and conciseness, respectively.