Leveraging GraphCodeBERT for Enhanced Bug Prediction and Code Quality Analysis in Software Development Using Machine Learning
DOI:
https://doi.org/10.70454/IJMRE.2022.20601
Keywords:
GraphCodeBERT, Bug Prediction, Code Quality Analysis, Machine Learning, Source Code Analysis
Abstract
Bug prediction and code quality analysis are two crucial elements of software design, with a direct impact on the maintainability and reliability of software. Traditional methods fall short because they rely on manual inspection and limited feature extraction mechanisms. This paper presents a machine learning framework that employs GraphCodeBERT, a transformer pre-trained on programming languages, to improve the precision of bug detection and the semantic analysis of source code. By combining code embeddings with graph structures such as Abstract Syntax Trees and Control Flow Graphs, the model captures both the syntactic and the semantic structure of code. The method involves pre-processing phases (text cleaning, tokenization, and feature vector generation) followed by classification with a softmax-based prediction model. Experimental comparisons with Logistic Regression, SVM, CNN, and baseline models indicate higher performance in accuracy (97.6%), precision (95.3%), recall (96.8%), and F1 score (96%). The results support GraphCodeBERT's effectiveness in offering robust and scalable solutions for bug prediction and code quality enhancement.
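The final two stages named in the abstract, softmax-based classification and evaluation by accuracy, precision, recall, and F1 score, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors, weights, and the function names `softmax`, `predict`, `binary_metrics`, and `f1_from` are hypothetical and chosen only for illustration. In a real pipeline the feature vector would presumably come from a GraphCodeBERT encoder rather than being written by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features: Sequence[float],
            weights: Sequence[Sequence[float]],
            biases: Sequence[float]) -> Tuple[int, List[float]]:
    """Linear layer followed by softmax; returns (predicted class, probabilities).

    `features` stands in for a code embedding; `weights` has one row per class.
    """
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 with label 1 meaning 'buggy'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


if __name__ == "__main__":
    # Toy two-class head over a two-dimensional placeholder "embedding".
    label, probs = predict([1.0, 2.0], [[0.5, -0.5], [-0.5, 0.5]], [0.0, 0.0])
    print(label, probs)
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

As a consistency check, applying the same harmonic-mean formula to the reported precision (95.3%) and recall (96.8%) gives an F1 score of roughly 96.0%, which agrees with the figure stated in the abstract.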
License
Copyright (c) 2022 Chaitanya Vasamsetty, Bhavya Kadiyala, Karthick. M (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits all use, distribution, and reproduction in any medium, provided the work is properly cited.