Programming Language Models for Software Engineering Tasks (PLMSE)

NAISS 2023/22-816


NAISS Small Compute

Principal Investigator:

Sushant Kumar Pandey


Göteborgs universitet

Start Date:


End Date:


Primary Classification:

10205: Software Engineering




As software systems grow increasingly complex, recognizing design patterns (DPs) is vital for maintaining code quality and structure. Existing DP recognition techniques are often hard to apply for several reasons. They require extensive setup, code compilation and large numbers of training examples, and they focus on a limited set of standard patterns, predominantly in Java. We aim to overcome these limitations and enable accurate recognition of diverse design and architecture patterns in complex industrial codebases. To achieve this, we propose integrating pre-trained Programming Language Models (PLMs) into the DP recognition process.

We expect PLMs to offer considerable benefits. A pre-trained PLM reduces the need for extensive training examples and avoids code compilation. It can also cover a wider range of patterns and languages than existing DP recognition techniques. As a first step, we design and evaluate an initial approach that uses Facebook's PLM to identify standard object-oriented DPs in Java codebases. We compare our results with those of state-of-the-art techniques, and our approach achieves an overall F-score comparable to that of established methods. To our knowledge, this study is one of the first to assess a PLM's understanding of higher-level design and architecture, and the results are promising. These findings pave the way for future studies that evaluate PLMs on recognizing a broader range of DPs in industrial software development.

Beyond design patterns, the integrity of automotive software depends on strict adherence to the MISRA (Motor Industry Software Reliability Association) guidelines, which help ensure code quality, safety and dependability. We also propose an approach for identifying violations of these guidelines in automotive codebases. The approach uses a pre-trained RoBERTa model to detect deviations from MISRA guidelines. We collected a diverse body of code from GitHub repositories, focusing on AUTOSAR codebases, and used it for further pre-training to adapt the model to automotive code. In addition, we curated a dataset of C/C++ code strictly aligned with the MISRA standards. Our experiments indicate that the approach can accurately identify MISRA guideline violations in automotive software.

In summary, this application requests GPU resources for DP recognition and MISRA guideline violation prediction in industrial software development. GPUs make it feasible to pre-train and fine-tune the PLMs that allow us to move beyond the limitations of conventional DP recognition methods. As digitalization advances, we expect this work to improve the efficiency and accuracy of DP recognition in industrial contexts.