Prompting for Security: A Cross-Model Evaluation of Code Generation in LLMs

dc.contributor.author Saleem, W.
dc.contributor.author Nazlioglu, S.
dc.date.accessioned 2026-03-05T15:08:13Z
dc.date.available 2026-03-05T15:08:13Z
dc.date.issued 2025
dc.description.abstract The security of AI-generated code has become a growing concern as Large Language Models (LLMs) like GPT-4, Gemini, DeepSeek, and LLaMA are increasingly integrated into software development pipelines. While prior research has primarily focused on GPT-family models, the security performance of newer open models under structured prompting remains underexplored. This study evaluates the ability of modern LLMs to generate secure code using six established prompting strategies across 150 Python tasks (LLMSecEval). Generated code was assessed using two static analysis tools (Bandit and CodeQL) to detect Common Weakness Enumeration (CWE) vulnerabilities. Findings showed that Recursive Criticism and Improvement (RCI) prompting significantly improves security outcomes across all models. Notably, LLaMA produced over 15,800 lines of vulnerability-free code under RCI. Gemini and DeepSeek also showed notable improvements under guided prompting. From a tool-specific perspective, Bandit and CodeQL produced divergent results, with CodeQL exposing deeper or more complex vulnerabilities. These results highlight the necessity of prompt-aware security evaluations and multi-tool static analysis to ensure reliable, secure code generation from LLMs. This study offers practical insights into secure code generation for developers and researchers. © 2025 IEEE. en_US
dc.identifier.doi 10.1109/UBMK67458.2025.11207030
dc.identifier.isbn 9798331599768
dc.identifier.issn 2521-1641
dc.identifier.scopus 2-s2.0-105030819067
dc.identifier.uri https://doi.org/10.1109/UBMK67458.2025.11207030
dc.identifier.uri https://hdl.handle.net/20.500.14411/11208
dc.language.iso en en_US
dc.publisher Institute of Electrical and Electronics Engineers Inc. en_US
dc.relation.ispartof International Conference on Computer Science and Engineering, UBMK -- 10th International Conference on Computer Science and Engineering, UBMK 2025 -- 2025-09-17 Through 2025-09-21 -- Istanbul -- 214243 en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Large Language Models en_US
dc.subject Prompt Engineering en_US
dc.subject Secure Code Generation en_US
dc.subject Software Security en_US
dc.subject Static Code Analysis en_US
dc.title Prompting for Security: A Cross-Model Evaluation of Code Generation in LLMs en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.scopusid 60411222500
gdc.author.scopusid 57984702300
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.description.department Atılım University en_US
gdc.description.departmenttemp [Saleem] Wardah, Department of Software Engineering, Atilim University, Ankara, Turkey; [Nazlioglu] Selma, Department of Software Engineering, Atilim University, Ankara, Turkey en_US
gdc.description.endpage 276 en_US
gdc.description.issue 2025 en_US
gdc.description.publicationcategory Konferans Öğesi - Uluslararası - Kurum Öğretim Elemanı en_US
gdc.description.scopusquality N/A
gdc.description.startpage 271 en_US
gdc.description.wosquality N/A
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.4895952E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.7494755E-9
gdc.oaire.publicfunded false
gdc.scopus.citedcount 0
gdc.virtual.author Nazlıoğlu, Selma
relation.isAuthorOfPublication 1deb41cd-45a4-4520-bc22-7addb375a869
relation.isAuthorOfPublication.latestForDiscovery 1deb41cd-45a4-4520-bc22-7addb375a869
relation.isOrgUnitOfPublication 50be38c5-40c4-4d5f-b8e6-463e9514c6dd
relation.isOrgUnitOfPublication 4abda634-67fd-417f-bee6-59c29fc99997
relation.isOrgUnitOfPublication.latestForDiscovery 50be38c5-40c4-4d5f-b8e6-463e9514c6dd