Prompting for Security: A Cross-Model Evaluation of Code Generation in LLMs

dc.contributor.author Saleem, W.
dc.contributor.author Nazlioglu, S.
dc.date.accessioned 2026-03-05T15:08:13Z
dc.date.available 2026-03-05T15:08:13Z
dc.date.issued 2025
dc.description.abstract The security of AI-generated code has become a growing concern as Large Language Models (LLMs) like GPT-4, Gemini, DeepSeek, and LLaMA are increasingly integrated into software development pipelines. While prior research has primarily focused on GPT-family models, the security performance of newer open models under structured prompting remains underexplored. This study evaluates the ability of modern LLMs to generate secure code using six established prompting strategies across 150 Python tasks (LLMSecEval). Generated code was assessed using two static analysis tools (Bandit and CodeQL) to detect Common Weakness Enumeration (CWE) vulnerabilities. Findings showed that Recursive Criticism and Improvement (RCI) prompting significantly improves security outcomes across all models. Notably, LLaMA produced over 15,800 lines of vulnerability-free code under RCI. Gemini and DeepSeek also showed notable improvements under guided prompting. From a tool-specific perspective, Bandit and CodeQL produced divergent results, with CodeQL exposing deeper or more complex vulnerabilities. These results highlight the necessity of prompt-aware security evaluations and multi-tool static analysis to ensure reliable, secure code generation from LLMs. This study offers practical insights into secure code generation for developers and researchers. © 2025 IEEE. en_US
dc.identifier.doi 10.1109/UBMK67458.2025.11207030
dc.identifier.isbn 9798331599768
dc.identifier.issn 2521-1641
dc.identifier.scopus 2-s2.0-105030819067
dc.identifier.uri https://doi.org/10.1109/UBMK67458.2025.11207030
dc.identifier.uri https://hdl.handle.net/20.500.14411/11208
dc.language.iso en en_US
dc.publisher Institute of Electrical and Electronics Engineers Inc. en_US
dc.relation.ispartof International Conference on Computer Science and Engineering, UBMK -- 10th International Conference on Computer Science and Engineering, UBMK 2025 -- 2025-09-17 Through 2025-09-21 -- Istanbul -- 214243 en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Large Language Models en_US
dc.subject Prompt Engineering en_US
dc.subject Secure Code Generation en_US
dc.subject Software Security en_US
dc.subject Static Code Analysis en_US
dc.title Prompting for Security: A Cross-Model Evaluation of Code Generation in LLMs en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.scopusid 60411222500
gdc.author.scopusid 57984702300
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.description.department Atılım University en_US
gdc.description.departmenttemp [Saleem] Wardah, Department of Software Engineering, Atilim University, Ankara, Turkey; [Nazlioglu] Selma, Department of Software Engineering, Atilim University, Ankara, Turkey en_US
gdc.description.endpage 276 en_US
gdc.description.issue 2025 en_US
gdc.description.publicationcategory Konferans Öğesi - Uluslararası - Kurum Öğretim Elemanı en_US
gdc.description.scopusquality N/A
gdc.description.startpage 271 en_US
gdc.description.wosquality N/A
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.4895952E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.7494755E-9
gdc.oaire.publicfunded false
gdc.scopus.citedcount 0
gdc.virtual.author Nazlıoğlu, Selma
relation.isAuthorOfPublication 1deb41cd-45a4-4520-bc22-7addb375a869
relation.isAuthorOfPublication.latestForDiscovery 1deb41cd-45a4-4520-bc22-7addb375a869
relation.isOrgUnitOfPublication 50be38c5-40c4-4d5f-b8e6-463e9514c6dd
relation.isOrgUnitOfPublication 4abda634-67fd-417f-bee6-59c29fc99997
relation.isOrgUnitOfPublication.latestForDiscovery 50be38c5-40c4-4d5f-b8e6-463e9514c6dd