Prompting for Security: A Cross-Model Evaluation of Code Generation in LLMs

Date
2025
Publisher
Institute of Electrical and Electronics Engineers Inc.
Green Open Access
No
Publicly Funded
No
Abstract
The security of AI-generated code has become a growing concern as Large Language Models (LLMs) like GPT-4, Gemini, DeepSeek, and LLaMA are increasingly integrated into software development pipelines. While prior research has primarily focused on GPT-family models, the security performance of newer open models under structured prompting remains underexplored. This study evaluates the ability of modern LLMs to generate secure code using six established prompting strategies across 150 Python tasks from the LLMSecEval benchmark. Generated code was assessed using two static analysis tools (Bandit and CodeQL) to detect Common Weakness Enumeration (CWE) vulnerabilities. Findings showed that Recursive Criticism and Improvement (RCI) prompting significantly improves security outcomes across all models. Notably, LLaMA produced over 15,800 lines of vulnerability-free code under RCI. Gemini and DeepSeek also showed notable improvements under guided prompting. From a tool-specific perspective, Bandit and CodeQL produced divergent results, with CodeQL exposing deeper or more complex vulnerabilities. These results highlight the necessity of prompt-aware security evaluations and multi-tool static analysis to ensure reliable, secure code generation from LLMs. This study offers practical insights into secure code generation for developers and researchers. © 2025 IEEE.
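The following is a minimal sketch (not the authors' harness) of the kind of pipeline the abstract describes: code is generated under Recursive Criticism and Improvement (RCI) prompting and then screened with Bandit, one of the two static analyzers used in the study. The call_llm stub, the prompt wording, the number of RCI rounds, and the file handling are illustrative assumptions; CodeQL, which requires building a database per project, is omitted here.

import json
import subprocess
import tempfile


def call_llm(prompt: str) -> str:
    """Hypothetical model call; wire this to the chosen LLM's client (GPT-4, Gemini, DeepSeek, LLaMA)."""
    raise NotImplementedError


def bandit_issues(code: str) -> list[dict]:
    """Scan a snippet of generated Python code with Bandit and return its findings."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(code)
        path = tmp.name
    # -f json writes a machine-readable report to stdout; -q suppresses progress output
    proc = subprocess.run(["bandit", "-f", "json", "-q", path],
                          capture_output=True, text=True)
    report = json.loads(proc.stdout or "{}")
    return report.get("results", [])


def rci_generate(task: str, rounds: int = 2) -> str:
    """Generate code for a task, then apply criticise-and-improve (RCI) rounds."""
    code = call_llm(f"Write secure Python code for the following task:\n{task}")
    for _ in range(rounds):
        critique = call_llm(
            "List any security weaknesses (e.g. CWE issues) in this code:\n" + code
        )
        code = call_llm(
            "Rewrite the code so it addresses this critique:\n"
            f"{critique}\n\nCode:\n{code}"
        )
    return code


# Example use (requires call_llm to be wired to a real model client):
#   code = rci_generate("Read a filename from the user and print its contents.")
#   for issue in bandit_issues(code):
#       print(issue["test_id"], issue["issue_severity"], issue["issue_text"])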
Keywords
Large Language Models, Prompt Engineering, Secure Code Generation, Software Security, Static Code Analysis
WoS Q
N/A
Scopus Q
N/A
Source
10th International Conference on Computer Science and Engineering (UBMK 2025), Istanbul, 2025-09-17 through 2025-09-21
Issue
2025
Start Page
271
End Page
276

