Chain-of-Thought vs. Few-Shot: A Comparative Study of Prompting Strategies for Code Generation
Green Open Access: No
Publicly Funded: No
Abstract
This study empirically investigates the performance and cost effects of Chain-of-Thought (CoT) prompting versus contextual Zero-Shot/Few-Shot prompting on GPT-5 across a variety of software engineering tasks. We used a 2x2 factorial design with 80 model-generated functions, measuring accuracy, prompt verbosity, response verbosity, and total API cost. The main finding is a trade-off between better reasoning and higher resource consumption. The two-way ANOVA results indicate that CoT significantly improved accuracy under both evaluators, which supports the assumption that spelling out intermediate steps decreases black-box errors and improves logical justification. However, CoT also sharply increased response verbosity (means: 305.20 vs. 166.55 tokens) and the corresponding API costs. The Few-Shot context factor, by contrast, mainly drove input complexity: it greatly increased prompt verbosity (means: 219.55 vs. 50.75 tokens) while leaving response verbosity and cost essentially unchanged. One assessment indicated that Few-Shot improved accuracy and the other did not, suggesting that Few-Shot examples are effective only in some cases and can sometimes cause confusion rather than improvement. We conclude that, in this GPT-5-based configuration, CoT is best reserved for scenarios where the need for enhanced reasoning outweighs the associated computational overhead. By quantifying this accuracy versus cost trade-off, our findings guide the selection of cost-efficient prompting strategies. One limitation of this study is that the tasks were both generated and answered by the LLM, which limits external validity relative to prompts crafted by human engineers. © 2026 IEEE.
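To make the analysis described in the abstract concrete, the following is a minimal sketch of a balanced two-way ANOVA for a 2x2 design (CoT on/off crossed with Few-Shot on/off). It is not the authors' code. The data are synthetic placeholders: the cell size of 20 (80 functions split evenly over four cells), the standard deviation of 40 tokens, and the function name `two_way_anova` are all assumptions made for illustration. Only the two response-verbosity means (305.20 and 166.55 tokens) are taken from the abstract, and they are used solely as centers for the random draws.

```python
import random
from statistics import mean

def two_way_anova(cells):
    """Balanced two-way ANOVA.

    cells maps (a_level, b_level) -> list of observations; every cell
    must hold the same number of observations.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    assert all(len(v) == n for v in cells.values()), "design must be balanced"

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for main effects, interaction, and within-cell error.
    ss = {
        "A": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "AxB": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a in a_levels for b in b_levels
        ),
        "error": sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {
        "A": len(a_levels) - 1,
        "B": len(b_levels) - 1,
        "AxB": (len(a_levels) - 1) * (len(b_levels) - 1),
        "error": len(a_levels) * len(b_levels) * (n - 1),
    }
    ms_error = ss["error"] / df["error"]
    f_stat = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AxB")}
    total_ss = sum((x - grand) ** 2 for v in cells.values() for x in v)
    return {"ss": ss, "df": df, "F": f_stat, "total_ss": total_ss}

# Synthetic response-verbosity data (tokens). Factor A = CoT, factor B = Few-Shot.
# ASSUMPTIONS: 20 observations per cell, SD of 40 tokens, no Few-Shot effect.
rng = random.Random(0)
cells = {
    (cot, few): [rng.gauss(305.20 if cot else 166.55, 40.0) for _ in range(20)]
    for cot in (False, True)
    for few in (False, True)
}

result = two_way_anova(cells)
for effect in ("A", "B", "AxB"):
    print(effect, "F =", round(result["F"][effect], 2), "df =", result["df"][effect])
```

In this sketch the CoT factor should yield a much larger F statistic than the Few-Shot factor, because the synthetic data were constructed that way. Converting F statistics to p-values would require an F-distribution routine from a statistics library, which is left out here to keep the example dependency-free.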
Keywords
Prompt Engineering, Chain-of-Thought (CoT), Empirical Study, Large Language Models, Few-Shot Learning
Start Page: 382
End Page: 387