Chain-of-Thought vs. Few-Shot: A Comparative Study of Prompting Strategies for Code Generation

Green Open Access

No

Publicly Funded

No

Abstract

This study empirically investigates the performance and cost effects of Chain-of-Thought (CoT) prompting versus contextual Zero-Shot/Few-Shot prompting on GPT-5 across a variety of software engineering tasks. We used a 2x2 factorial design, crossing CoT (present or absent) with context (Zero-Shot or Few-Shot), on 80 model-generated functions, and measured accuracy, prompt verbosity, response verbosity, and total API cost. The main finding is a trade-off between better reasoning and higher resource consumption. Two-way ANOVA results indicate that CoT significantly improved accuracy under both evaluators. This supports the assumption that spelling out intermediate steps decreases black-box errors and improves logical justification. However, CoT also sharply increased response verbosity (means: 305.20 vs. 166.55 tokens) and the corresponding API costs. The Few-Shot context factor, by contrast, mainly drove input complexity: it greatly increased prompt verbosity (means: 219.55 vs. 50.75 tokens) while leaving response verbosity and cost essentially unchanged. One evaluator's assessment indicated that Few-Shot improved accuracy and the other's did not, suggesting that Few-Shot examples are effective only in some cases and can sometimes cause confusion rather than improvement. We conclude that, in this GPT-5-based configuration, CoT is best reserved for scenarios where the need for enhanced reasoning outweighs the associated computational overhead. By quantifying this accuracy-versus-cost trade-off, our findings guide the selection of cost-efficient prompting strategies. One limitation is that the tasks were both generated and answered by the LLM, which limits the external validity of the findings relative to prompts crafted by human engineers. © 2026 IEEE.
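
As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch of a balanced two-way ANOVA for a 2x2 design, written in plain Python. It is not the authors' analysis code. The factor level names (`no_cot`, `cot`, `zero_shot`, `few_shot`), the function name `two_way_anova`, and the toy scores are all hypothetical, and the numbers are synthetic rather than taken from the study. The sketch assumes an equal number of observations per cell and returns F statistics only; turning them into p-values would need an F-distribution routine from a statistics library.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """F statistics for a balanced two-factor design.

    `cells` maps (level_of_factor_a, level_of_factor_b) to a list of
    observations. Every cell must hold the same number of observations.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(obs) != n for obs in cells.values()):
        raise ValueError("this sketch only handles balanced designs")

    grand = mean(x for obs in cells.values() for x in obs)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {key: mean(obs) for key, obs in cells.items()}

    # Sums of squares for the two main effects, the interaction, and the error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a, b in product(a_levels, b_levels)
    )
    ss_err = sum((x - cell_mean[key]) ** 2 for key, obs in cells.items() for x in obs)

    df_a = len(a_levels) - 1
    df_b = len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err

    return {
        "F_a": (ss_a / df_a) / ms_err,
        "F_b": (ss_b / df_b) / ms_err,
        "F_interaction": (ss_ab / df_ab) / ms_err,
        "df_error": df_err,
    }


# Synthetic toy scores (NOT data from the study): two observations per cell.
toy = {
    ("no_cot", "zero_shot"): [1, 3],
    ("no_cot", "few_shot"): [2, 4],
    ("cot", "zero_shot"): [5, 7],
    ("cot", "few_shot"): [6, 8],
}
result = two_way_anova(toy)
print(result)
```

In this toy data the first factor (CoT) shifts every cell mean by the same large amount, the second factor (context) shifts it only slightly, and there is no interaction, so the first F statistic dominates. The same decomposition could be applied separately to each outcome the abstract lists (accuracy, prompt verbosity, response verbosity, cost).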

Keywords

Prompt Engineering, Chain-of-Thought (CoT), Empirical Study, Large Language Models, Few-Shot Learning

Start Page

382

End Page

387
