Search Results

  • Article
    Citation - Scopus: 1
    Opportunities and Challenges of AI in Educational Assessment
    (Assoc Measurement & Evaluation Education & Psychology, 2024) Şahin, Alper; Thompson, Nathan; Ercikan, Kadriye
    In the past few years, as artificial intelligence (AI) and large language models (LLMs) have rapidly entered our lives, we have witnessed groundbreaking innovations across numerous fields. The rapid pace of these changes has been met with excitement by some and apprehension by others. However, we all agree that they have made tremendous contributions so far and that their future contributions will reshape our existence. The field of educational assessment is no exception. With this in mind, we issued a call for a special issue themed “Opportunities and Challenges of AI in Educational Assessment,” which ultimately included seven distinguished articles on the subthemes of fair and responsible use of AI in educational assessment, learning analytics, automated scoring, and real-life examples of AI and LLMs.
  • Conference Object
    Prompting for Security: A Cross-Model Evaluation of Code Generation in LLMs
    (Institute of Electrical and Electronics Engineers Inc., 2025) Saleem, W.; Nazlioglu, S.
    The security of AI-generated code has become a growing concern as Large Language Models (LLMs) like GPT-4, Gemini, DeepSeek, and LLaMA are increasingly integrated into software development pipelines. While prior research has primarily focused on GPT-family models, the security performance of newer open models under structured prompting remains underexplored. This study evaluates the ability of modern LLMs to generate secure code using six established prompting strategies across 150 Python tasks (LLMSecEval). Generated code was assessed using two static analysis tools (Bandit and CodeQL) to detect Common Weakness Enumeration (CWE) vulnerabilities. Findings show that Recursive Criticism and Improvement (RCI) prompting significantly improves security outcomes across all models. Notably, LLaMA produced over 15,800 lines of vulnerability-free code under RCI. Gemini and DeepSeek also showed notable improvements under guided prompting. From a tool-specific perspective, Bandit and CodeQL produced divergent results, with CodeQL exposing deeper or more complex vulnerabilities. These results highlight the necessity of prompt-aware security evaluations and multi-tool static analysis to ensure reliable, secure code generation from LLMs. This study offers practical insights into secure code generation for developers and researchers. © 2025 IEEE.
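    The generate-analyze-revise cycle behind RCI prompting can be sketched as below. This is a minimal illustration only, not the paper's implementation: the `llm_generate` and `llm_revise` functions are hypothetical stubs standing in for real LLM API calls, and `run_static_analysis` is a toy stand-in for a tool such as Bandit or CodeQL.

    ```python
    # Minimal sketch of a Recursive Criticism and Improvement (RCI) loop
    # for secure code generation. All three helper functions are hypothetical
    # stubs; a real pipeline would call an LLM API and run Bandit or CodeQL.

    def run_static_analysis(code: str) -> list[str]:
        """Toy analyzer: flags one known-insecure pattern (CWE-78 style)."""
        findings = []
        if "shell=True" in code:
            findings.append("CWE-78: subprocess call with shell=True")
        return findings

    def llm_generate(task: str) -> str:
        """Stub for the initial LLM code-generation call."""
        return "subprocess.run(cmd, shell=True)  # insecure first draft"

    def llm_revise(code: str, critique: list[str]) -> str:
        """Stub for the LLM revising its own output given a critique."""
        return "subprocess.run(shlex.split(cmd))  # revised draft"

    def rci_generate(task: str, max_rounds: int = 3) -> tuple[str, list[str]]:
        """Generate code, then repeatedly critique and revise until clean."""
        code = llm_generate(task)
        for _ in range(max_rounds):
            findings = run_static_analysis(code)
            if not findings:
                break
            code = llm_revise(code, findings)
        return code, run_static_analysis(code)

    code, findings = rci_generate("run a user-supplied command")
    print(findings)  # → [] once the critique round removes shell=True
    ```

    The key design point the abstract reports is that this critique-and-revise loop, rather than a single prompt, is what drove the security gains across models.
    
    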