Search Results

Now showing 1 - 4 of 4
  • Conference Object
    Citation - Scopus: 1
    Developing and Evaluating a Model-Based Metric for Legal Question Answering Systems
    (Institute of Electrical and Electronics Engineers Inc., 2023) Bakir, D.; Yildiz, B.; Aktas, M.S.
    In the complex legal domain, Question Answering (QA) systems are only useful if they give correct, context-aware, and logically sound answers. Traditional evaluation methods, which rely on surface-level similarity measures, cannot capture the nuanced accuracy and reasoning that legal answers require, so evaluation methodology needs to be rethought. To address these shortcomings, this study presents a new model-based evaluation metric designed specifically for legal QA systems. We examine the principles such a metric must satisfy, the practical challenges of deploying it, the choice of suitable technological frameworks, and the design of sound evaluation procedures. We describe a theoretical framework grounded in legal standards and computational linguistics, and discuss how the metric was constructed and how it can be applied in practice. Our results, drawn from extensive experiments, show that the proposed metric is more reliable, more accurate, and more useful than existing ones for evaluating legal QA systems. © 2023 IEEE.
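    The core idea of a model-based metric, scoring an answer by semantic agreement under a learned model rather than by surface n-gram overlap, can be sketched briefly. The snippet below is an illustration only, not the authors' implementation; the sentence-transformers package and the embedding model name are assumptions.

      # Illustrative sketch of a model-based answer-scoring metric: compare a
      # candidate answer to a reference by cosine similarity of sentence
      # embeddings instead of n-gram overlap. Model choice is an assumption.
      from sentence_transformers import SentenceTransformer, util

      model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

      def model_based_score(reference: str, candidate: str) -> float:
          """Return a semantic similarity score, roughly in [0, 1]."""
          ref_emb, cand_emb = model.encode([reference, candidate], convert_to_tensor=True)
          return util.cos_sim(ref_emb, cand_emb).item()

      # Two paraphrases with little word overlap still score high semantically,
      # which surface metrics such as BLEU would miss.
      print(model_based_score(
          "The contract is void because one party lacked legal capacity.",
          "Since a signatory was not legally competent, the agreement is invalid.",
      ))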
  • Article
    Citation - Scopus: 2
    Systematic Mapping Study on Natural Language Processing for Social Robots
    (Prof.Dr. İskender AKKURT, 2024) Adem, A.İ.; Turhan, Ç.; Sezen, A.
    Social robots are becoming increasingly sophisticated in their ability to interact with humans and exhibit social skills. In this context, natural language processing (NLP), an interdisciplinary field that helps computers understand, interpret, and generate human language, plays a critical role in enabling robots to understand and communicate in human language. Examining the datasets, methods, techniques, and tools used, along with the use of speech recognition and generation, is important for understanding developments in this field. In this study, 35 out of 92 studies collected from Web of Science were examined using a systematic mapping approach, and key findings on the use of NLP in social robots were identified, with particular emphasis on the evaluation of the research questions in this context. The study provides a starting point to guide future research on the use of NLP in social robots. © 2024, Prof.Dr. İskender AKKURT. All rights reserved.
  • Article
    A Model-Based Evaluation Metric for Question Answering Systems
    (World Scientific, 2025) Bakır, D.; Aktas, M.S.; Yıldız, B.
    The paper addresses the limitations of traditional evaluation metrics for Question Answering (QA) systems, which focus primarily on syntax and n-gram similarity. We propose a novel model-based evaluation metric, MQA-metric, and create two human-judgment-based datasets, squad-qametric and marco-qametric, to validate our approach. The research addresses several key problems: objectivity in dataset labeling, the effectiveness of metrics in the absence of syntactic similarity, the impact of answer length on metric performance, and the influence of reference-answer quality on metric results. To tackle these challenges, we designed an interface for dataset labeling and conducted extensive experiments with human reviewers. Our analysis shows that the MQA-metric outperforms traditional metrics such as BLEU, ROUGE, and METEOR. Unlike existing metrics, MQA-metric leverages semantic comprehension through large language models (LLMs), enabling it to capture contextual nuances and synonymous expressions more effectively. This approach sets a standard for evaluating QA systems by prioritizing semantic accuracy over surface-level similarities. The proposed metric correlates better with human judgment, making it a more reliable tool for evaluating QA systems. Our contributions include the development of a robust evaluation workflow, the creation of high-quality datasets, and an extensive comparison with existing evaluation methods. The results indicate that our model-based approach provides a significant improvement in assessing the quality of QA systems, which is crucial for their practical application and trustworthiness. © 2025 World Scientific Publishing Company.
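    The validation strategy described above, correlating metric scores with human ratings, is straightforward to reproduce in outline. The sketch below computes a traditional surface metric and its Spearman correlation with human judgments on toy data; the paper's squad-qametric and marco-qametric datasets are not reproduced here, and BLEU stands in for the metrics under comparison.

      # Sketch of a human-correlation check for a QA metric on toy data.
      # A metric is more trustworthy if it ranks answers the way humans do.
      from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
      from scipy.stats import spearmanr

      pairs = [  # (reference answer, system answer, human rating in [0, 1])
          ("Paris is the capital of France.", "The capital of France is Paris.", 1.0),
          ("Paris is the capital of France.", "France's capital city is Paris.", 0.9),
          ("Paris is the capital of France.", "Berlin is the capital of France.", 0.1),
      ]

      smooth = SmoothingFunction().method1  # avoids zero scores on short texts
      bleu_scores = [
          sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth)
          for ref, hyp, _ in pairs
      ]
      human_scores = [rating for _, _, rating in pairs]

      rho, p = spearmanr(bleu_scores, human_scores)
      print(f"Spearman correlation with human judgment: {rho:.2f} (p={p:.2f})")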
  • Conference Object
    Citation - WoS: 3
    Citation - Scopus: 5
    Predicting Software Functional Size Using Natural Language Processing: an Exploratory Case Study
    (IEEE, 2024) Unlu, Huseyin; Tenekeci, Samet; Ciftci, Can; Oral, Ibrahim Baran; Atalay, Tunahan; Hacaloglu, Tuna; Demirors, Onur
    Software Size Measurement (SSM) plays an essential role in software project management, as it yields software size, the primary input for development effort and schedule estimation. However, many small and medium-sized companies cannot perform objective SSM and Software Effort Estimation (SEE) due to a lack of resources and expert workforce. This results in inadequate estimates and projects exceeding the planned time and budget. Organizations therefore need to perform objective SSM and SEE with minimal resources and without an expert workforce. In this research, we conducted an exploratory case study to predict the functional size of software project requirements using state-of-the-art large language models (LLMs). To this end, we fine-tuned BERT and BERT_SE on a set of user stories and their respective functional sizes in COSMIC Function Points (CFP). We gathered the user stories from different project requirement documents. In total size prediction, we achieved 72.8% accuracy with BERT and 74.4% accuracy with BERT_SE. In data movement-based size prediction, we achieved 87.5% average accuracy with BERT and 88.1% average accuracy with BERT_SE. Although we used relatively small datasets for model training, these results are promising and hold significant value, as they demonstrate the practical utility of language models in SSM.
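    Framing size prediction as fine-tuning a pretrained encoder is reproducible in outline with the HuggingFace transformers API. The sketch below treats total CFP prediction as single-output regression on a user story; the checkpoint, toy label, and hyperparameters are assumptions, and the paper's exact setup (e.g., classifying individual data movements) may differ.

      # Hedged sketch: fine-tune BERT to predict a COSMIC Function Point (CFP)
      # value from a user story, treated as single-output regression.
      import torch
      from transformers import AutoTokenizer, AutoModelForSequenceClassification

      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
      model = AutoModelForSequenceClassification.from_pretrained(
          "bert-base-uncased", num_labels=1, problem_type="regression"
      )

      stories = ["As a user, I want to create an account so that I can log in."]
      sizes = torch.tensor([[4.0]])  # toy CFP label (e.g., Entry+Read+Write+Exit)

      batch = tokenizer(stories, padding=True, truncation=True, return_tensors="pt")
      optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

      model.train()
      for _ in range(3):  # a few toy optimization steps
          loss = model(**batch, labels=sizes).loss  # MSE loss under regression
          loss.backward()
          optimizer.step()
          optimizer.zero_grad()

      model.eval()
      with torch.no_grad():
          pred = model(**batch).logits.item()
      print(f"Predicted size: {pred:.1f} CFP")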