A Model-Based Evaluation Metric for Question Answering Systems

dc.contributor.author Bakır, D.
dc.contributor.author Aktas, M.S.
dc.contributor.author Yıldız, B.
dc.contributor.author Yildiz, Beytullah
dc.contributor.author Bakir, Dilan
dc.date.accessioned 2025-03-05T20:47:03Z
dc.date.available 2025-03-05T20:47:03Z
dc.date.issued 2025
dc.description.abstract The paper addresses the limitations of traditional evaluation metrics for Question Answering (QA) systems, which primarily focus on syntax and n-gram similarity. We propose a novel model-based evaluation metric, MQA-metric, and create two human-judgment-based datasets, squad-qametric and marco-qametric, to validate our approach. The research aims to solve several key problems: objectivity in dataset labeling, the effectiveness of metrics when there is no syntactic similarity, the impact of answer length on metric performance, and the influence of real answer quality on metric results. To tackle these challenges, we designed an interface for dataset labeling and conducted extensive experiments with human reviewers. Our analysis shows that the MQA-metric outperforms traditional metrics such as BLEU, ROUGE and METEOR. Unlike existing metrics, MQA-metric leverages semantic comprehension through large language models (LLMs), enabling it to capture contextual nuances and synonymous expressions more effectively. This approach sets a standard for evaluating QA systems by prioritizing semantic accuracy over surface-level similarities. The proposed metric correlates better with human judgment, making it a more reliable tool for evaluating QA systems. Our contributions include the development of a robust evaluation workflow, the creation of high-quality datasets, and an extensive comparison with existing evaluation methods. The results indicate that our model-based approach provides a significant improvement in assessing the quality of QA systems, which is crucial for their practical application and trustworthiness. © 2025 World Scientific Publishing Company. en_US
dc.identifier.doi 10.1142/S0218194025500032
dc.identifier.issn 0218-1940
dc.identifier.issn 1793-6403
dc.identifier.scopus 2-s2.0-86000436474
dc.identifier.uri https://doi.org/10.1142/S0218194025500032
dc.identifier.uri https://hdl.handle.net/20.500.14411/10470
dc.language.iso en en_US
dc.publisher World Scientific en_US
dc.relation.ispartof International Journal of Software Engineering and Knowledge Engineering en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Evaluation Metric en_US
dc.subject Generative Model en_US
dc.subject Large Language Model en_US
dc.subject Natural Language Processing en_US
dc.subject Question Answering en_US
dc.subject Transformer Models en_US
dc.title A Model-Based Evaluation Metric for Question Answering Systems en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Aktas, Mehmet/0000-0001-7908-5067
gdc.author.id YILDIZ, Beytullah/0000-0001-7664-5145
gdc.author.scopusid 59677045200
gdc.author.scopusid 8410237700
gdc.author.scopusid 59677422200
gdc.author.wosid Aktas, Mehmet/G-9710-2012
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Atılım University en_US
gdc.description.departmenttemp Bakır D., Computer Engineering Department, Yildiz Technical University, Istanbul, Turkey; Aktas M.S., Computer Engineering Department, Yildiz Technical University, Istanbul, Turkey; Yıldız B., Software Engineering Department, Atilim University, Ankara, Turkey en_US
gdc.description.endpage 262 en_US
gdc.description.issue 2 en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q3
gdc.description.startpage 243 en_US
gdc.description.volume 35 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q4
gdc.identifier.openalex W4404838894
gdc.identifier.wos WOS:001405946200001
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.3811355E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.5970819E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration National
gdc.openalex.fwci 0.7252
gdc.openalex.normalizedpercentile 0.78
gdc.opencitations.count 0
gdc.plumx.mendeley 4
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.virtual.author Yıldız, Beytullah
gdc.wos.citedcount 0
relation.isAuthorOfPublication 8eb144cb-95ff-4557-a99c-cd0ffa90749d
relation.isAuthorOfPublication.latestForDiscovery 8eb144cb-95ff-4557-a99c-cd0ffa90749d
relation.isOrgUnitOfPublication 50be38c5-40c4-4d5f-b8e6-463e9514c6dd
relation.isOrgUnitOfPublication 4abda634-67fd-417f-bee6-59c29fc99997
relation.isOrgUnitOfPublication.latestForDiscovery 50be38c5-40c4-4d5f-b8e6-463e9514c6dd
