Beyond ROUGE: A Comprehensive Evaluation Metric for Abstractive Summarization Leveraging Similarity, Entailment, and Acceptability

dc.contributor.author Briman, Mohammed Khalid Hilmi
dc.contributor.author Yıldız, Beytullah
dc.contributor.other Software Engineering
dc.date.accessioned 2024-09-10T21:33:39Z
dc.date.available 2024-09-10T21:33:39Z
dc.date.issued 2024
dc.description YILDIZ, Beytullah/0000-0001-7664-5145; Briman, Mohammed Khalid Hilmi/0009-0000-5785-6916 en_US
dc.description.abstract The vast amount of textual information on the internet has amplified the importance of text summarization models. Abstractive summarization generates new words and sentences that may not appear in the source document. Such models can suffer from shortcomings such as poor linguistic acceptability and hallucinations. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a metric commonly used to evaluate abstractive summarization models. However, because it relies on n-gram overlap, it ignores several critical linguistic aspects. In this work, we propose the Similarity, Entailment, and Acceptability Score (SEAScore), an automatic metric for evaluating abstractive text summarization models that builds on state-of-the-art pre-trained language models. SEAScore comprises three language models (LMs) that extract meaningful linguistic features from candidate and reference summaries, and a weighted-sum aggregator that combines those features into an evaluation score. Experimental results show that our LM-based SEAScore metric correlates better with human judgment than standard evaluation metrics such as ROUGE-N and BERTScore. en_US
dc.identifier.doi 10.1142/S0218213024500179
dc.identifier.issn 0218-2130
dc.identifier.issn 1793-6349
dc.identifier.scopus 2-s2.0-85199505625
dc.identifier.uri https://doi.org/10.1142/S0218213024500179
dc.identifier.uri https://hdl.handle.net/20.500.14411/7300
dc.language.iso en en_US
dc.publisher World Scientific Publ Co Pte Ltd en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Machine learning en_US
dc.subject deep learning en_US
dc.subject natural language processing en_US
dc.subject transformer en_US
dc.subject text summarization en_US
dc.subject language models en_US
dc.title Beyond ROUGE: A Comprehensive Evaluation Metric for Abstractive Summarization Leveraging Similarity, Entailment, and Acceptability en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id YILDIZ, Beytullah/0000-0001-7664-5145
gdc.author.id Briman, Mohammed Khalid Hilmi/0009-0000-5785-6916
gdc.author.institutional Yıldız, Beytullah
gdc.author.scopusid 59211549500
gdc.author.scopusid 14632851900
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.description.department Atılım University en_US
gdc.description.departmenttemp [Briman, Mohammed Khalid Hilmi] Atilim Univ, Comp Engn Dept, TR-06830 Incek, Ankara, Turkiye; [Yildiz, Beytullah] Atilim Univ, Software Engn Dept, TR-06830 Incek, Ankara, Turkiye en_US
gdc.description.issue 5 en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q3
gdc.description.volume 33 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q4
gdc.identifier.wos WOS:001275042600001
gdc.scopus.citedcount 2
gdc.wos.citedcount 2
relation.isAuthorOfPublication 8eb144cb-95ff-4557-a99c-cd0ffa90749d
relation.isAuthorOfPublication.latestForDiscovery 8eb144cb-95ff-4557-a99c-cd0ffa90749d
relation.isOrgUnitOfPublication d86bbe4b-0f69-4303-a6de-c7ec0c515da5
relation.isOrgUnitOfPublication.latestForDiscovery d86bbe4b-0f69-4303-a6de-c7ec0c515da5
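
The abstract describes SEAScore as three LM-derived features (similarity, entailment, acceptability) combined by a weighted-sum aggregator. The record does not give the models, the feature ranges, or the weights, so the following is only a minimal sketch of a weighted-sum aggregator under stated assumptions: each feature is taken to be a score in [0, 1] already computed elsewhere, the weights default to equal values, and the names `Features` and `weighted_sum_score` are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    """Hypothetical container for the three per-summary feature scores.

    Assumption: each value lies in [0, 1] and was produced by some
    upstream language model; this sketch does not compute them.
    """
    similarity: float
    entailment: float
    acceptability: float


def weighted_sum_score(features, weights=(1.0, 1.0, 1.0)):
    """Combine the three features into one score with a weighted sum.

    The weights are illustrative placeholders (equal by default), not
    the values used in the paper. They are normalized to sum to 1 so
    the result stays in [0, 1] when every feature is in [0, 1].
    """
    if len(weights) != 3:
        raise ValueError("expected exactly three weights")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    w_sim, w_ent, w_acc = (w / total for w in weights)
    return (
        w_sim * features.similarity
        + w_ent * features.entailment
        + w_acc * features.acceptability
    )
```

For instance, `weighted_sum_score(Features(0.5, 0.0, 1.0), weights=(2, 1, 1))` weights similarity twice as heavily as the other two features. Normalizing the weights is a design choice of this sketch that keeps scores comparable across different weight settings.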
