Beyond Rouge: A Comprehensive Evaluation Metric for Abstractive Summarization Leveraging Similarity, Entailment, and Acceptability

dc.authoridYILDIZ, Beytullah/0000-0001-7664-5145
dc.authoridBriman, Mohammed Khalid Hilmi/0009-0000-5785-6916
dc.authorscopusid59211549500
dc.authorscopusid14632851900
dc.contributor.authorBriman, Mohammed Khalid Hilmi
dc.contributor.authorYıldız, Beytullah
dc.contributor.otherSoftware Engineering
dc.date.accessioned2024-09-10T21:33:39Z
dc.date.available2024-09-10T21:33:39Z
dc.date.issued2024
dc.departmentAtılım Universityen_US
dc.department-temp[Briman, Mohammed Khalid Hilmi] Atilim Univ, Comp Engn Dept, TR-06830 Incek, Ankara, Turkiye; [Yildiz, Beytullah] Atilim Univ, Software Engn Dept, TR-06830 Incek, Ankara, Turkiyeen_US
dc.descriptionYILDIZ, Beytullah/0000-0001-7664-5145; Briman, Mohammed Khalid Hilmi/0009-0000-5785-6916en_US
dc.description.abstractA vast amount of textual information on the internet has amplified the importance of text summarization models. Abstractive summarization generates original words and sentences that may not exist in the source document to be summarized. Such abstractive models may suffer from shortcomings such as poor linguistic acceptability and hallucinations. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a metric commonly used to evaluate abstractive summarization models. However, due to its n-gram-based approach, it ignores several critical linguistic aspects. In this work, we propose the Similarity, Entailment, and Acceptability Score (SEAScore), an automatic metric for evaluating abstractive text summarization models using the power of state-of-the-art pre-trained language models. SEAScore comprises three language models (LMs) that extract meaningful linguistic features from candidate and reference summaries and a weighted sum aggregator that computes an evaluation score. Experimental results show that our LM-based SEAScore metric correlates better with human judgment than standard evaluation metrics such as ROUGE-N and BERTScore.en_US
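The abstract describes SEAScore as three LM-derived component scores combined by a weighted sum aggregator. A minimal sketch of that aggregation step follows; the component names and the example weights are illustrative assumptions, since the record does not specify the paper's actual models or weight values.

```python
# Hypothetical sketch of a SEAScore-style weighted-sum aggregator.
# The three component scores and the weights are placeholders; in the
# paper they would come from pre-trained language models scoring a
# candidate summary for similarity, entailment, and acceptability.

def seascore(similarity: float, entailment: float, acceptability: float,
             weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Combine three LM-derived scores (each assumed in [0, 1]) with a
    weighted sum to produce a single evaluation score."""
    scores = (similarity, entailment, acceptability)
    return sum(w * s for w, s in zip(weights, scores))

# Example: a candidate with high similarity but weaker acceptability
print(seascore(0.9, 0.8, 0.6))
```

With the assumed weights, the example evaluates to 0.5·0.9 + 0.3·0.8 + 0.2·0.6 = 0.81.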
dc.description.woscitationindexScience Citation Index Expanded
dc.identifier.citation0
dc.identifier.doi10.1142/S0218213024500179
dc.identifier.issn0218-2130
dc.identifier.issn1793-6349
dc.identifier.issue5en_US
dc.identifier.scopus2-s2.0-85199505625
dc.identifier.scopusqualityQ3
dc.identifier.urihttps://doi.org/10.1142/S0218213024500179
dc.identifier.urihttps://hdl.handle.net/20.500.14411/7300
dc.identifier.volume33en_US
dc.identifier.wosWOS:001275042600001
dc.identifier.wosqualityQ4
dc.language.isoenen_US
dc.publisherWorld Scientific Publ Co Pte Ltden_US
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectMachine learningen_US
dc.subjectdeep learningen_US
dc.subjectnatural language processingen_US
dc.subjecttransformeren_US
dc.subjecttext summarizationen_US
dc.subjectlanguage modelsen_US
dc.titleBeyond Rouge: A Comprehensive Evaluation Metric for Abstractive Summarization Leveraging Similarity, Entailment, and Acceptabilityen_US
dc.typeArticleen_US
dspace.entity.typePublication
relation.isAuthorOfPublication8eb144cb-95ff-4557-a99c-cd0ffa90749d
relation.isAuthorOfPublication.latestForDiscovery8eb144cb-95ff-4557-a99c-cd0ffa90749d
relation.isOrgUnitOfPublicationd86bbe4b-0f69-4303-a6de-c7ec0c515da5
relation.isOrgUnitOfPublication.latestForDiscoveryd86bbe4b-0f69-4303-a6de-c7ec0c515da5