Beyond Rouge: a Comprehensive Evaluation Metric for Abstractive Summarization Leveraging Similarity, Entailment, and Acceptability
| dc.contributor.author | Briman, Mohammed Khalid Hilmi | |
| dc.contributor.author | Yıldız, Beytullah | |
| dc.contributor.other | Software Engineering | |
| dc.date.accessioned | 2024-09-10T21:33:39Z | |
| dc.date.available | 2024-09-10T21:33:39Z | |
| dc.date.issued | 2024 | |
| dc.description | YILDIZ, Beytullah/0000-0001-7664-5145; Briman, Mohammed Khalid Hilmi/0009-0000-5785-6916 | en_US |
| dc.description.abstract | The vast amount of textual information on the internet has amplified the importance of text summarization models. Abstractive summarization generates original words and sentences that may not exist in the source document being summarized. Such abstractive models may suffer from shortcomings such as poor linguistic acceptability and hallucinations. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a metric commonly used to evaluate abstractive summarization models. However, due to its n-gram-based approach, it ignores several critical linguistic aspects. In this work, we propose the Similarity, Entailment, and Acceptability Score (SEAScore), an automatic evaluation metric for abstractive text summarization models that uses state-of-the-art pre-trained language models. SEAScore comprises three language models (LMs) that extract meaningful linguistic features from candidate and reference summaries and a weighted sum aggregator that computes an evaluation score. Experimental results show that our LM-based SEAScore metric correlates better with human judgment than standard evaluation metrics such as ROUGE-N and BERTScore. | en_US |
| dc.identifier.doi | 10.1142/S0218213024500179 | |
| dc.identifier.issn | 0218-2130 | |
| dc.identifier.issn | 1793-6349 | |
| dc.identifier.scopus | 2-s2.0-85199505625 | |
| dc.identifier.uri | https://doi.org/10.1142/S0218213024500179 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14411/7300 | |
| dc.language.iso | en | en_US |
| dc.publisher | World Scientific Publ Co Pte Ltd | en_US |
| dc.rights | info:eu-repo/semantics/closedAccess | en_US |
| dc.subject | Machine learning | en_US |
| dc.subject | deep learning | en_US |
| dc.subject | natural language processing | en_US |
| dc.subject | transformer | en_US |
| dc.subject | text summarization | en_US |
| dc.subject | language models | en_US |
| dc.title | Beyond Rouge: a Comprehensive Evaluation Metric for Abstractive Summarization Leveraging Similarity, Entailment, and Acceptability | en_US |
| dc.type | Article | en_US |
| dspace.entity.type | Publication | |
| gdc.author.id | YILDIZ, Beytullah/0000-0001-7664-5145 | |
| gdc.author.id | Briman, Mohammed Khalid Hilmi/0009-0000-5785-6916 | |
| gdc.author.institutional | Yıldız, Beytullah | |
| gdc.author.scopusid | 59211549500 | |
| gdc.author.scopusid | 14632851900 | |
| gdc.coar.access | metadata only access | |
| gdc.coar.type | text::journal::journal article | |
| gdc.description.department | Atılım University | en_US |
| gdc.description.departmenttemp | [Briman, Mohammed Khalid Hilmi] Atilim Univ, Comp Engn Dept, TR-06830 Incek, Ankara, Turkiye; [Yildiz, Beytullah] Atilim Univ, Software Engn Dept, TR-06830 Incek, Ankara, Turkiye | en_US |
| gdc.description.issue | 5 | en_US |
| gdc.description.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US |
| gdc.description.scopusquality | Q3 | |
| gdc.description.volume | 33 | en_US |
| gdc.description.woscitationindex | Science Citation Index Expanded | |
| gdc.description.wosquality | Q4 | |
| gdc.identifier.wos | WOS:001275042600001 | |
| gdc.scopus.citedcount | 2 | |
| gdc.wos.citedcount | 2 | |
| relation.isAuthorOfPublication | 8eb144cb-95ff-4557-a99c-cd0ffa90749d | |
| relation.isAuthorOfPublication.latestForDiscovery | 8eb144cb-95ff-4557-a99c-cd0ffa90749d | |
| relation.isOrgUnitOfPublication | d86bbe4b-0f69-4303-a6de-c7ec0c515da5 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | d86bbe4b-0f69-4303-a6de-c7ec0c515da5 |
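
The abstract describes SEAScore as three language-model features combined by a weighted sum aggregator. The following is a minimal sketch of such an aggregator only, not the authors' implementation: the class `FeatureScores`, the function `weighted_sum_score`, the assumption that each feature lies in [0, 1], and the default equal weights are all hypothetical, since this record does not state the models or weights used in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FeatureScores:
    """Hypothetical container for the three per-summary features.

    Each value is assumed to lie in [0, 1]; the record does not say
    how the paper scales its features.
    """
    similarity: float
    entailment: float
    acceptability: float


def weighted_sum_score(
    scores: FeatureScores,
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),  # illustrative, not from the paper
) -> float:
    """Combine the three features into one score by a normalized weighted sum."""
    values = (scores.similarity, scores.entailment, scores.acceptability)
    if len(weights) != len(values):
        raise ValueError("expected exactly three weights")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("feature scores are assumed to lie in [0, 1]")
    total = sum(weights)
    # Dividing by the weight total keeps the result in [0, 1].
    return sum(w * v for w, v in zip(weights, values)) / total


if __name__ == "__main__":
    example = FeatureScores(similarity=0.9, entailment=0.6, acceptability=0.3)
    print(weighted_sum_score(example))
    print(weighted_sum_score(example, weights=(0.5, 0.3, 0.2)))
```

Normalizing by the weight total means the weights can be given on any scale; in practice they would be tuned against human judgments, which is outside what this record documents.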