Beyond ROUGE: A Comprehensive Evaluation Metric for Abstractive Summarization Leveraging Similarity, Entailment, and Acceptability

Briman, Mohammed Khalid Hilmi; Yıldız, Beytullah

Date

2024

Authors

Briman, Mohammed Khalid Hilmi

Yıldız, Beytullah

Publisher

World Scientific Publ Co Pte Ltd

Organizational Units

Software Engineering

Abstract

The vast amount of textual information on the internet has amplified the importance of text summarization models. Abstractive summarization generates new words and sentences that may not appear in the source document. Such abstractive models may suffer from shortcomings such as poor linguistic acceptability and hallucinations. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a metric commonly used to evaluate abstractive summarization models. However, because it relies on n-gram overlap, it ignores several critical linguistic aspects. In this work, we propose the Similarity, Entailment, and Acceptability Score (SEAScore), an automatic metric for evaluating abstractive text summarization models that draws on state-of-the-art pre-trained language models. SEAScore comprises three language models (LMs) that extract meaningful linguistic features from candidate and reference summaries, and a weighted-sum aggregator that computes an evaluation score. Experimental results show that our LM-based SEAScore metric correlates better with human judgment than standard evaluation metrics such as ROUGE-N and BERTScore.
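
The two ideas in the abstract, n-gram overlap and weighted-sum aggregation, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three language models, the weights, or the score ranges, so `weighted_sum_score`, its example weights, and the assumption that each component score lies in [0, 1] are hypothetical. `rouge_n_recall` follows the standard ROUGE-N recall definition with simple whitespace tokenization, which is also an assumption.

```python
from collections import Counter


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    def ngrams(text: str) -> Counter:
        # Whitespace tokenization is a simplification chosen for this sketch.
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def weighted_sum_score(scores, weights):
    """Hypothetical aggregator: weighted average of component scores.

    `scores` stands in for (similarity, entailment, acceptability) values that
    three language models would produce; here they are plain floats in [0, 1].
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


reference = "the cat sat on the mat"
scrambled = "mat the on sat cat the"  # same words, ungrammatical order

unigram_recall = rouge_n_recall(scrambled, reference, n=1)
bigram_recall = rouge_n_recall(scrambled, reference, n=2)

# Made-up component scores and weights, for illustration only.
combined = weighted_sum_score((0.9, 0.8, 0.2), (0.5, 0.3, 0.2))
```

In this example the scrambled candidate reaches full unigram recall even though it is not an acceptable sentence, which is the kind of blind spot the abstract attributes to n-gram metrics. A low acceptability component, by contrast, pulls the aggregated score down.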

Description

YILDIZ, Beytullah/0000-0001-7664-5145; Briman, Mohammed Khalid Hilmi/0009-0000-5785-6916

ORCID

YILDIZ, Beytullah

Briman, Mohammed Khalid Hilmi

Keywords

Machine learning, deep learning, natural language processing, transformer, text summarization, language models

WoS Q

Q4

Scopus Q

Q3

Volume

33

Issue

5

URI

https://doi.org/10.1142/S0218213024500179
https://hdl.handle.net/20.500.14411/7300

Collections

WoS
Scopus
