Search Results
Now showing 1 - 3 of 3
Conference Object | Citation - Scopus: 1
Toxicity Detection Using State of the Art Natural Language Methodologies (IEEE, 2023)
Keskin, Enes Faruk; Acikgoz, Erkut; Dogan, Gulustan
This paper describes studies carried out to detect objectionable expressions in text. Experiments were performed with sentence transformers, supervised machine learning algorithms, and a BERT transformer architecture trained on English data, and the results were analyzed. The paper first explains the natural language processing and machine learning methodologies used to prepare the experimental dataset from labeled toxic and non-toxic text data obtained from the Kaggle platform, and then summarizes the methods and performance of the models trained on this dataset.

Article | Citation - Scopus: 88
Is ChatGPT Accurate and Reliable in Answering Questions Regarding Head and Neck Cancer? (Frontiers Media SA, 2023)
Kuşcu, O.; Pamuk, A.E.; Sütay Süslü, N.; Hosal, S.
Background and objective: Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence (AI)-based language processing model that uses deep learning to create human-like text dialogue. It has become a popular source of information covering a vast number of topics, including medicine. Patient education in head and neck cancer (HNC) is crucial to enhance patients' understanding of their medical condition, diagnosis, and treatment options. Therefore, this study aims to examine the accuracy and reliability of ChatGPT in answering questions regarding HNC. Methods: 154 head and neck cancer-related questions were compiled from sources including professional societies, institutions, patient support groups, and social media. These questions were categorized into topics such as basic knowledge, diagnosis, treatment, recovery, operative risks, complications, follow-up, and cancer prevention. ChatGPT was queried with each question, and two experienced head and neck surgeons independently assessed each response for accuracy and reproducibility. Responses were rated on a scale of (1) comprehensive/correct, (2) incomplete/partially correct, (3) a mix of accurate and inaccurate/misleading, and (4) completely inaccurate/irrelevant. Discrepancies in grading were resolved by a third reviewer. Reproducibility was evaluated by repeating questions and analyzing grading consistency. Results: ChatGPT yielded "comprehensive/correct" responses to 133 of 154 (86.4%) questions, whereas the rates of "incomplete/partially correct" and "mixed with accurate and inaccurate data/misleading" responses were 11% and 2.6%, respectively. There were no "completely inaccurate/irrelevant" responses. By category, the model provided "comprehensive/correct" answers to 80.6% of questions regarding "basic knowledge", 92.6% related to "diagnosis", 88.9% related to "treatment", 80% related to "recovery - operative risks - complications - follow-up", 100% related to "cancer prevention", and 92.9% related to "other". There was no significant difference between the categories in the grades of ChatGPT responses (p=0.88). The rate of reproducibility was 94.1% (145 of 154 questions). Conclusion: ChatGPT generated substantially accurate and reproducible information in response to diverse medical queries related to HNC. Despite its limitations, it can be a useful source of information for both patients and medical professionals. With further developments in the model, ChatGPT could also play a crucial role in clinical decision support by providing clinicians with up-to-date information. Copyright © 2023 Kuşcu, Pamuk, Sütay Süslü and Hosal.
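For readers who want the reported rates as explicit arithmetic, a minimal Python sketch follows. The counts for grades 2 and 3 are back-calculated from the percentages given in the abstract above, and the whole layout is illustrative only, not the study's actual data or analysis code.

# Illustrative counts reconstructed from the reported percentages (not raw study data).
TOTAL_QUESTIONS = 154
grade_counts = {
    1: 133,  # comprehensive/correct
    2: 17,   # incomplete/partially correct (roughly 11% of 154)
    3: 4,    # mix of accurate and inaccurate/misleading (roughly 2.6% of 154)
    4: 0,    # completely inaccurate/irrelevant
}
reproducible_questions = 145  # questions graded consistently when repeated

accuracy_rate = grade_counts[1] / TOTAL_QUESTIONS                 # ~0.864
reproducibility_rate = reproducible_questions / TOTAL_QUESTIONS   # ~0.941

print(f"comprehensive/correct: {accuracy_rate:.1%}")
print(f"reproducibility:       {reproducibility_rate:.1%}")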
Article | Citation - WoS: 6 | Citation - Scopus: 10
Beyond ROUGE: A Comprehensive Evaluation Metric for Abstractive Summarization Leveraging Similarity, Entailment, and Acceptability (World Scientific Publ Co Pte Ltd, 2024)
Briman, Mohammed Khalid Hilmi; Yıldız, Beytullah
The vast amount of textual information on the internet has amplified the importance of text summarization models. Abstractive summarization generates original words and sentences that may not exist in the source document being summarized, so abstractive models may suffer from shortcomings such as poor linguistic acceptability and hallucinations. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a metric commonly used to evaluate abstractive summarization models; however, its n-gram-based approach ignores several critical linguistic aspects. In this work, we propose the Similarity, Entailment, and Acceptability Score (SEAScore), an automatic metric for evaluating abstractive text summarization models that leverages state-of-the-art pre-trained language models. SEAScore comprises three language models (LMs) that extract meaningful linguistic features from candidate and reference summaries, and a weighted-sum aggregator that computes an evaluation score. Experimental results show that our LM-based SEAScore metric correlates better with human judgment than standard evaluation metrics such as ROUGE-N and BERTScore.
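To make the aggregation described in this abstract concrete, here is a minimal Python sketch of a SEAScore-style weighted sum. The weights, the [0, 1] normalization, and the sentence-embedding model named below are illustrative assumptions, not the authors' published implementation.

from sentence_transformers import SentenceTransformer, util

def similarity_score(candidate: str, reference: str,
                     model_name: str = "all-MiniLM-L6-v2") -> float:
    # Cosine similarity between sentence embeddings, rescaled from [-1, 1] to [0, 1].
    # The model name is an assumed placeholder, not necessarily the one used in the paper.
    model = SentenceTransformer(model_name)
    cand_emb = model.encode(candidate, convert_to_tensor=True)
    ref_emb = model.encode(reference, convert_to_tensor=True)
    return (util.cos_sim(cand_emb, ref_emb).item() + 1.0) / 2.0

def sea_score(similarity: float, entailment: float, acceptability: float,
              weights: tuple = (0.4, 0.4, 0.2)) -> float:
    # Weighted-sum aggregation of three sub-scores assumed to lie in [0, 1]:
    #   similarity    - semantic closeness of candidate and reference summaries
    #   entailment    - e.g. an entailment probability from an NLI model
    #   acceptability - e.g. an acceptability probability from a CoLA-style classifier
    # The weights are placeholders; the paper defines its own aggregation.
    w_sim, w_ent, w_acc = weights
    return (w_sim * similarity + w_ent * entailment + w_acc * acceptability) / (w_sim + w_ent + w_acc)

# Illustrative usage with made-up entailment and acceptability sub-scores:
sim = similarity_score("The cat sat on the mat.", "A cat was sitting on the mat.")
print(sea_score(similarity=sim, entailment=0.85, acceptability=0.95))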

