Selective Word Encoding for Effective Text Representation
dc.authorscopusid | 43261670800 | |
dc.authorscopusid | 43261651300 | |
dc.contributor.author | Özkan, Savaş | |
dc.contributor.author | Özkan, Akın | |
dc.contributor.other | Department of Electrical & Electronics Engineering | |
dc.date.accessioned | 2024-07-05T15:28:42Z | |
dc.date.available | 2024-07-05T15:28:42Z | |
dc.date.issued | 2019 | |
dc.department | Atılım University | en_US |
dc.department-temp | ORTA DOĞU TEKNİK ÜNİVERSİTESİ,ATILIM ÜNİVERSİTESİ | en_US |
dc.description.abstract | Determining the category of a text document from its semantic content is highly motivated in the literature and has been extensively studied in various applications. A compact representation of the text is a fundamental step in achieving precise results in these applications, and many studies concentrate on improving its performance. In particular, studies that exploit the aggregation of word-level representations are the mainstream techniques used for this problem. In this paper, we tackle text representation to achieve high performance in different text classification tasks. Throughout the paper, three critical contributions are presented. First, to encode the word-level representations for each text, we adapt a trainable orderless aggregation algorithm to obtain a more discriminative abstract representation by transforming word vectors into a text-level representation. Second, we propose an effective term-weighting scheme to compute the relative importance of words from the context based on their conjunction with the problem in an end-to-end learning manner. Third, we present a weighted loss function to mitigate the class-imbalance problem between the categories. To evaluate the performance, we collect two distinct datasets, Turkish parliament records (i.e. written speeches of four major political parties including 30731/7683 train and test documents) and newspaper articles (i.e. daily articles of the columnists including 16000/3200 train and test documents), whose data is available on the web. From the results, the proposed method introduces significant performance improvements over the baseline techniques (i.e. VLAD and Fisher Vector) and achieves 0.823 and 0.878 true prediction accuracies for party membership and the estimation of the category of articles, respectively. The performance validates that the proposed contributions (i.e. trainable word-encoding model, trainable term-weighting scheme and weighted loss function) significantly outperform the baselines. | en_US |
dc.description.sponsorship | NVIDIA | en_US |
dc.description.sponsorship | The authors are pleased to thank NVIDIA for supporting this research with the Tesla K40 graphics card. | en_US |
dc.identifier.citationcount | 0 | |
dc.identifier.doi | 10.3906/elk-1805-138 | |
dc.identifier.endpage | 1040 | en_US |
dc.identifier.issn | 1300-0632 | |
dc.identifier.issue | 2 | en_US |
dc.identifier.scopus | 2-s2.0-85065847925 | |
dc.identifier.scopusquality | Q3 | |
dc.identifier.startpage | 1028 | en_US |
dc.identifier.trdizinid | 336585 | |
dc.identifier.uri | https://doi.org/10.3906/elk-1805-138 | |
dc.identifier.uri | https://search.trdizin.gov.tr/tr/yayin/detay/336585/selective-word-encoding-for-effective-text-representation | |
dc.identifier.volume | 27 | en_US |
dc.identifier.wos | WOS:000463355800026 | |
dc.identifier.wosquality | Q4 | |
dc.institutionauthor | Özkan, Akın | |
dc.language.iso | en | en_US |
dc.publisher | Tubitak Scientific & Technological Research Council Turkey | en_US |
dc.relation.ispartof | Turkish Journal of Electrical Engineering and Computer Sciences | en_US |
dc.relation.publicationcategory | Article - National Refereed Journal - Institutional Faculty Member | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.scopus.citedbyCount | 0 | |
dc.subject | Engineering | en_US |
dc.subject | Electrical and Electronics | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Software Engineering | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Cybernetics | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Information Systems | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Hardware and Architecture | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Theory and Methods | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Artificial Intelligence | en_US |
dc.title | Selective Word Encoding for Effective Text Representation | en_US |
dc.type | Article | en_US |
dc.wos.citedbyCount | 0 | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | f399fa5d-a26e-401f-84b2-4e77e16bc0a7 | |
relation.isAuthorOfPublication.latestForDiscovery | f399fa5d-a26e-401f-84b2-4e77e16bc0a7 | |
relation.isOrgUnitOfPublication | c3c9b34a-b165-4cd6-8959-dc25e91e206b | |
relation.isOrgUnitOfPublication.latestForDiscovery | c3c9b34a-b165-4cd6-8959-dc25e91e206b |