Selective Word Encoding for Effective Text Representation

dc.contributor.author Özkan, Savaş
dc.contributor.author Özkan, Akın
dc.date.accessioned 2024-07-05T15:28:42Z
dc.date.available 2024-07-05T15:28:42Z
dc.date.issued 2019
dc.description.abstract Determining the category of a text document from its semantic content is well motivated in the literature and has been studied extensively in various applications. A compact representation of the text is a fundamental step toward precise results in these applications, and many studies concentrate on improving its performance. In particular, methods that aggregate word-level representations are the mainstream techniques for the problem. In this paper, we tackle text representation to achieve high performance in different text classification tasks. Throughout the paper, three critical contributions are presented. First, to encode the word-level representations for each text, we adapt a trainable orderless aggregation algorithm to obtain a more discriminative abstract representation by transforming word vectors to the text-level representation. Second, we propose an effective term-weighting scheme to compute the relative importance of words from the context based on their conjunction with the problem in an end-to-end learning manner. Third, we present a weighted loss function to mitigate the class-imbalance problem between the categories. To evaluate the performance, we collect two distinct datasets whose data is available on the web: Turkish parliament records (i.e. written speeches of four major political parties including 30731/7683 train and test documents) and newspaper articles (i.e. daily articles of the columnists including 16000/3200 train and test documents). From the results, the proposed method introduces significant performance improvements over the baseline techniques (i.e. VLAD and Fisher Vector) and achieves 0.823% and 0.878% true prediction accuracies for the party membership and the estimation of the category of articles, respectively. The performance validates that the proposed contributions (i.e. trainable word-encoding model, trainable term-weighting scheme and weighted loss function) significantly outperform the baselines. en_US
dc.description.sponsorship NVIDIA en_US
dc.description.sponsorship The authors are pleased to thank NVIDIA for supporting this research with the Tesla K40 graphics card. en_US
dc.identifier.doi 10.3906/elk-1805-138
dc.identifier.issn 1300-0632
dc.identifier.issn 1303-6203
dc.identifier.scopus 2-s2.0-85065847925
dc.identifier.uri https://doi.org/10.3906/elk-1805-138
dc.identifier.uri https://search.trdizin.gov.tr/tr/yayin/detay/336585/selective-word-encoding-for-effective-text-representation
dc.identifier.uri https://hdl.handle.net/20.500.14411/2834
dc.language.iso en en_US
dc.publisher Tubitak Scientific & Technological Research Council Turkey en_US
dc.relation.ispartof Turkish Journal of Electrical Engineering and Computer Sciences en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Engineering en_US
dc.subject Electrical and Electronic en_US
dc.subject Computer Science en_US
dc.subject Software Engineering en_US
dc.subject Cybernetics en_US
dc.subject Information Systems en_US
dc.subject Hardware and Architecture en_US
dc.subject Theory and Methods en_US
dc.subject Artificial Intelligence en_US
dc.title Selective Word Encoding for Effective Text Representation en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.scopusid 43261670800
gdc.author.scopusid 43261651300
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Atılım University en_US
gdc.description.departmenttemp ORTA DOĞU TEKNİK ÜNİVERSİTESİ, ATILIM ÜNİVERSİTESİ en_US
gdc.description.endpage 1040 en_US
gdc.description.issue 2 en_US
gdc.description.publicationcategory Article - National Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q2
gdc.description.startpage 1028 en_US
gdc.description.volume 27 en_US
gdc.description.wosquality Q3
gdc.identifier.openalex W2931352421
gdc.identifier.trdizinid 336585
gdc.identifier.wos WOS:000463355800026
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.4895952E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 1.181496E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.oaire.sciencefields 01 natural sciences
gdc.oaire.sciencefields 0105 earth and related environmental sciences
gdc.openalex.fwci 0.0
gdc.openalex.normalizedpercentile 0.03
gdc.opencitations.count 0
gdc.plumx.mendeley 4
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.virtual.author Özkan, Akın
gdc.wos.citedcount 0
relation.isAuthorOfPublication f399fa5d-a26e-401f-84b2-4e77e16bc0a7
relation.isAuthorOfPublication.latestForDiscovery f399fa5d-a26e-401f-84b2-4e77e16bc0a7
relation.isOrgUnitOfPublication c3c9b34a-b165-4cd6-8959-dc25e91e206b
relation.isOrgUnitOfPublication dff2e5a6-d02d-4bef-8b9e-efebe3919b10
relation.isOrgUnitOfPublication 50be38c5-40c4-4d5f-b8e6-463e9514c6dd
relation.isOrgUnitOfPublication.latestForDiscovery c3c9b34a-b165-4cd6-8959-dc25e91e206b
