Improving Word Embedding Quality With Innovative Automated Approaches To Hyperparameters

Date

2021

Publisher

Wiley

Green Open Access

No

Publicly Funded

No
Impulse
Top 10%
Influence
Top 10%
Popularity
Top 10%

Abstract

Deep learning has had a great impact in many areas, driven mainly by big data and significant hardware developments. Recent advances in deep learning have led to significant improvements in text analysis and classification, and progress in the quality of word representations is an important factor in these improvements. In this study, we aimed to improve the word2vec word representation, also called an embedding, by automatically optimizing its hyperparameters. The hyperparameters tuned were minimum word count, vector size, window size, number of negative samples, and number of iterations. We introduce two approaches for setting hyperparameters that are faster than grid search and random search. Word embeddings were created from documents totaling approximately 300 million words. We measured embedding quality using a deep learning classification model on documents from 10 different classes. Optimizing the hyperparameter values alone increased classification success by 9%. In addition, we demonstrate the benefits of our approaches by comparing the semantic and syntactic relations captured by word embeddings trained with default and with optimized hyperparameters.
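To make the search problem concrete, the following is a minimal sketch of the two baselines named in the abstract, grid search and random search, over the five word2vec hyperparameters. It does not reproduce the paper's two faster approaches, which this record does not describe. The value ranges, the function names, and the `toy_evaluate` scoring function are hypothetical. The dictionary keys follow the keyword names of gensim 4's `Word2Vec`, on the assumption that gensim is the implementation; the record does not say which one the authors used.

```python
import itertools
import random

# Hypothetical search space: the value lists are illustrative, not from the paper.
# Keys follow gensim 4 Word2Vec keyword names (an assumption about the toolkit).
SEARCH_SPACE = {
    "min_count": [1, 5, 10],        # minimum word count
    "vector_size": [100, 200, 300], # embedding dimensionality
    "window": [3, 5, 10],           # context window size
    "negative": [5, 10, 15],        # number of negative samples
    "epochs": [5, 10, 20],          # number of training iterations
}


def grid_candidates(space):
    """Yield every combination of values (exhaustive grid search)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_candidates(space, n_trials, seed=0):
    """Yield n_trials configurations sampled uniformly at random."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {k: rng.choice(v) for k, v in space.items()}


def search(candidates, evaluate):
    """Return (best_params, best_score, n_evaluated) for a scoring function."""
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    for params in candidates:
        n_evaluated += 1
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


def toy_evaluate(params):
    """Placeholder objective. In practice this would train an embedding with
    `params` and return the downstream classification accuracy."""
    return -abs(params["vector_size"] - 200) - abs(params["window"] - 5)


if __name__ == "__main__":
    print(search(grid_candidates(SEARCH_SPACE), toy_evaluate))
    print(search(random_candidates(SEARCH_SPACE, n_trials=20), toy_evaluate))
```

With three values per hyperparameter, the grid already contains 3^5 = 243 configurations, each of which would require training an embedding and a classifier. That cost is why methods faster than grid search and random search are attractive.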

Description

YILDIZ, Beytullah/0000-0001-7664-5145

Keywords

deep learning, machine learning, text analysis, text classification, word embedding, word2vec

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology, 01 natural sciences, 0105 earth and related environmental sciences

WoS Q

Q3

Scopus Q

Q2
OpenCitations Citation Count
16

Source

Concurrency and Computation: Practice and Experience

Volume

33

Issue

18

PlumX Metrics
Citations

CrossRef : 7

Scopus : 13

Captures

Mendeley Readers : 17

OpenAlex FWCI
2.68096793
