Entropy based streaming big-data reduction with adjustable compression ratio

No Thumbnail Available

Date

2023

Journal Title

Journal ISSN

Volume Title

Publisher

Springer

Open Access Color

OpenAIRE Downloads

OpenAIRE Views

Research Projects

Organizational Units

Organizational Unit
Software Engineering
(2005)
Department of Software Engineering was founded in 2005 as the first department in Ankara in Software Engineering. The recent developments in current technologies such as Artificial Intelligence, Machine Learning, Big Data, and Blockchains, have placed Software Engineering among the top professions of today, and the future. The academic and research activities in the department are pursued with qualified faculty at Undergraduate, Graduate and Doctorate Degree levels. Our University is one of the two universities offering a Doctorate-level program in this field. In addition to focusing on the basic phases of software (analysis, design, development, testing) and relevant methodologies in detail, our department offers education in various areas of expertise, such as Object-oriented Analysis and Design, Human-Computer Interaction, Software Quality Assurance, Software Requirement Engineering, Software Design and Architecture, Software Project Management, Software Testing and Model-Driven Software Development. The curriculum of our Department is catered to graduate individuals who are prepared to take part in any phase of software development of large-scale software in line with the requirements of the software sector. Department of Software Engineering is accredited by MÜDEK (Association for Evaluation and Accreditation of Engineering Programs) until September 30th, 2021, and has been granted the EUR-ACE label that is valid in Europe. This label provides our graduates with a vital head-start to be admitted to graduate-level programs, and into working environments in European Union countries. The Big Data and Cloud Computing Laboratory, as well as MobiLab where mobile applications are developed, SimLAB, the simulation laboratory for Medical Computing, and software education laboratories of the department are equipped with various software tools and hardware to enable our students to use state-of-the-art software technologies. Our graduates are employed in software and R&D companies (Technoparks), national/international institutions developing or utilizing software technologies (such as banks, healthcare institutions, the Information Technologies departments of private and public institutions, telecommunication companies, TÜİK, SPK, BDDK, EPDK, RK, or universities), and research institutions such TÜBİTAK.

Journal Issue

Abstract

The Internet of Things is a novel concept in which numerous physical devices are linked to the internet to collect, generate, and distribute data for processing. Data storage and processing become more challenging as the number of devices increases. One solution to the problem is to reduce the amount of stored data in such a way that processing accuracy does not suffer significantly. The reduction can be lossy or lossless, depending on the type of data. The article presents a novel lossy algorithm for reducing the amount of data stored in the system. The reduction process aims to reduce the volume of data while maintaining classification accuracy and properly adjusting the reduction ratio. A nonlinear cluster distance measure is used to create subgroups so that samples can be assigned to the correct clusters even though the cluster shape is nonlinear. Each sample is assumed to arrive one at a time during the reduction. As a result of this approach, the algorithm is suitable for streaming data. The user can adjust the degree of reduction, and the reduction algorithm strives to minimize classification error. The algorithm is not dependent on any particular classification technique. Subclusters are formed and readjusted after each sample during the calculation. To summarize the data from the subclusters, representative points are calculated. The data summary that is created can be saved and used for future processing. The accuracy difference between regular and reduced datasets is used to measure the effectiveness of the proposed method. Different classifiers are used to measure the accuracy difference. The results show that the nonlinear information-theoretic cluster distance measure improves the reduction rates with higher accuracy values compared to existing studies. At the same time, the reduction rate can be adjusted as desired, which is a lacking feature in the current methods. The characteristics are discussed, and the results are compared to previously published algorithms.

Description

Gokcay, Erhan/0000-0002-4220-199X

Keywords

Entropy, Information theory, Instance reduction, Adjustable compression

Turkish CoHE Thesis Center URL

Fields of Science

Citation

0

WoS Q

Q2

Scopus Q

Q2

Source

Volume

Issue

Start Page

End Page

Collections