Entropy Based Streaming Big-Data Reduction With Adjustable Compression Ratio

Gokcay, Erhan

dc.authorid	Gokcay, Erhan/0000-0002-4220-199X
dc.authorscopusid	7004217859
dc.authorwosid	Gokcay, Erhan/JOK-0734-2023
dc.contributor.author	Gokcay, Erhan
dc.contributor.other	Software Engineering
dc.date.accessioned	2024-07-05T15:22:19Z
dc.date.available	2024-07-05T15:22:19Z
dc.date.issued	2023
dc.department	Atılım University	en_US
dc.department-temp	[Gokcay, Erhan] Atilim Univ, Software Engn, TR-06830 Ankara, Turkiye	en_US
dc.description	Gokcay, Erhan/0000-0002-4220-199X	en_US
dc.description.abstract	The Internet of Things is a novel concept in which numerous physical devices are linked to the internet to collect, generate, and distribute data for processing. Data storage and processing become more challenging as the number of devices increases. One solution is to reduce the amount of stored data in such a way that processing accuracy does not suffer significantly. The reduction can be lossy or lossless, depending on the type of data. The article presents a novel lossy algorithm for reducing the amount of data stored in the system. The reduction process aims to reduce the volume of data while maintaining classification accuracy and properly adjusting the reduction ratio. A nonlinear cluster distance measure is used to create subgroups so that samples can be assigned to the correct clusters even when the cluster shape is nonlinear. Samples are assumed to arrive one at a time during the reduction, which makes the algorithm suitable for streaming data. The user can adjust the degree of reduction, and the algorithm strives to minimize the resulting classification error. The algorithm does not depend on any particular classification technique. Subclusters are formed and readjusted after each sample, and representative points are calculated to summarize the data in each subcluster. The resulting data summary can be saved and used for future processing. The effectiveness of the proposed method is measured by the accuracy difference between the full and reduced datasets, using several different classifiers. The results show that the nonlinear information-theoretic cluster distance measure improves reduction rates while achieving higher accuracy than existing studies. At the same time, the reduction rate can be adjusted as desired, a feature that current methods lack. The characteristics of the method are discussed, and the results are compared to previously published algorithms.	en_US
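The abstract describes a nonlinear, information-theoretic cluster distance and a per-sample (streaming) assignment of samples to subclusters, but this record does not give the formula. Purely as an illustration, the Python sketch below assumes a Cauchy-Schwarz divergence between Gaussian Parzen-window density estimates, a common information-theoretic cluster distance; it is not presented as the article's method. The names `cs_divergence` and `assign_sample` and the kernel width `sigma` are hypothetical, and the adjustable reduction ratio and representative-point summary are not modeled.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    # Unnormalized Gaussian with variance 2*sigma^2 (the convolution of two
    # Parzen windows of width sigma). The constant factor is omitted because
    # it cancels in the divergence ratio below.
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (4.0 * sigma * sigma))


def _potential(xs: List[Point], ys: List[Point], sigma: float) -> float:
    # Mean pairwise kernel value between two sample sets
    # (the "cross information potential" of their Parzen estimates).
    total = sum(_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def cs_divergence(xs: List[Point], ys: List[Point], sigma: float = 1.0) -> float:
    """Hypothetical cluster distance: Cauchy-Schwarz divergence
    -log(V_xy^2 / (V_xx * V_yy)). It is 0 for identical sets and grows
    as the two density estimates overlap less."""
    v_xy = _potential(xs, ys, sigma)
    if v_xy == 0.0:  # kernel underflow: the sets are effectively disjoint
        return math.inf
    v_xx = _potential(xs, xs, sigma)
    v_yy = _potential(ys, ys, sigma)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, -math.log((v_xy * v_xy) / (v_xx * v_yy)))


def assign_sample(sample: Point, subclusters: List[List[Point]],
                  sigma: float = 1.0) -> int:
    """Hypothetical streaming step: return the index of the subcluster
    closest to a newly arrived sample under the divergence above."""
    return min(range(len(subclusters)),
               key=lambda k: cs_divergence([sample], subclusters[k], sigma))
```

In this sketch `sigma` controls how local, and therefore how nonlinear, the resulting cluster boundaries can be; a streaming reducer built on it would call `assign_sample` once per arriving sample and then update the chosen subcluster.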
dc.identifier.citationcount	0
dc.identifier.doi	10.1007/s11042-023-15897-7
dc.identifier.issn	1380-7501
dc.identifier.issn	1573-7721
dc.identifier.scopus	2-s2.0-85161359119
dc.identifier.scopusquality	Q2
dc.identifier.uri	https://doi.org/10.1007/s11042-023-15897-7
dc.identifier.uri	https://hdl.handle.net/20.500.14411/2180
dc.identifier.wos	WOS:001004157400008
dc.identifier.wosquality	Q2
dc.institutionauthor	Gökçay, Erhan
dc.language.iso	en	en_US
dc.publisher	Springer	en_US
dc.relation.publicationcategory	Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı (Article - International Peer-Reviewed Journal - Institutional Academic Staff)	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.scopus.citedbyCount	0
dc.subject	Entropy	en_US
dc.subject	Information theory	en_US
dc.subject	Instance reduction	en_US
dc.subject	Adjustable compression	en_US
dc.title	Entropy Based Streaming Big-Data Reduction With Adjustable Compression Ratio	en_US
dc.type	Article	en_US
dc.wos.citedbyCount	0
dspace.entity.type	Publication
relation.isAuthorOfPublication	07b095f1-e384-448e-8662-cd924cb2139d
relation.isAuthorOfPublication.latestForDiscovery	07b095f1-e384-448e-8662-cd924cb2139d
relation.isOrgUnitOfPublication	d86bbe4b-0f69-4303-a6de-c7ec0c515da5
relation.isOrgUnitOfPublication.latestForDiscovery	d86bbe4b-0f69-4303-a6de-c7ec0c515da5

Collections

WoS
Scopus