Entropy Based Streaming Big-Data Reduction With Adjustable Compression Ratio

dc.contributor.author Gokcay, Erhan
dc.date.accessioned 2024-07-05T15:22:19Z
dc.date.available 2024-07-05T15:22:19Z
dc.date.issued 2023
dc.description Gokcay, Erhan/0000-0002-4220-199X en_US
dc.description.abstract The Internet of Things is a concept in which numerous physical devices are linked to the internet to collect, generate, and distribute data for processing. Data storage and processing become more challenging as the number of devices increases. One solution is to reduce the amount of stored data in such a way that processing accuracy does not suffer significantly. The reduction can be lossy or lossless, depending on the type of data. The article presents a novel lossy algorithm for reducing the amount of data stored in the system. The reduction process aims to reduce the volume of data while maintaining classification accuracy and allowing the reduction ratio to be adjusted. A nonlinear cluster distance measure is used to create subgroups so that samples can be assigned to the correct clusters even when the cluster shape is nonlinear. Samples are assumed to arrive one at a time during the reduction, which makes the algorithm suitable for streaming data. The user can adjust the degree of reduction, and the reduction algorithm strives to minimize classification error. The algorithm does not depend on any particular classification technique. Subclusters are formed and readjusted after each sample arrives. Representative points are calculated to summarize the data in the subclusters. The resulting data summary can be saved and used for future processing. The effectiveness of the proposed method is measured by the accuracy difference between the regular and reduced datasets, evaluated with several different classifiers. The results show that the nonlinear information-theoretic cluster distance measure improves the reduction rates with higher accuracy values compared to existing studies. At the same time, the reduction rate can be adjusted as desired, a feature that current methods lack. The characteristics of the algorithm are discussed, and the results are compared to previously published algorithms. en_US
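The streaming workflow outlined in the abstract (one sample at a time, subclusters readjusted after each sample, representative points kept as the summary) can be illustrated with a minimal sketch. This is not the article's algorithm: the record does not give the entropy-based nonlinear distance measure, so plain Euclidean distance stands in for it, and the names StreamingReducer, Subcluster, and radius are hypothetical. The radius parameter plays the role of an adjustable reduction knob under that assumption: a larger radius merges more samples and yields fewer representatives.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Subcluster:
    """One representative point: running mean of the samples it absorbed."""
    label: int
    centroid: List[float]
    count: int = 1


class StreamingReducer:
    """Hypothetical sketch of streaming instance reduction.

    Assumption: Euclidean distance replaces the information-theoretic
    cluster distance described in the abstract.
    """

    def __init__(self, radius: float) -> None:
        self.radius = radius  # hypothetical knob controlling the reduction
        self.subclusters: List[Subcluster] = []

    def add(self, x: Sequence[float], label: int) -> None:
        # Find the nearest subcluster that carries the same class label.
        best, best_d = None, math.inf
        for sc in self.subclusters:
            if sc.label != label:
                continue
            d = math.dist(sc.centroid, x)
            if d < best_d:
                best, best_d = sc, d
        if best is not None and best_d <= self.radius:
            # Readjust the representative with an incremental mean update.
            best.count += 1
            best.centroid = [
                c + (xi - c) / best.count for c, xi in zip(best.centroid, x)
            ]
        else:
            # No close subcluster of this class: start a new one.
            self.subclusters.append(Subcluster(label, list(x)))

    def summary(self) -> List[Tuple[List[float], int, int]]:
        """Return (representative point, label, sample count) triples."""
        return [(sc.centroid, sc.label, sc.count) for sc in self.subclusters]


reducer = StreamingReducer(radius=1.0)
for point, label in [((0.0, 0.0), 0), ((0.2, 0.0), 0), ((5.0, 5.0), 0), ((0.1, 0.0), 1)]:
    reducer.add(point, label)
reduced = reducer.summary()
```

Only the representative points and their counts are stored, so the raw samples can be discarded as they arrive; any classifier can then be trained on the summary, in line with the abstract's statement that the reduction is independent of the classification technique.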
dc.identifier.doi 10.1007/s11042-023-15897-7
dc.identifier.issn 1380-7501
dc.identifier.issn 1573-7721
dc.identifier.scopus 2-s2.0-85161359119
dc.identifier.uri https://doi.org/10.1007/s11042-023-15897-7
dc.identifier.uri https://hdl.handle.net/20.500.14411/2180
dc.language.iso en en_US
dc.publisher Springer en_US
dc.relation.ispartof Multimedia Tools and Applications
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Entropy en_US
dc.subject Information theory en_US
dc.subject Instance reduction en_US
dc.subject Adjustable compression en_US
dc.title Entropy Based Streaming Big-Data Reduction With Adjustable Compression Ratio en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Gokcay, Erhan/0000-0002-4220-199X
gdc.author.institutional Gokcay, Erhan (7004217859)
gdc.author.scopusid 7004217859
gdc.author.wosid Gokcay, Erhan/JOK-0734-2023
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Atılım University en_US
gdc.description.departmenttemp [Gokcay, Erhan] Atilim Univ, Software Engn, TR-06830 Ankara, Turkiye en_US
gdc.description.endpage 2681
gdc.description.issue 1
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.startpage 2647
gdc.description.volume 83
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q2
gdc.identifier.openalex W4379797924
gdc.identifier.wos WOS:001004157400008
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 1.0
gdc.oaire.influence 2.4893692E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.6830604E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration National
gdc.openalex.fwci 0.1766
gdc.openalex.normalizedpercentile 0.54
gdc.opencitations.count 1
gdc.plumx.mendeley 3
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.virtual.author Gökçay, Erhan
gdc.wos.citedcount 0
relation.isAuthorOfPublication 07b095f1-e384-448e-8662-cd924cb2139d
relation.isAuthorOfPublication.latestForDiscovery 07b095f1-e384-448e-8662-cd924cb2139d
relation.isOrgUnitOfPublication d86bbe4b-0f69-4303-a6de-c7ec0c515da5
relation.isOrgUnitOfPublication 4abda634-67fd-417f-bee6-59c29fc99997
relation.isOrgUnitOfPublication 50be38c5-40c4-4d5f-b8e6-463e9514c6dd
relation.isOrgUnitOfPublication.latestForDiscovery d86bbe4b-0f69-4303-a6de-c7ec0c515da5
