GCRIS :: Search

Entropy Based Streaming Big-Data Reduction With Adjustable Compression Ratio
(Springer, 2023) Gokcay, Erhan
The Internet of Things is a concept in which numerous physical devices are linked to the internet to collect, generate, and distribute data for processing. Data storage and processing become more challenging as the number of devices increases. One solution is to reduce the amount of stored data in such a way that processing accuracy does not suffer significantly. The reduction can be lossy or lossless, depending on the type of data. The article presents a novel lossy algorithm for reducing the amount of data stored in the system. The reduction process aims to reduce the volume of data while maintaining classification accuracy and allowing the reduction ratio to be adjusted. A nonlinear cluster distance measure is used to create subgroups so that samples can be assigned to the correct clusters even when the cluster shape is nonlinear. Samples are assumed to arrive one at a time during the reduction, which makes the algorithm suitable for streaming data. The user can adjust the degree of reduction, and the reduction algorithm strives to minimize classification error. The algorithm does not depend on any particular classification technique. Subclusters are formed and readjusted after each sample during the calculation. Representative points are calculated to summarize the data in the subclusters, and the resulting data summary can be saved and used for future processing. The effectiveness of the proposed method is measured by the accuracy difference between the regular and reduced datasets, evaluated with different classifiers. The results show that the nonlinear information-theoretic cluster distance measure improves the reduction rates with higher accuracy values compared to existing studies. At the same time, the reduction rate can be adjusted as desired, a feature that current methods lack. The characteristics are discussed, and the results are compared to previously published algorithms.
Citation - WoS: 3
Citation - Scopus: 3
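
The abstract above describes subclusters that are readjusted after each arriving sample and then summarized by representative points. A minimal sketch of that general idea follows. It is not the article's algorithm: plain Euclidean distance stands in for the nonlinear information-theoretic cluster distance, and the class name `StreamingReducer` and the `radius` parameter (used here as the knob for the degree of reduction) are hypothetical.

```python
import math


class StreamingReducer:
    """Toy one-pass, per-class subcluster summary (illustrative only).

    Assumption: Euclidean distance replaces the article's nonlinear
    information-theoretic measure. `radius` is a hypothetical knob:
    a larger radius merges more samples and keeps fewer representatives.
    """

    def __init__(self, radius):
        self.radius = radius
        # Each subcluster is stored as [label, centroid, sample_count].
        self.subclusters = []

    def add(self, x, label):
        """Process one arriving sample and readjust the subclusters."""
        best, best_d = None, math.inf
        for sc in self.subclusters:
            if sc[0] != label:
                continue  # only merge samples that share a class label
            d = math.dist(sc[1], x)
            if d < best_d:
                best, best_d = sc, d
        if best is not None and best_d <= self.radius:
            # Fold the sample into the nearest subcluster (running mean).
            best[2] += 1
            n = best[2]
            best[1] = [c + (xi - c) / n for c, xi in zip(best[1], x)]
        else:
            # Too far from every same-class subcluster: start a new one.
            self.subclusters.append([label, list(x), 1])

    def summary(self):
        """Return the representative points with their class labels."""
        return [(tuple(c), label) for label, c, _ in self.subclusters]


stream = [((0, 0), "a"), ((0.2, 0), "a"), ((5, 5), "b"),
          ((5.2, 5), "b"), ((0.1, 0.1), "a")]

coarse = StreamingReducer(radius=1.0)
fine = StreamingReducer(radius=0.05)
for x, label in stream:
    coarse.add(x, label)
    fine.add(x, label)

print(len(coarse.summary()), len(fine.summary()))
```

The summary returned by `summary()` is an ordinary labeled dataset, so any classifier can be trained on it in place of the full stream, which mirrors the abstract's point that the reduction is independent of the classification technique.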
An Information-Theoretic Instance-Based Classifier
(Elsevier Science Inc, 2020) Gokcay, Erhan
Classification algorithms are used in many areas to assign class labels to new samples given a training set. Many classification algorithms, linear or not, require a training phase to determine model parameters through iterative optimization of the cost function for that particular model or algorithm. The training phase can adjust and fine-tune the boundary line between classes. However, the process may get stuck in a local optimum, which may or may not be close to the desired solution. Another disadvantage of training processes is that the model must be retrained when a new sample arrives. This work presents a new information-theoretic approach to instance-based supervised classification. The boundary line between classes is calculated from the data points alone, without any external parameters or weights, and it is given in closed form. The separation between classes is nonlinear and smooth, which reduces memorization problems. Since the method does not require a training phase, classified samples can be incorporated into the training set directly, simplifying streaming classification. The boundary line can be replaced with an approximation or regression model for parametric calculations. Features and performance of the proposed method are discussed and compared with similar algorithms. © 2020 Elsevier Inc. All rights reserved.
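
To illustrate what "no training phase" means for an instance-based classifier, here is a minimal sketch that uses a Gaussian kernel score per class as a stand-in. It is not the article's information-theoretic method: the article states that its boundary needs no external parameters, whereas this sketch relies on a kernel width `sigma`. The class name `KernelInstanceClassifier` and its methods are hypothetical.

```python
import math
from collections import defaultdict


class KernelInstanceClassifier:
    """Toy instance-based classifier (illustrative stand-in only).

    Assumption: each class is scored by a sum of Gaussian kernels centred
    on its stored samples. `sigma` is an external parameter that the
    article's method is said not to need.
    """

    def __init__(self, sigma=1.0):
        self.sigma = sigma
        self.samples = []  # stored (point, label) pairs; no fitted model

    def add(self, x, label):
        """Incorporate a labeled sample; nothing is retrained."""
        self.samples.append((tuple(x), label))

    def predict(self, x):
        """Return the label whose stored samples give the highest score."""
        if not self.samples:
            raise ValueError("no labeled samples stored")
        scores = defaultdict(float)
        for xi, label in self.samples:
            d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
            scores[label] += math.exp(-d2 / (2 * self.sigma ** 2))
        return max(scores, key=scores.get)


clf = KernelInstanceClassifier(sigma=1.0)
for point, label in [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b")]:
    clf.add(point, label)

print(clf.predict((0.2, 0.3)), clf.predict((4.8, 5.1)))

# A newly classified sample joins the stored set directly,
# which is what makes streaming use straightforward.
clf.add((10, 10), "c")
print(clf.predict((9.9, 10)))
```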
A Decentralized on Demand Cloud CPU Design With Instruction Level Virtualization
(Springer Verlag, 2018) Gokcay, E.
Cloud technology offers many advantages and services over traditional computational models. Although the provided virtual services increase the resource sharing and cost effectiveness of the system, each node in the system is still centralized. Different CPU and OS versions cause interoperability problems in data exchange between nodes. In most cases, less powerful units are left outside the service area and can only be considered consumers of the cloud system. A new service called Cloud CPU is described elsewhere, in which the cloud provides the computational background for the components of a virtual CPU and the computation is distributed over the internet. The design uses all units connected to the internet and achieves massively parallel operation. In this paper, the design of Cloud CPU is extended and the services needed by the new architecture are discussed. One of the new services needed is a multi-language compiler in which neither the source language nor the target language is fixed. The compiler's job is not to use the cloud for execution but to distribute the computation according to the instruction sets published by each node. The computation makes sense only when all units work together, so all nodes included in a particular computation must be connected and synchronized. The need for synchronization disappears when the computation is finished. Therefore, an on-demand Cloud-OS service is needed for bookkeeping and synchronization. The need for the Cloud-OS is temporary, and the Cloud-OS initiated on demand is terminated when the computation ends. © Springer International Publishing AG, part of Springer Nature 2018.
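
The abstract says the compiler distributes computation according to the instruction sets each node publishes. A minimal sketch of one way such an assignment could look is given below. Everything in it is an assumption made for illustration: the function name `distribute`, the node names, the instruction names, and the round-robin policy are hypothetical and do not come from the paper.

```python
from itertools import cycle


def distribute(ops, published):
    """Assign each operation to a node that publishes that instruction.

    `published` maps a node name to the set of instructions it advertises.
    Nodes able to run the same instruction are used in round-robin order
    (a hypothetical policy chosen only for this sketch).
    """
    capable = {}
    for node, instructions in published.items():
        for op in instructions:
            capable.setdefault(op, []).append(node)
    # Sort node names so the assignment order is deterministic.
    pickers = {op: cycle(sorted(nodes)) for op, nodes in capable.items()}

    plan = []
    for op in ops:
        if op not in pickers:
            raise ValueError(f"no node publishes instruction {op!r}")
        plan.append((op, next(pickers[op])))
    return plan


# Hypothetical nodes: a weak unit offering one instruction still participates.
published = {"phone": {"ADD"}, "laptop": {"ADD", "MUL"}}
plan = distribute(["ADD", "MUL", "ADD", "ADD"], published)
print(plan)
```

Here the less powerful node still receives work for the one instruction it publishes, which reflects the abstract's point that such units need not remain pure consumers.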
