GCRIS :: Search

Search Results

Entropy Based Streaming Big-Data Reduction With Adjustable Compression Ratio
(Springer, 2023) Gokcay, Erhan
The Internet of Things is a novel concept in which numerous physical devices are linked to the internet to collect, generate, and distribute data for processing. Data storage and processing become more challenging as the number of devices increases. One solution is to reduce the amount of stored data in such a way that processing accuracy does not suffer significantly. The reduction can be lossy or lossless, depending on the type of data. The article presents a novel lossy algorithm for reducing the amount of data stored in the system. The reduction process aims to reduce the volume of data while maintaining classification accuracy and allowing the reduction ratio to be adjusted. A nonlinear cluster distance measure is used to create subgroups, so that samples can be assigned to the correct clusters even when the cluster shape is nonlinear. Samples are assumed to arrive one at a time during the reduction, which makes the algorithm suitable for streaming data. The user can adjust the degree of reduction, and the algorithm strives to minimize the resulting classification error. The algorithm does not depend on any particular classification technique. Subclusters are formed and readjusted after each sample arrives, and representative points are calculated to summarize the data in each subcluster. The resulting data summary can be saved and used for future processing. The effectiveness of the proposed method is measured by the accuracy difference between the original and reduced datasets, using several different classifiers. The results show that the nonlinear information-theoretic cluster distance measure improves reduction rates while achieving higher accuracy than existing studies. At the same time, the reduction rate can be adjusted as desired, a feature lacking in current methods. The characteristics of the method are discussed, and the results are compared with previously published algorithms.
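The abstract does not give the form of its nonlinear information-theoretic cluster distance. As one illustration of the general idea, the sketch below estimates the Cauchy-Schwarz divergence between two sample sets using Gaussian kernels. The function names, the kernel width `sigma`, and the choice of this particular divergence are assumptions rather than the article's method, and the streaming subcluster updates and the adjustable reduction ratio are not shown.

```python
import math
from typing import Sequence

Point = Sequence[float]

def gaussian_kernel(x: Point, y: Point, sigma: float) -> float:
    """Unnormalized Gaussian kernel; normalization constants cancel in the ratio below."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def cross_potential(A: Sequence[Point], B: Sequence[Point], sigma: float) -> float:
    """Mean pairwise kernel value: a Parzen-style estimate of the overlap of two densities (up to constants)."""
    total = sum(gaussian_kernel(a, b, sigma) for a in A for b in B)
    return total / (len(A) * len(B))

def cs_cluster_distance(A: Sequence[Point], B: Sequence[Point], sigma: float = 1.0) -> float:
    """Cauchy-Schwarz divergence estimate (an assumed stand-in for the paper's measure).

    Approximately 0 for identical sets and larger the less the clusters overlap.
    """
    v_ab = cross_potential(A, B, sigma)
    if v_ab == 0.0:
        return math.inf  # kernels underflowed: clusters are effectively disjoint
    v_aa = cross_potential(A, A, sigma)
    v_bb = cross_potential(B, B, sigma)
    return -math.log(v_ab / math.sqrt(v_aa * v_bb))

A = [(0.0, 0.0), (0.1, 0.0)]
B = [(5.0, 5.0), (5.1, 5.0)]
C = [(0.2, 0.1), (0.3, 0.0)]
print(cs_cluster_distance(A, C, sigma=0.5) < cs_cluster_distance(A, B, sigma=0.5))  # True
```

Because the measure compares kernel density estimates rather than centroids, it can separate elongated or curved clusters; `sigma` controls how local the comparison is.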
Software Code Smell Prediction Model Using Shannon, Renyi and Tsallis Entropies
(MDPI, 2018) Blazauskas, Tomas; Gupta, Aakanshi; Misra, Sanjay; Suri, Bharti; Kumar, Vijay; Damasevicius, Robertas
The current era demands high-quality software delivered within limited time to reach new goals. To meet user requirements, source code undergoes frequent modifications, which can introduce bad smells that deteriorate the quality and reliability of software. The source code of open-source software is easily accessible to any developer and is therefore frequently modified. In this paper, we propose a mathematical model to predict bad smells using the concept of entropy as defined in information theory. The open-source software Apache Abdera is used for calculating the bad smells. Bad smells are collected from subcomponents of the Apache Abdera project using a detection tool, and different entropy measures (Shannon, Renyi and Tsallis entropy) are computed. By applying non-linear regression techniques, the bad smells that can arise in future versions of the software are predicted from the observed bad smells and entropy measures. The proposed model has been validated using goodness-of-fit parameters (prediction error, bias, variation, and Root Mean Squared Prediction Error (RMSPE)). The model performance statistics (R², adjusted R², Mean Square Error (MSE) and standard error) also support the proposed model. We have compared the results of the prediction model with observed results on real data. The results of the model may be helpful for the software development industry and future researchers.
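The three entropy families named above have standard definitions: Shannon H = -Σ p_i ln p_i, Renyi H_α = ln(Σ p_i^α) / (1 - α), and Tsallis S_q = (1 - Σ p_i^q) / (q - 1), both of the latter reducing to Shannon as their parameter approaches 1. The sketch below computes them from a frequency vector. What the counts represent (for example, per-subcomponent smell or change counts) and the use of the natural logarithm are assumptions, since the abstract does not state how the probability distribution is built.

```python
import math
from typing import List, Sequence

def _probabilities(counts: Sequence[float]) -> List[float]:
    """Normalize non-negative counts to probabilities, dropping zero entries."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return [c / total for c in counts if c > 0]

def shannon(counts: Sequence[float]) -> float:
    """Shannon entropy H = -sum p_i ln p_i, in nats."""
    return -sum(p * math.log(p) for p in _probabilities(counts))

def renyi(counts: Sequence[float], alpha: float) -> float:
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in nats."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(p ** alpha for p in _probabilities(counts))) / (1.0 - alpha)

def tsallis(counts: Sequence[float], q: float) -> float:
    """Tsallis entropy of index q (q != 1), with the constant k set to 1."""
    if q == 1:
        raise ValueError("q must differ from 1")
    return (1.0 - sum(p ** q for p in _probabilities(counts))) / (q - 1.0)

counts = [4, 3, 2, 1]  # hypothetical frequencies, e.g. smells per subcomponent
print(shannon(counts), renyi(counts, 2.0), tsallis(counts, 2.0))
```

For a uniform distribution over n outcomes the Renyi entropy equals ln n for every α, which makes a quick sanity check.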
Measuring the Reusable Quality for XML Schema Documents
(Budapest Tech Polytechnical Institution, 2013) Thaw, T.; Misra, S.
Extensible Markup Language (XML) based web applications are widely used for describing data and providing internet services. The reusability of XML schema document (XSD) designs needs to be quantified, since reusable documents help software developers produce software at lower development cost. This paper proposes a metric, Entropy Measure of Complexity (EMC), which is intended to measure the reusable quality of XML schema documents. A higher EMC value indicates higher reusable quality and implies that the schema document contains inheritance features, elements and attributes. For empirical validation, the metric is applied to 70 WSDL schema files. A comparison with similar measures is also performed. The proposed EMC metric is also validated practically and theoretically. The empirical, theoretical and practical validation and the comparative study prove that EMC is a valid metric capable of measuring the reusable quality of XSDs.
Quantitative Quality Evaluation of Software Products by Considering Summary and Comments Entropy of a Reported Bug
(MDPI, 2019) Misra, Sanjay; Kumari, Madhu; Misra, Ananya; Damasevicius, Robertas; Fernandez Sanz, Luis; Singh, V. B.
A software bug is characterized by its attributes. Various prediction models have been developed using these attributes to enhance the quality of software products. Bug reporting produces highly irregular patterns, and repository size is increasing at an enormous rate, resulting in uncertainty and irregularity. In the context of big data, such uncertainty and irregularity are termed veracity. To quantify these irregular and uncertain patterns, the authors applied entropy-based measures to the terms reported in the bug summary and in the comments submitted by users; these measures account for both the uncertainty and the irregular patterns. In this paper, the authors consider that the bug fixing process depends not only on calendar time, testing effort and testing coverage, but also on the bug summary description and comments. The paper proposes bug dependency-based mathematical models that incorporate the summary description of bugs and the comments submitted by users through entropy-based measures. The models were validated on different Eclipse project products. Models proposed in the literature exhibit different types of growth curves, mainly exponential, S-shaped, or mixtures of both. In this paper, the proposed models were compared with models following exponential, S-shaped, and mixed curves.
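A minimal sketch of an entropy measure over the terms in a bug's summary and comments follows. The tokenizer, the base-2 logarithm, the normalization by log2 of the vocabulary size, and the function name `term_entropy` are all hypothetical choices; the paper's exact measure is not given in the abstract.

```python
import math
import re
from collections import Counter
from typing import Iterable

def term_entropy(texts: Iterable[str], normalized: bool = True) -> float:
    """Shannon entropy (bits) of the term distribution across the given texts.

    With normalized=True the value is divided by log2(vocabulary size),
    so it lies in [0, 1].
    """
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    tokens = [t for text in texts for t in re.findall(r"[a-z0-9]+", text.lower())]
    counts = Counter(tokens)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    if normalized and len(counts) > 1:
        h /= math.log2(len(counts))
    return h

summary = "NPE in editor when saving file"
comments = ["Editor crashes on save", "NPE confirmed in editor save path"]
print(round(term_entropy([summary] + comments), 3))
```

A value near 1 means the terms are spread evenly over the vocabulary (less predictable text); a value near 0 means a few terms dominate.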
Citation - WoS: 3
Citation - Scopus: 3
An Information-Theoretic Instance-Based Classifier
(Elsevier Science Inc, 2020) Gokcay, Erhan
Classification algorithms are used in many areas to determine the class labels of new samples given a training set. Many classification algorithms, linear or not, require a training phase that determines model parameters by iteratively optimizing the cost function of that particular model or algorithm. The training phase can adjust and fine-tune the boundary line between classes. However, the process may get stuck in a local optimum, which may or may not be close to the desired solution. Another disadvantage of training processes is that the model must be retrained whenever a new sample arrives. This work presents a new information-theoretic approach to instance-based supervised classification. The boundary line between classes is calculated from the data points alone, without any external parameters or weights, and is given in closed form. The separation between classes is nonlinear and smooth, which reduces memorization problems. Since the method does not require a training phase, classified samples can be incorporated directly into the training set, simplifying streaming classification. The boundary line can be replaced with an approximation or regression model for parametric calculations. The features and performance of the proposed method are discussed and compared with similar algorithms. (C) 2020 Elsevier Inc. All rights reserved.
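To illustrate what the absence of a training phase means for an instance-based, information-theoretic classifier, the sketch below uses a generic Parzen-window (Gaussian kernel density) classifier: each class score is the sum of kernel values between the query and that class's stored samples, and new samples are simply appended. This is a substitute technique, not the closed-form boundary described in the abstract; unlike that method it needs a kernel width `sigma`, and the class name `ParzenClassifier` is hypothetical.

```python
import math
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List, Sequence, Tuple

class ParzenClassifier:
    """Instance-based Gaussian-kernel classifier (hypothetical sketch); no training phase."""

    def __init__(self, sigma: float = 1.0) -> None:
        self.sigma = sigma
        self.samples: List[Tuple[Tuple[float, ...], Hashable]] = []

    def add(self, x: Sequence[float], label: Hashable) -> None:
        # "Training" is only storing the labeled sample.
        self.samples.append((tuple(x), label))

    def scores(self, x: Sequence[float]) -> Dict[Hashable, float]:
        # Per-class kernel sum: proportional to (class frequency) x
        # (class Parzen density at x), i.e. a plug-in Bayes posterior.
        totals: Dict[Hashable, float] = defaultdict(float)
        for s, label in self.samples:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
            totals[label] += math.exp(-sq_dist / (2.0 * self.sigma ** 2))
        return dict(totals)

    def predict(self, x: Sequence[float]) -> Hashable:
        sc = self.scores(x)
        if not sc:
            raise ValueError("no stored samples")
        return max(sc, key=sc.get)

clf = ParzenClassifier(sigma=0.5)
for point, label in [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((3.0, 3.0), "b"), ((2.8, 3.2), "b")]:
    clf.add(point, label)

x = (0.1, 0.3)
y = clf.predict(x)  # "a"
clf.add(x, y)       # streaming: fold the classified sample back into the stored set
```

Because prediction only reads the stored samples, adding a point costs one append, which mirrors the streaming property the abstract describes; the price is that each prediction scans every stored sample.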