CN113656279B - Code odor detection method based on residual network and metric attention mechanism - Google Patents

Info

Publication number
CN113656279B
CN113656279B CN202110732549.XA CN202110732549A CN113656279B CN 113656279 B CN113656279 B CN 113656279B CN 202110732549 A CN202110732549 A CN 202110732549A CN 113656279 B CN113656279 B CN 113656279B
Authority
CN
China
Prior art keywords
code
odor
attention mechanism
metric
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110732549.XA
Other languages
Chinese (zh)
Other versions
CN113656279A (en)
Inventor
张杨
东春浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Science and Technology
Original Assignee
Hebei University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei University of Science and Technology
Priority to CN202110732549.XA
Publication of CN113656279A
Application granted
Publication of CN113656279B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/366 Software debugging using diagnostics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a code smell detection method based on a residual network and a metric attention mechanism. The iPlasma tool is used to analyze 20 application programs, obtain the structural information and labels of the classes and methods in each program, and generate data sets for the Brain Class and Brain Method smells. A residual network extracts feature information at different levels, and a feature attention mechanism re-weights the different features. The model is trained and evaluated on the two smell data sets; after training, it judges from the structural information of a class or method whether that class or method contains the code smell. The invention publishes two smell data sets and improves the accuracy of detecting the Brain Class and Brain Method smells, thereby helping developers locate design defects in programs more accurately.

Description

Code odor detection method based on residual network and metric attention mechanism
Technical Field
The invention relates to the field of computer software maintenance and evolution, and provides a code smell detection method based on a residual network and a metric attention mechanism.
Background
Software maintenance and evolution is a complex activity that requires developers to continually modify source code to accommodate new requirements or to repair design flaws found in the software. Such work is typically done under tight deadlines, and developers are often forced to set aside good programming practices and guidelines in order to deliver on time. This can lead to technical debt, that is, design problems that may harm the future maintainability of the system.
Refactoring optimizes software by improving its design quality without changing its external behavior, which in turn improves maintainability and extensibility. A key step in refactoring is deciding where to apply it. To help identify these locations, researchers proposed the concept of code smells to describe design flaws in software. Code smell detection has become an established way to discover problems in source code, which are then corrected by refactoring. Research on code smell detection methods has become a hot topic and has greatly advanced the field of software refactoring.
Currently, most conventional code smell detection methods rely on manually designed heuristic rules to decide whether a smell is present. Identifying code smells by hand is tedious and laborious for programmers. In addition, heuristic rules must be formulated with the help of experienced researchers, and because of the subjectivity of the developers, results agree poorly across different detection tools.
To address the problems of the conventional methods, various automatic or semi-automatic techniques have been applied to code smell detection, such as support vector machines, J-48, and decision trees, which are used to build complex mappings from code metrics and lexical similarities to predictions. However, empirical studies indicate that these machine-learning-based detection methods have key limitations and deserve further investigation. In contrast to traditional machine learning algorithms, deep neural networks can automatically extract features useful for code smell detection from source code and build complex mappings between these features and labels.
Although many promising techniques have been proposed, some problems remain. First, existing work has focused mainly on popular code smells such as Feature Envy, God Class, and Long Method, while little research has addressed Brain Class and Brain Method. Second, the accuracy of existing methods is not satisfactory and can be further improved. Third, there is no publicly available data set for detecting these two code smells. Constructing Brain Class and Brain Method data sets for training deep learning models has therefore become increasingly urgent.
Disclosure of Invention
The invention aims to provide a code smell detection method based on a residual network and a metric attention mechanism that saves time and achieves high accuracy.
The invention adopts the following technical scheme:
A code smell detection method based on a residual network and a metric attention mechanism, comprising the steps of:
(1) Generating a code smell data set;
(2) Data balancing;
(3) Constructing a MARS model, wherein the MARS model comprises a convolution layer, a normalization layer, a ReLU layer, a residual network, an average pooling layer, and a fully connected layer; the residual network comprises a plurality of residual blocks, and a metric attention mechanism is introduced into each residual block;
(4) Training the MARS model;
(5) Judging whether the input information contains a code smell by using the trained MARS model.
In step (1), GitHub is used as the open-source corpus; 20 programs from different domains are analyzed with the iPlasma tool, and 13 method-level and 9 class-level code structure metrics are extracted. Instances of the two code smells, Brain Class and Brain Method, are obtained and marked to generate labels, where 0 indicates no code smell and 1 indicates a code smell. The code metric information is combined with the labels to generate the data sets for the two code smells.
The data balancing in step (2) applies the SMOTE algorithm to increase the number of samples containing a code smell. For each code smell sample, the distance to the other code smell samples is computed using Euclidean distance, and its n nearest neighbors are selected. The sampling ratio is set to 5:2. Several samples are randomly selected from the n neighbors; denoting a selected neighbor by K, a new sample is constructed from the original sample X according to the formula $X_{\text{new}} = X + \operatorname{rand}(0, 1) \times (X - K)$.
In step (3), each residual block consists of two convolution layers and a skip connection. Features in the code structure information are extracted by the CNN, Batch Normalization is used for normalization, and ReLU is used as the activation function. A metric attention mechanism is added at the end of each residual block.
The metric attention mechanism in step (3) is constructed as follows. First, average pooling compresses the features extracted from the metric information into a C-dimensional channel descriptor, so that the global spatial feature of each channel represents that channel. Second, the weight matrix is computed: a fully connected layer produces a C/n-dimensional vector, followed by tanh activation; a second fully connected layer converts the C/n-dimensional vector back into a C-dimensional vector, and sigmoid activation maps the values into the range 0 to 1, yielding the weight matrix. Finally, the weight matrix is multiplied by the feature information to recompute how important each feature is for detecting code smells, and the weights are reassigned accordingly so that features important for detection receive larger weights.
The MARS model is trained by dividing the data set into a training set and a test set and continuously updating the parameters based on the error between the output value and the label, which finally yields a trained classifier. Through cross-validation, one part of the training set is randomly selected as a validation set to verify the performance of the model and prevent overfitting.
The beneficial effects of the invention are as follows: the invention supports detection of two code smells, Brain Class and Brain Method, relieves programmers of manually checking source code for these two smells, and saves time.
In detecting Brain Class and Brain Method, the average accuracy of the method of the invention is more than 2% higher than that of existing code smell detection methods.
Drawings
Fig. 1 is a general framework of the method of the invention.
Fig. 2 is a schematic diagram of a modified residual network.
Fig. 3 is a schematic diagram of an original and modified residual block.
Fig. 4 is a schematic diagram of a metric attention mechanism.
Fig. 5 is a schematic diagram of an application example of the detection model.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the embodiments and the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the application or its uses. All other embodiments obtained by one of ordinary skill in the art without creative effort on the basis of the present disclosure fall within its scope.
The code smell detection method based on the residual network and the metric attention mechanism detects the Brain Class and Brain Method smells through the following steps: code smell data set generation, data balancing, metric attention construction, improved residual network construction, and model training and evaluation.
As shown in Fig. 1, because the model must be trained and tested, data sets for the two code smells, Brain Class and Brain Method, are generated first: 20 open-source applications are analyzed with the iPlasma tool, the structural information and labels of all classes and methods in each program are extracted to form the data sets, and the data sets are balanced with the SMOTE algorithm.
The model of the invention consists of a residual network and a metric attention mechanism; the residual network extracts deeper feature information at different levels. The constructed model is trained on the training set, the validation set is used to prevent overfitting, and the test set is used to evaluate the model's performance. The model combining the residual network and the metric attention mechanism then judges whether a given code smell is present in a program.
1. Code smell data set generation
With GitHub as the open-source corpus, 20 programs from different domains are analyzed with the iPlasma tool, and 13 method-level and 9 class-level code structure metrics are extracted. Instances of the two code smells, Brain Class and Brain Method, are obtained and marked to generate labels, where 0 indicates no code smell and 1 indicates a code smell. The code metric information is combined with the labels to generate the data sets for the two code smells. The data sets are used to train, validate, and evaluate the proposed method and to determine whether it performs better at detecting code smells.
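As a minimal sketch of how metric information and labels could be combined into data set rows: the `Sample` type, the `build_dataset` helper, the `Point` row, and the three-value metric vectors are hypothetical and do not reflect iPlasma's actual output format. The `Player` values reuse the three metrics quoted in the application example of this description.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Sample:
    name: str             # class or method identifier
    metrics: List[float]  # structural metrics reported by the analysis tool
    label: int            # 0 = no code smell, 1 = code smell


def build_dataset(metric_rows: Dict[str, List[float]], smelly: Set[str]) -> List[Sample]:
    """Pair each metric vector with a 0/1 label based on the flagged names."""
    return [
        Sample(name, list(values), 1 if name in smelly else 0)
        for name, values in metric_rows.items()
    ]


# Illustrative rows only: (lines of code, cyclomatic complexity, coupling).
rows = {"Player": [231.0, 59.0, 14.0], "Point": [20.0, 2.0, 1.0]}
dataset = build_dataset(rows, smelly={"Player"})
print([(s.name, s.label) for s in dataset])
```

Keeping the label next to the metric vector in one record makes the later balancing and train/test split operate on whole samples rather than on parallel lists.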
2. Data balancing
The SMOTE algorithm is applied to increase the number of samples containing a code smell. For each code smell sample, the distance to the other code smell samples is computed using Euclidean distance, and its n nearest neighbors are selected. The sampling ratio is set to 5:2. Several samples are randomly selected from the n neighbors; denoting a selected neighbor by K, a new sample is constructed from the original sample X according to the formula $X_{\text{new}} = X + \operatorname{rand}(0, 1) \times (X - K)$. This balances the number of smelly and non-smelly samples, and thereby addresses the low training efficiency and the degraded overall model performance caused by an excess of data in one class.
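The oversampling step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function names and toy data are hypothetical, the number of synthetic samples is left as a parameter rather than derived from the 5:2 ratio, and the interpolation uses the usual SMOTE form `x + r * (k - x)`, which moves from the sample toward its neighbor; whether the formula in the text intends the same direction is an assumption.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_neighbors, n_new, rng=None):
    """Create n_new synthetic samples from the minority (smelly) class."""
    if len(minority) < 2 or n_neighbors < 1:
        raise ValueError("need at least two minority samples and one neighbor")
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Nearest neighbors of x among the other minority samples.
        others = [s for s in minority if s is not x]
        neighbors = sorted(others, key=lambda s: euclidean(x, s))[:n_neighbors]
        k = rng.choice(neighbors)
        r = rng.random()  # rand(0, 1)
        # Assumed standard SMOTE interpolation between x and its neighbor k.
        synthetic.append([xi + r * (ki - xi) for xi, ki in zip(x, k)])
    return synthetic


# Toy minority class with two metrics per sample (illustrative values).
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
new_samples = smote(minority, n_neighbors=2, n_new=3, rng=random.Random(0))
print(new_samples)
```

Because each synthetic point lies on the segment between a real sample and one of its neighbors, the new samples stay inside the region already occupied by the minority class.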
3. Improved residual network construction
Each residual block consists of two convolution layers and a skip connection. Features in the code structure information are extracted by the CNN, Batch Normalization is used for normalization, and ReLU is used as the activation function. The skip connection passes already-trained features through without retraining them, which reduces the model's training parameters and allows a deeper model to be trained. Finally, a metric attention mechanism is added to each residual block: the feature information that each block extracts at its level is re-weighted, and the weights are redistributed to improve the accuracy of code smell detection.
Fig. 2 shows the modified residual network, which comprises 17 convolution layers and 1 dense layer. All residual blocks share the same structure, except that the number of channels increases and the output size decreases. The invention introduces a metric attention mechanism into each residual block. An average pooling layer reduces the computational cost of the network. The fully connected layer serves as the classifier of the whole convolutional neural network. The output layer has a single neuron, which judges whether a smell is present by learning from the input structural information.
Fig. 3 depicts the residual block before and after modification. The block consists of two parts, a residual mapping and a direct mapping. In the residual part, structural features are extracted by two convolution layers, whose kernel weights are learned so that the desired features are extracted according to the objective function. The skip connection passes already-trained features through, which reduces the model's training parameters and allows a deeper model to be trained. The ReLU activation function prevents vanishing gradients, adds nonlinearity to the network, and speeds up training. Normalizing the feature information addresses internal covariate shift, alleviates gradient saturation, and accelerates convergence. A metric attention mechanism is introduced after the second normalization layer of the original residual block; it scales the input features twice to obtain weight coefficients and then re-weights the features, increasing the weight of important feature information.
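The data flow through the modified block can be summarized in formula form. This is a sketch under assumptions: the symbols $x$, $u$, $z$, $s$, $W_1$, $W_2$, and $\mathrm{GAP}$ are introduced here for illustration, bias terms are omitted, and placing the final ReLU after the addition follows the common residual-block convention rather than anything stated explicitly in the text.

```latex
\begin{aligned}
u &= \mathrm{BN}\bigl(\mathrm{Conv}_2(\mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_1(x))))\bigr)
  && \text{residual mapping} \\
z &= \mathrm{GAP}(u) \in \mathbb{R}^{C}
  && \text{average pooling, one value per channel} \\
s &= \sigma\bigl(W_2 \tanh(W_1 z)\bigr),
  \quad W_1 \in \mathbb{R}^{(C/n) \times C},\; W_2 \in \mathbb{R}^{C \times (C/n)}
  && \text{channel weights in } (0, 1) \\
y &= \mathrm{ReLU}\bigl(x + s \odot u\bigr)
  && \text{re-weighted features plus direct mapping}
\end{aligned}
```

Here $\odot$ multiplies every value in channel $c$ of $u$ by the scalar $s_c$, so the attention step changes the relative weight of channels without changing the shape of the feature map.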
4. Metric attention construction
Fig. 4 illustrates the metric attention mechanism.
First, average pooling compresses the features extracted from the metric information into a C-dimensional channel descriptor; the global spatial feature of each channel serves as the representation of that channel.
Second, the weight matrix is computed. A fully connected layer first produces a C/n-dimensional vector, with tanh as the activation function to speed up model convergence. A second fully connected layer converts the C/n-dimensional vector back into a C-dimensional vector, and sigmoid activation maps the values into the range 0 to 1, yielding the weight matrix.
Finally, the weight matrix is multiplied by the feature information to recompute the importance of each feature for detecting code smells. The feature information extracted at different levels is scaled twice; this two-step scaling both reduces the amount of computation in the network and yields the weight of each feature. The last fully connected layer is intended to make the network more adaptable and to compensate for the limited nonlinearity available when there are few fully connected layers, further improving the learning efficiency and nonlinear expressiveness of the model.
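The three steps above can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the function names, the list-of-lists feature layout (C channels, each a list of values), the bias terms, and the random example weights are all assumptions made for the sketch.

```python
import math
import random


def dense(vec, weights, bias):
    """Fully connected layer: one output per row of the weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def metric_attention(features, w1, b1, w2, b2):
    """Re-weight C feature channels with a pooled, two-layer gating vector."""
    # Step 1: average pooling, one global value per channel (C values).
    squeezed = [sum(channel) / len(channel) for channel in features]
    # Step 2a: fully connected C -> C/n, tanh activation.
    hidden = [math.tanh(h) for h in dense(squeezed, w1, b1)]
    # Step 2b: fully connected C/n -> C, sigmoid maps each weight into (0, 1).
    weights = [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]
    # Step 3: multiply every value in a channel by that channel's weight.
    scaled = [[w * v for v in channel] for w, channel in zip(weights, features)]
    return scaled, weights


# Example with C = 4 channels and reduction factor n = 2 (illustrative values).
rng = random.Random(0)
C, n = 4, 2
features = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(C)]
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(C // n)]
b1 = [0.0] * (C // n)
w2 = [[rng.uniform(-0.5, 0.5) for _ in range(C // n)] for _ in range(C)]
b2 = [0.0] * C
scaled, weights = metric_attention(features, w1, b1, w2, b2)
print(weights)
```

In a trained model `w1`, `b1`, `w2`, and `b2` would be learned together with the convolution kernels; the random values here only demonstrate the shapes and the value range of the resulting weights.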
5. Model training and evaluation
The data set is divided into a training set and a test set in a 7:3 ratio. The training set is the input of the model, training runs for 50 iterations with a batch size of 100, and the labels are the expected output of the model. The parameters are continuously updated based on the error between the output value and the label, which finally yields the trained classifier. Cross-validation is used: one part of the training set is randomly selected as the validation set to verify the model's performance and prevent overfitting. The trained model is then tested on the test set, and its performance is measured with three metrics: accuracy, precision, and F1 score.
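The 7:3 split and the three reported metrics can be sketched as follows. The helper names, the fixed seed, and the toy labels are hypothetical; the metric definitions are the standard ones for binary classification, with recall computed only as an intermediate value for F1.

```python
import random


def split_dataset(samples, train_ratio=0.7, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def evaluate(y_true, y_pred):
    """Accuracy, precision, and F1 for 0/1 labels (1 = code smell)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


train, test = split_dataset(range(10))
metrics = evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(len(train), len(test), metrics)
```

On a balanced data set accuracy is informative on its own, but precision and F1 remain useful because they focus on the smelly class that the detector is meant to find.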
6. Application instance
Fig. 5 shows an application example of the trained model. For the Player class in the open-source program redox, 13 metrics of the Player class are obtained as the input of the model; among them, the number of lines of code is 231, the cyclomatic complexity of the class is 59, and the coupling between object classes is 14. The model analyzes this input and judges that Player exhibits the Brain Class smell.

Claims (5)

1. A code smell detection method based on a residual network and a metric attention mechanism, characterized in that it comprises the following steps:
(1) Generating a code smell data set;
(2) Data balancing;
(3) Constructing a MARS model, wherein the MARS model comprises a convolution layer, a normalization layer, a ReLU layer, a residual network, an average pooling layer, and a fully connected layer; the residual network comprises a plurality of residual blocks, and a metric attention mechanism is introduced into each residual block;
the metric attention mechanism is constructed as follows: first, average pooling compresses the features extracted from the metric information into a C-dimensional channel descriptor, and the global spatial feature of each channel serves as the representation of that channel; second, the weight matrix is computed: a fully connected layer produces a C/n-dimensional vector, followed by tanh activation, a second fully connected layer converts the C/n-dimensional vector into a C-dimensional vector, and sigmoid activation maps the values into the range 0 to 1 to obtain the weight matrix; finally, the weight matrix is multiplied by the feature information to recompute the importance of each feature for detecting code smells, and weights are reassigned to the different features according to their importance so that features important for detecting code smells receive larger weights;
(4) Training the MARS model;
(5) Judging whether the input information contains a code smell by using the trained MARS model.
2. The code smell detection method based on a residual network and a metric attention mechanism according to claim 1, wherein step (1) uses GitHub as the open-source corpus, analyzes 20 programs from different domains with the iPlasma tool, and extracts 13 method-level and 9 class-level code structure metrics; instances of the two code smells, Brain Class and Brain Method, are obtained and marked to generate labels, where 0 indicates no code smell and 1 indicates a code smell, and the code metric information is combined with the labels to generate the data sets for the two code smells.
3. The code smell detection method based on a residual network and a metric attention mechanism according to claim 1, wherein the data balancing of step (2) applies the SMOTE algorithm to increase the number of samples containing a code smell; for each code smell sample, the distance to the other code smell samples is computed using Euclidean distance, its n nearest neighbors are selected, the sampling ratio is set to 5:2, several samples are randomly selected from the n neighbors, and, denoting a selected neighbor by K, a new sample is constructed from the original sample X according to the formula $X_{\text{new}} = X + \operatorname{rand}(0, 1) \times (X - K)$.
4. The code smell detection method based on a residual network and a metric attention mechanism according to claim 1, wherein in step (3) each residual block consists of two convolution layers and a skip connection, features in the code structure information are extracted by the CNN, Batch Normalization is used for normalization, and ReLU is used as the activation function; a metric attention mechanism is added at the end of each residual block.
5. The code smell detection method based on a residual network and a metric attention mechanism according to claim 1, wherein the MARS model is trained by dividing the data set into a training set and a test set and continuously updating the parameters based on the error between the output value and the label to finally obtain the trained classifier; through cross-validation, one part of the training set is randomly selected as a validation set to verify the performance of the model and prevent overfitting.
CN202110732549.XA 2021-06-30 2021-06-30 Code odor detection method based on residual network and metric attention mechanism Active CN113656279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110732549.XA CN113656279B (en) 2021-06-30 2021-06-30 Code odor detection method based on residual network and metric attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110732549.XA CN113656279B (en) 2021-06-30 2021-06-30 Code odor detection method based on residual network and metric attention mechanism

Publications (2)

Publication Number Publication Date
CN113656279A CN113656279A (en) 2021-11-16
CN113656279B CN113656279B (en) 2023-07-21

Family

ID=78477327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110732549.XA Active CN113656279B (en) 2021-06-30 2021-06-30 Code odor detection method based on residual network and metric attention mechanism

Country Status (1)

Country Link
CN (1) CN113656279B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148585A (en) * 2019-06-27 2020-12-29 Intel Corporation Method, system, article of manufacture, and apparatus for code review assistance for dynamic type languages

Also Published As

Publication number Publication date
CN113656279A (en) 2021-11-16

Similar Documents

Publication Publication Date Title
CN110221975B (en) Method and device for creating interface case automation test script
KR20160122452A (en) Deep learnig framework and image recognition method for content-based visual image recognition
CN114155244B (en) Defect detection method, device, equipment and storage medium
CN109815855B (en) Electronic equipment automatic test method and system based on machine learning
CN113434685A (en) Information classification processing method and system
CN112527676A (en) Model automation test method, device and storage medium
CN104915680B (en) Multi-tag transformation Relationship Prediction method based on Ameliorative RBF Neural Networks
CN113900654A (en) Code plagiarism detection method and system based on program language teaching practice platform
CN109101414B (en) Massive UI test generation method and device based on buried point data
CN114742122A (en) Equipment fault diagnosis method and device, electronic equipment and storage medium
CN113656279B (en) Code odor detection method based on residual network and metric attention mechanism
CN112712181A (en) Model construction optimization method, device, equipment and readable storage medium
CN117574383A (en) Feature fusion and code visualization technology-based software vulnerability detection model method
CN111880957A (en) Program error positioning method based on random forest model
CN116578475A (en) Code verification method, device, equipment and readable storage medium
CN116152609A (en) Distributed model training method, system, device and computer readable medium
CN111415326A (en) Method and system for detecting abnormal state of railway contact net bolt
CN110968518A (en) Analysis method and device for automatic test log file
CN115830419A (en) Data-driven artificial intelligence technology evaluation system and method
CN111459787A (en) Test plagiarism detection method based on machine learning
CN115358473A (en) Power load prediction method and prediction system based on deep learning
CN111474894B (en) Variable target PLC simulation debugging method, storage medium and functional module
CN113918471A (en) Test case processing method and device and computer readable storage medium
CN114138328A (en) Software reconstruction prediction method based on code peculiar smell
Jang et al. Machine Learning-Based Programming Analysis Model Proposal: Based on User Behavioral Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant