CN116662160A - Software defect prediction method and processing device based on cost sensitive width learning - Google Patents

Software defect prediction method and processing device based on cost sensitive width learning

Info

Publication number
CN116662160A
CN116662160A (application CN202310502274.XA)
Authority
CN
China
Prior art keywords
software defect
defect prediction
data set
cost
software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310502274.XA
Other languages
Chinese (zh)
Inventor
曹鹤玲
王兆龙
贾俊亮
廖天力
楚永贺
李磊
赵晨阳
刘广恩
王峰
王盼盼
张硕
李庆宇
王兴亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Jinmingyuan Information Technology Co ltd
Henan University of Technology
Original Assignee
Henan Jinmingyuan Information Technology Co ltd
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Jinmingyuan Information Technology Co ltd, Henan University of Technology filed Critical Henan Jinmingyuan Information Technology Co ltd
Priority to CN202310502274.XA priority Critical patent/CN116662160A/en
Publication of CN116662160A publication Critical patent/CN116662160A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3604Software analysis for verifying properties of programs
    • G06F11/3608Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • G06F18/15Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02Preprocessing
    • G06F2218/04Denoising
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a software defect prediction method and processing device based on cost-sensitive width learning, belonging to the field of software defect prediction. The method comprises the following steps: constructing a software defect prediction data set; dividing the data set into defective instances and non-defective instances with a cost-sensitive learning algorithm; introducing a pulse neural network, applying normalization preprocessing and linear coding to the data set, and converting it into a pulse sequence; obtaining an output pulse sequence through the pulse neural network model and then reverse-decoding it to obtain continuous output values of the software defect features; inputting these continuous output values into a cost-sensitive width learning software defect prediction model for training; and inputting the data set to be predicted into the trained model and outputting the prediction result. The cost-sensitive width learning software defect prediction model improves the accuracy and reliability of software defect prediction while shortening training time.

Description

Software defect prediction method and processing device based on cost sensitive width learning
Technical Field
The application relates to the technical field of software defect prediction, in particular to a software defect prediction method and a processing device based on cost sensitive width learning.
Background
In the software development process, defects are unavoidable, and the larger the software, the more defects it contains. Given ever-increasing functional demands, large and medium-sized programs of tens of thousands of lines of code have become mainstream, and automatically locating the positions of multiple defects in such large programs has become a sought-after goal. Early approaches to the multi-defect localization problem divided the defective program into different parts according to the execution results of test data and then had developers manually inspect the code to determine defect positions, which not only consumed a great deal of time but also required considerable manpower. On this basis, researchers proposed decomposing the multi-defect localization problem so that it could be solved by single-defect localization methods, but defects still had to be identified under the full-process interactive guidance of technicians.
To automate the localization method and improve multi-defect localization efficiency, researchers used genetic programming, without guidance from technicians, to automatically derive 30 different statement-suspiciousness calculation formulas for 92 defects in the Unix tool set, and after theoretical analysis identified 4 formulas with good defect localization performance. Building on that research, a genetic algorithm was proposed to locate multiple defect positions in a defective program: the algorithm converts the multi-defect localization problem into a search optimization problem that a genetic algorithm can handle, automates the method, and improves both localization speed and precision, with overall performance superior to earlier multi-defect localization methods; however, problems such as cumbersome parameter settings and complex algorithm rules remain.
Software defect prediction can accurately judge whether a software module to be predicted contains defects by designing a robust machine learning model, thereby providing guidance for reasonably allocating test resources and improving software reliability. Software defect prediction is a cost-sensitive learning problem: the cost caused by misjudging a defective program as non-defective is greater than the cost caused by misjudging a non-defective program as defective.
It has been pointed out that defect localization performance is affected by defect position and defect type. Many existing multi-defect localization methods suffer severe performance degradation when facing many defects of many types in a large program and cannot obtain valuable defect localization information, so an efficient multi-defect localization method remains a problem to be solved.
Disclosure of Invention
In order to solve the problems that existing multi-defect localization methods suffer severe performance degradation when facing many defects of many types in a large program and cannot obtain valuable defect localization information, the application provides a software defect prediction method and processing device based on cost-sensitive width learning, adopting the following technical scheme:
in a first aspect, the present application provides a software defect prediction method based on cost sensitive width learning, including:
Step S1, a software defect prediction data set is constructed, wherein the data set is constructed in the following manner: extracting historical software modules from an existing software history repository, extracting static properties of the program code from the historical software modules, designing software defect metrics strongly correlated with similar software defects, and constructing the software defect prediction data set;
s2, dividing a software defect prediction data set into a defective example and a non-defective example by adopting a cost sensitive learning algorithm, and acquiring a first data set;
step S3, introducing a pulse neural network, and carrying out normalization pretreatment on software defect characteristics of the first data set to obtain a second data set;
step S4, performing linear coding on the second data set, and converting the second data set into an input pulse sequence;
s5, inputting the input pulse sequence into a pulse neural network model, and calculating through a pulse neuron operation model to obtain an output pulse sequence;
s6, reversely decoding the output pulse sequence to obtain a continuous output value of the software defect characteristic, and taking the continuous output value of the software defect characteristic as third data;
Step S7, inputting the third data into a cost-sensitive width learning software defect prediction model, training the cost-sensitive width learning software defect prediction model, and obtaining a trained cost-sensitive width learning software defect prediction model;
and S8, inputting the data set to be predicted into a trained cost sensitive width learning software defect prediction model after processing in the steps S2-S6, and outputting a software defect prediction result.
Further, the step S2 of dividing the software defect prediction data set into defective and non-defective instances using a cost-sensitive learning algorithm is specifically as follows:
a cost-sensitive learning algorithm is adopted to assign different misclassification costs to the defective and non-defective instances in the software defect prediction data set; a cost-sensitive matrix is added that imposes different misclassification penalties on non-defective and defective instances, with 0 and 1 denoting the non-defective and defective classes respectively;
suppose C_ij denotes the cost of classifying an instance of class i into class j; the larger C_ij, the greater the loss caused by that misclassification. The cost of a misclassification is greater than that of a correct classification, and since a correct classification causes no loss, the classification costs satisfy the relationship C10 > C01 > C00 = C11 = 0, where C00 is actually non-defective and predicted non-defective, C11 is actually defective and predicted defective, C10 is actually defective but predicted non-defective, and C01 is actually non-defective but predicted defective.
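The cost relationship C10 > C01 > C00 = C11 = 0 can be sketched as follows; the concrete cost values (here 5 and 1) and the function name are hypothetical, chosen only to show how a cost matrix penalizes a missed defect more heavily than a false alarm.

```python
import numpy as np

# Hypothetical cost matrix C[i][j]: cost of predicting class j when the true class is i.
# Class 0 = non-defective, class 1 = defective; C10 > C01 > C00 = C11 = 0.
COST = np.array([[0.0, 1.0],   # actual non-defective: C00 = 0, C01 = 1
                 [5.0, 0.0]])  # actual defective:     C10 = 5, C11 = 0

def total_misclassification_cost(y_true, y_pred, cost=COST):
    """Sum the cost C[i][j] over all (true class i, predicted class j) pairs."""
    return float(sum(cost[t, p] for t, p in zip(y_true, y_pred)))
```

A model trained against such a cost matrix is pushed toward fewer missed defects, since missing one defective module (1 predicted as 0) costs as much as five false alarms here.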
Further, in step S3 the first data set is normalized using the z-score method.
The normalization preprocessing standardizes the data so that differences among the different feature distributions of the software defect features are eliminated, making the data easy to compare and process. Applying z-score normalization to the first data set gives each feature of the software defect features a mean of 0 and a standard deviation of 1.
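A minimal sketch of the z-score normalization described above (the function name and the handling of constant columns are illustrative; the patent does not prescribe an implementation):

```python
import numpy as np

def z_score(features):
    """Normalize each feature column to mean 0 and standard deviation 1."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unchanged to avoid division by zero
    return (features - mean) / std
```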
Further, in the step S4, the second data set is linearly encoded, and the second data set is converted into an input pulse sequence, which is specifically expressed as follows:
the continuous values in the second data set are converted into discrete values by mapping the real-valued vectors in the second data set to binary pulse sequences, so as to match the input format of the pulse neural network model.
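The linear coding step can be sketched as follows. The patent does not fix the exact coding scheme, so this example assumes one plausible choice: each (normalized) feature value in [0, 1] is mapped to a fixed-length binary train whose number of leading spikes scales linearly with the value.

```python
import numpy as np

def linear_encode(x, n_steps=10):
    """Map a real vector with values in [0, 1] to binary spike trains.

    Each value v becomes a train of n_steps time steps in which the first
    round(v * n_steps) steps carry a spike -- a simple linear code chosen
    for illustration only.
    """
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    counts = np.rint(x * n_steps).astype(int)
    return np.array([[1 if t < c else 0 for t in range(n_steps)] for c in counts])
```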
Further, the step S5 inputs the input pulse sequence into a pulse neural network model, and obtains an output pulse sequence through calculation of a pulse neuron operation model, which is specifically expressed as follows:
the neuron input in the impulse response neuron model is the pulse sequence delivered to the neuron along its synapses; each pulse, upon reaching the neuron, generates a postsynaptic potential, and the sum of the postsynaptic potentials weighted by the corresponding synaptic weights is the main component of the membrane potential value;
assume the neuron has N input synapses and that the i-th synapse carries G_i pulse inputs whose arrival times at the neuron are denoted T_i = {t_i^(1), t_i^(2), ..., t_i^(G_i)}, where t_i^(g) is the firing time of the g-th pulse of synapse i; the membrane potential V(t) of the neuron at time t can then be expressed as:
V(t) = ρ(t − t^f) + Σ_{i=1}^{N} w_i Σ_{g=1}^{G_i} ε(t − t_i^(g))
where w_i is the weight of the i-th input synapse of the neuron, t^f denotes the f-th (most recent) pulse emission time of the neuron itself, the refractory-period function ρ describes the reset of the membrane potential after the neuron fires, the impulse response function ε describes the response of the postsynaptic neuron's membrane potential to an input pulse, and t is any given time.
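A sketch of the membrane-potential computation V(t) = ρ(t − t^f) + Σ_i w_i Σ_g ε(t − t_i^(g)). The specific kernel shapes for ε and ρ (an alpha-function response and an exponential reset) and the time constants are assumptions, since the patent does not specify them:

```python
import math

def epsilon(s, tau=4.0):
    """Impulse response kernel: alpha-function response to one input spike at s = t - t_i^(g)."""
    return (s / tau) * math.exp(1.0 - s / tau) if s > 0 else 0.0

def rho(s, theta=1.0, tau=4.0):
    """Refractory kernel: exponential reset after the neuron's own last spike at t^f."""
    return -theta * math.exp(-s / tau) if s > 0 else 0.0

def membrane_potential(t, weights, spike_times, last_fire=None):
    """V(t) = rho(t - t^f) + sum_i w_i * sum_g epsilon(t - t_i^(g)).

    weights:     w_i for each of the N input synapses
    spike_times: list of arrival-time lists T_i, one per synapse
    last_fire:   the neuron's own most recent firing time t^f, if any
    """
    v = rho(t - last_fire) if last_fire is not None else 0.0
    for w_i, times_i in zip(weights, spike_times):
        v += w_i * sum(epsilon(t - t_g) for t_g in times_i)
    return v
```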
Further, in the step S7, the third data set is input into the cost-sensitive width learning software defect prediction model and the model is trained, specifically as follows:
Given a training set {X, Y}, the feature mapping layer contains p groups of feature nodes. The i-th group of feature nodes Z_i in the feature mapping layer can be expressed as:
Z_i = φ_i(X W_i + b_i), i = 1, 2, ..., p
where the weights W_i and bias terms b_i are random matrices of appropriate dimensions, X is the input data, and φ_i is the selected activation function. The feature node groups are written collectively as Z^p = (Z_1, Z_2, ..., Z_p); Z^p serves as the feature node group of the feature mapping layer and is further connected to the enhancement node group in the enhancement layer, where Z^p undergoes a similar transformation. The difference is that the activation is nonlinear: the output of the j-th enhancement node can be expressed as:
H_j = ε_j(Z^p W_j + b_j), j = 1, 2, ..., q
where H_j denotes the nonlinear activation output of the enhancement layer, obtained from the feature node group Z^p through the nonlinear transformation of the activation function ε_j; the weights W_j and bias terms b_j are random matrices of appropriate dimensions. The activation output of the enhancement layer can then be written as H^q = (H_1, H_2, ..., H_q).
To ensure the sparsity of the data during training, a ridge regression algorithm is introduced to fine-tune the weights of the feature mapping layer and the enhancement layer; finally, the output of the cost-sensitive width learning software defect prediction model takes the form:
Y = (Z_1, Z_2, ..., Z_p, H_1, H_2, ..., H_q) W^q
  = (Z^p, H^q) W^q
where Z^p is the feature node group, H^q is the enhancement node group, and W^q is the output weight matrix.
Further, the feature nodes are the features mapped from the input software defect prediction data; the mapped features are enhanced into enhancement nodes with randomly generated weights; the mapped features and the enhancement nodes are connected to the output of the cost-sensitive width learning software defect prediction model, and the corresponding output weights of the feature mapping layer and the enhancement layer are obtained by computing the pseudo-inverse.
Further, the output weight of an enhancement node represents its quality: the larger the output weight, the better the enhancement node fits the real data when nonlinearly transforming it, and the more accurate the prediction.
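The feature-mapping, enhancement, and ridge-regression steps above can be sketched as a minimal width (broad) learning model. The node counts, activation choices (identity φ_i for feature nodes, tanh for ε_j), and regularization value are illustrative assumptions, not the patent's concrete parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def bls_train(X, Y, p=4, q=8, lam=1e-3):
    """Minimal width-learning sketch: p feature-node groups, q enhancement nodes,
    output weights W^q solved in closed form by ridge regression."""
    n, d = X.shape
    Wf = [rng.standard_normal((d, 3)) for _ in range(p)]   # random feature weights W_i
    bf = [rng.standard_normal(3) for _ in range(p)]        # random biases b_i
    Zp = np.hstack([X @ W + b for W, b in zip(Wf, bf)])    # Z^p = (Z_1, ..., Z_p)
    We = rng.standard_normal((Zp.shape[1], q))             # random enhancement weights W_j
    be = rng.standard_normal(q)
    Hq = np.tanh(Zp @ We + be)                             # nonlinear enhancement nodes H^q
    A = np.hstack([Zp, Hq])                                # (Z^p, H^q)
    # Ridge regression: W^q = (A^T A + lam*I)^-1 A^T Y
    Wout = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return Wf, bf, We, be, Wout

def bls_predict(model, X):
    Wf, bf, We, be, Wout = model
    Zp = np.hstack([X @ W + b for W, b in zip(Wf, bf)])
    Hq = np.tanh(Zp @ We + be)
    return np.hstack([Zp, Hq]) @ Wout
```

Because the random mapping weights stay fixed and only W^q is solved, training reduces to one closed-form linear solve rather than iterative backpropagation.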
In a second aspect, the present application further provides a software defect prediction apparatus based on cost sensitive width learning, including: a software defect prediction data set acquisition module, a software defect prediction data set division module, a normalization processing module, an input pulse sequence acquisition module, an output pulse sequence acquisition module, an output pulse sequence reverse decoding module, a cost sensitive width learning software defect prediction model training module and a software defect prediction module;
the software defect prediction data set acquisition module is used for constructing a software defect prediction data set in the following manner: extracting historical software modules from an existing software history repository, extracting static properties of the program code from the historical software modules, designing software defect metrics strongly correlated with similar software defects, and constructing the software defect prediction data set;
The software defect prediction data set dividing module divides the software defect prediction data set into a defective example and a non-defective example by adopting a cost sensitive learning algorithm, and acquires a first data set;
the normalization processing module is used for introducing a pulse neural network, performing normalization preprocessing on the software defect characteristics of the first data set, and acquiring a second data set;
the input pulse sequence acquisition module is used for carrying out linear coding on the second data set and converting the second data set into an input pulse sequence;
the output pulse sequence acquisition module is used for inputting the input pulse sequence into a pulse neural network model and acquiring an output pulse sequence through calculation of a pulse neuron operation model;
the output pulse sequence reverse decoding module is used for carrying out reverse decoding on the output pulse sequence, obtaining continuous output values of software defect characteristics and taking the continuous output values of the software defect characteristics as third data;
the cost sensitive width learning software defect prediction model training module inputs the third data into a cost sensitive width learning software defect prediction model, trains the cost sensitive width learning software defect prediction model, and acquires a trained cost sensitive width learning software defect prediction model;
The data set to be predicted, after processing by steps S2-S6, is input into the trained cost sensitive width learning software defect prediction model, and the software defect prediction result is output.
In a third aspect, the present application provides an electronic device, comprising:
one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the device, cause the device to perform the method of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium having a computer program stored therein, which when run on a computer causes the computer to perform the method according to the first aspect.
In a fifth aspect, the present application provides a computer program for performing the method of the first aspect when the computer program is executed by a computer.
In one possible design, the program in the fifth aspect may be stored in whole or in part on a storage medium packaged with the processor, or in part or in whole on a memory not packaged with the processor.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
1. The application divides the software defect prediction data set into defective and non-defective instances using a cost-sensitive learning algorithm; by taking the cost of classification errors into account, the cost-sensitive learning algorithm optimizes the accuracy of model prediction, improving the accuracy and reliability of software defect prediction.
2. Normalization preprocessing of the software defect features in the first data set improves the performance and training speed of the cost-sensitive width learning software defect prediction model.
3. By introducing a pulse neural network to linearly encode the data, the application not only effectively reduces the dimensionality of the data but also handles dynamic numerical signals and noise well.
4. The application adopts a cost-sensitive width learning software defect prediction model in the software defect prediction process; increasing the width of the hidden layer improves the expressive capacity of the model.
5. Incremental learning algorithms for input samples, feature nodes and enhancement nodes are easily realized in the cost-sensitive width learning software defect prediction model: the existing structure and parameters need not be retrained, and only the parameters of the newly added part need to be computed, which greatly shortens training time while maintaining high accuracy.
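The claim that only the newly added part needs to be computed can be illustrated with a ridge-regression block update: when new enhancement-node outputs B are appended to the existing design matrix A, the cached Gram matrix G = A^T A + λI and the cached A^T Y are reused, and only the blocks involving B are newly computed. The caching scheme below is an assumption for illustration; the patent does not give the incremental algorithm's details.

```python
import numpy as np

def ridge_solve(A, Y, lam=1e-3):
    """Initial training: cache G = A^T A + lam*I and AtY = A^T Y for later reuse."""
    G = A.T @ A + lam * np.eye(A.shape[1])
    AtY = A.T @ Y
    return np.linalg.solve(G, AtY), G, AtY

def add_enhancement_nodes(A, Y, G, AtY, B, lam=1e-3):
    """Extend the model with new enhancement-node outputs B.

    Only the blocks involving B (A^T B, B^T B, B^T Y) are newly computed;
    the cached G and AtY from the previous training step are reused unchanged.
    """
    G_new = np.block([[G,       A.T @ B],
                      [B.T @ A, B.T @ B + lam * np.eye(B.shape[1])]])
    AtY_new = np.vstack([AtY, B.T @ Y])
    return np.linalg.solve(G_new, AtY_new), G_new, AtY_new
```

Solving the extended block system gives exactly the same output weights as retraining from scratch on the concatenated matrix (A, B), while avoiding the recomputation of A^T A.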
Drawings
FIG. 1 is a diagram of an exemplary system architecture in which embodiments of the present application may be applied.
Fig. 2 is a flow chart of a software defect prediction method based on cost-sensitive width learning.
Fig. 3 is an overall framework diagram of the software defect prediction method based on cost-sensitive width learning of the present application.
FIG. 4 is a flow chart of the software defect prediction method based on cost sensitive width learning of the present application.
FIG. 5 is a flow chart of a software defect prediction dataset constructed based on the cost-sensitive width learning software defect prediction method of the present application.
FIG. 6 is a schematic diagram of a software defect prediction model training for width learning based on software defect prediction for cost-sensitive width learning according to the present application.
FIG. 7 is a flow chart of the software defect prediction model operation of the present application based on cost sensitive width learning software defect prediction.
FIG. 8 is a flow chart of the software defect prediction model operation of the present application based on cost sensitive width learning software defect prediction.
FIG. 9 is a flow chart of the software defect prediction model operation of the present application based on cost sensitive width learning software defect prediction.
FIG. 10 is a schematic diagram of an apparatus for software defect prediction based on cost-sensitive width learning in accordance with the present application.
FIG. 11 is a prototype tool framework of software defect prediction based on cost-sensitive width learning of the present application.
FIG. 12 is a tool set up block diagram of software defect prediction based on cost sensitive width learning of the present application.
FIG. 13 is a schematic diagram of a computer device according to an embodiment of the application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application, the claims and the above description of the drawings are intended to cover a non-exclusive inclusion. The terms "first", "second" and the like in the description, the claims or the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 10 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the software defect prediction method based on cost sensitive width learning provided by the embodiment of the present application is generally executed by a server/terminal device, and correspondingly, the software defect prediction device based on cost sensitive width learning is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flowchart of the software defect prediction method based on cost-sensitive width learning according to the present application is shown, an overall framework of the software defect prediction method based on cost-sensitive width learning is shown in fig. 3, an operation flowchart is shown in fig. 4, and the method includes the following steps:
step S1, a software defect prediction data set is constructed, wherein the data set is constructed in the following manner: extracting historical software modules from an existing software history repository, extracting static properties of the program code from the historical software modules, designing software defect metrics strongly correlated with similar software defects, and constructing the software defect prediction data set;
The flowchart of the software defect prediction data set construction in step S1 is shown in fig. 5:
s101, extracting a historical software module from an existing software historical warehouse;
s102, extracting static properties of program codes from the historical software module;
s103, designing a software defect metric element with strong correlation with similar software defects;
s104, constructing a software defect prediction data set.
S2, dividing a software defect prediction data set into a defective example and a non-defective example by adopting a cost sensitive learning algorithm, and acquiring a first data set;
the step S2 of dividing the software defect prediction data set into defective and non-defective instances using a cost-sensitive learning algorithm is specifically as follows:
a cost-sensitive learning algorithm is adopted to assign different misclassification costs to the defective and non-defective instances in the software defect prediction data set; a cost-sensitive matrix is added that imposes different misclassification penalties on non-defective and defective instances, with 0 and 1 denoting the non-defective and defective classes respectively;
suppose C_ij denotes the cost of classifying an instance of class i into class j; the larger C_ij, the greater the loss caused by that misclassification. The cost of a misclassification is greater than that of a correct classification, and since a correct classification causes no loss, the classification costs satisfy the relationship C10 > C01 > C00 = C11 = 0, where C00 is actually non-defective and predicted non-defective, C11 is actually defective and predicted defective, C10 is actually defective but predicted non-defective, and C01 is actually non-defective but predicted defective.
And S3, introducing a pulse neural network, and carrying out normalization preprocessing on the software defect characteristics of the first data set to obtain a second data set.
Explanation: the normalization preprocessing allows standardized input data to be transmitted between nodes, which prevents gradient explosion or gradient vanishing during training and improves the training effect of the network; it also makes different features comparable, improving the generalization capability of the network.
In step S3, the first data set is normalized using the z-score method. Normalization eliminates the scale differences among the different feature data sets of the software defect features, making the data easy to compare and process; after z-score normalization, each feature data set of the software defect features has a mean of 0 and a standard deviation of 1.
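A minimal sketch of the z-score step (the helper name and the `eps` guard for constant-valued metric columns are assumptions added for robustness, not part of the original description):

```python
import numpy as np

def zscore_normalize(X, eps=1e-12):
    """z-score normalization: each software-metric column ends up with
    mean 0 and standard deviation 1 (eps guards constant columns)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)
```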
And S4, performing linear coding on the second data set, and converting the second data set into an input pulse sequence.
The step S4 of linearly encoding the second data set, converting the second data set into an input pulse sequence, specifically includes:
the continuous values in the second data set are converted to discrete values by mapping the real vectors in the second data set to binary pulse sequences to accommodate the input format of the pulsed neural network model.
In one possible implementation, in the process of linearly encoding the second data set and converting the second data set into the input pulse sequence, noise in the pulse sequence is filtered through setting a threshold value, so that the influence on the pulse neurons when the pulse sequence is input into the pulse neural network model is reduced.
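One possible sketch of such a linear encoding with a noise threshold. The step count `T_STEPS`, the threshold value, and the burst-style coding scheme (a value v fires in the first round(v·T) steps) are all illustrative assumptions, as the text does not fix a concrete scheme; `rate_decode` is the matching inverse of the kind used later in step S6.

```python
import numpy as np

T_STEPS = 10             # length of each pulse train (assumption)
NOISE_THRESHOLD = 0.05   # magnitudes below this are filtered as noise (assumption)

def linear_encode(x, t_steps=T_STEPS, threshold=NOISE_THRESHOLD):
    """Map a real-valued feature vector (scaled to [0, 1]) to one binary
    pulse train per feature: value v fires in the first round(v * t_steps)
    time steps. Values under the threshold produce no spikes."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    x = np.where(x < threshold, 0.0, x)
    n_spikes = np.round(x * t_steps).astype(int)
    steps = np.arange(t_steps)
    return (steps[None, :] < n_spikes[:, None]).astype(int)

def rate_decode(spikes, t_steps=T_STEPS):
    """Reverse decoding: recover a continuous value from the firing rate
    of each pulse train."""
    return np.asarray(spikes).sum(axis=1) / t_steps
```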
And S5, inputting the input pulse sequence into a pulse neural network model, and calculating through a pulse neuron operation model to obtain an output pulse sequence.
The step S5 is to input the input pulse sequence into a pulse neural network model, calculate through a pulse neuron operation model, and obtain an output pulse sequence, which is specifically expressed as follows:
the neuron input in the impulse response neuron model is a pulse sequence transmitted to the neuron along its synapses; each pulse, upon reaching the neuron, generates a postsynaptic potential, and the sum of these postsynaptic potentials, weighted by the corresponding synaptic weights, is the main component of the membrane potential value;
Assuming the neuron has N input synapses and the ith synapse carries G_i pulse inputs, whose arrival times at the neuron are recorded as the set F_i = {t_i^1, t_i^2, ..., t_i^{G_i}}, with t_i^g denoting the firing time of the g-th pulse of synapse i, the membrane potential V(t) of the neuron at time t can be expressed as:

V(t) = ρ(t − t_f) + Σ_{i=1}^{N} w_i Σ_{g=1}^{G_i} ε(t − t_i^g)

where w_i is the weight of the ith input synapse of the neuron, t_f denotes the firing time of the neuron's own f-th (most recent) output pulse, the refractory-period function ρ describes the resetting process of the membrane potential after that pulse, the impulse response function ε describes the response of a pulse on the postsynaptic neuron's membrane potential, and t denotes any given time.
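The spike-response computation above can be sketched as follows. The alpha-shaped kernel for ε, the exponential reset for ρ, and the constants are illustrative assumptions; the text does not fix the kernel shapes.

```python
import numpy as np

TAU = 4.0        # membrane time constant (assumption)
TAU_REF = 2.0    # refractory time constant (assumption)
V_THRESH = 1.0   # firing threshold (assumption)

def epsilon(s, tau=TAU):
    """Impulse response of a presynaptic pulse at age s (alpha kernel)."""
    return np.where(s > 0, (s / tau) * np.exp(1 - s / tau), 0.0)

def rho(s, thresh=V_THRESH, tau_ref=TAU_REF):
    """Refractory kernel: resets the membrane after the neuron's own pulse."""
    return np.where(s > 0, -thresh * np.exp(-s / tau_ref), 0.0)

def membrane_potential(t, weights, spike_times, last_out_spike=None):
    """V(t) = rho(t - t_f) + sum_i w_i sum_g eps(t - t_i^g).
    spike_times[i] lists the arrival times of the G_i pulses on synapse i."""
    v = 0.0 if last_out_spike is None else float(rho(t - last_out_spike))
    for w_i, times in zip(weights, spike_times):
        for t_g in times:
            v += w_i * float(epsilon(t - t_g))
    return v
```

When V(t) crosses the threshold the neuron would emit an output pulse and the refractory term ρ pulls the potential back down, which is what the reset process in the text describes.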
And S6, reversely decoding the output pulse sequence to obtain continuous output values of the software defect characteristics, and taking the continuous output values of the software defect characteristics as third data.
And S7, inputting the third data into a cost-sensitive width learning software defect prediction model, training the cost-sensitive width learning software defect prediction model, and obtaining a trained cost-sensitive width learning software defect prediction model.
The training of the cost sensitive width learning software defect prediction model is shown in fig. 6 and its operation flow chart in fig. 7; the main steps are as follows:
step 401, using the input software defect prediction data mapped feature as a feature node of the network;
Step 402, the mapped features are enhanced into enhancement nodes that randomly generate weights;
step 403, connecting the mapped features and the enhancement nodes to the output end of the cost sensitive width learning software defect prediction model;
in step 404, the corresponding output weights of the feature mapping layer and the enhancement layer are solved through the ridge-regression pseudo-inverse.
The output weight of the enhancement node represents the quality degree of the enhancement node, and the larger the output weight is, the more the enhancement node is fit with the real data when performing nonlinear transformation on the data, and the more accurate the prediction is.
In the step S7, the third data set is input into a cost-sensitive width learning software defect prediction model, and the cost-sensitive width learning software defect prediction model is trained, which specifically includes:
given a training set {X, Y} and M_i feature nodes per group, the ith feature node Z_i in the feature mapping layer can be expressed as:

Z_i = φ_i(XW_i + b_i), i = 1, 2, ..., p

where the weight W_i and the bias term b_i are random matrices of suitable dimensions, X is the input data, and φ_i is the selected activation function. The feature node groups are collected as Z^p = (Z_1, Z_2, ..., Z_p), which serves as the feature node group of the feature mapping layer; Z^p is further connected to the enhancement node group in the enhancement layer, where a similar transformation is performed on Z^p, the difference being that the activation is nonlinear. The output of the jth enhancement node can be expressed as:

H_j = ε_j(Z^p W_j + b_j), j = 1, 2, ..., q

where H_j denotes the nonlinear activation output of the enhancement layer, obtained from the feature node group Z^p through the nonlinear transformation of the activation function ε_j; the weight W_j and the bias term b_j are random matrices of suitable dimensions. The activation output of the enhancement layer can then be further represented as H^q = (H_1, H_2, ..., H_q).
In order to ensure the sparsity of data during training, a ridge regression algorithm is introduced to finely adjust the weights of a feature mapping layer and an enhancement layer, and finally, the output of the cost-sensitive width learning software defect prediction model adopts the following form:
Y = (Z_1, Z_2, ..., Z_p, H_1, H_2, ..., H_q)W = (Z^p, H^q)W

where Z^p = (Z_1, ..., Z_p) is the feature node group, H^q = (H_1, ..., H_q) is the enhancement node group, and W is the output weight matrix solved by the ridge-regression pseudo-inverse.
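The construction in steps 401-404 and the formulas above can be sketched end to end as below. The node counts, the tanh enhancement activation, a linear feature map φ, and the ridge parameter `lam` are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the random node weights are reproducible

class BroadLearningSketch:
    """Minimal sketch of the width (broad) learning model described above:
    p feature-mapping nodes, q enhancement nodes, and output weights solved
    in closed form by the ridge-regression pseudo-inverse."""

    def __init__(self, p=10, q=20, lam=1e-3):
        self.p, self.q, self.lam = p, q, lam

    def _nodes(self, X):
        Z = X @ self.Wf + self.bf            # feature mapping layer (linear phi, assumption)
        H = np.tanh(Z @ self.We + self.be)   # enhancement layer (nonlinear eps_j)
        return np.hstack([Z, H])             # A = (Z^p, H^q)

    def fit(self, X, Y):
        n_feat = X.shape[1]
        # steps 401-402: randomly generated weights for feature and enhancement nodes
        self.Wf = rng.standard_normal((n_feat, self.p))
        self.bf = rng.standard_normal(self.p)
        self.We = rng.standard_normal((self.p, self.q))
        self.be = rng.standard_normal(self.q)
        A = self._nodes(X)
        # steps 403-404: ridge-regression pseudo-inverse W = (A^T A + lam I)^-1 A^T Y
        self.W = np.linalg.solve(A.T @ A + self.lam * np.eye(A.shape[1]), A.T @ Y)
        return self

    def predict(self, X):
        return self._nodes(X) @ self.W       # Y = (Z^p, H^q) W
```

For actual defect prediction, the continuous outputs would be thresholded into the two classes, with the cost-sensitive weighting of step S2 folded into the ridge solve.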
And S8, inputting the data set to be predicted into a trained cost sensitive width learning software defect prediction model after processing in the steps S2-S6, and outputting a software defect prediction result.
In a possible implementation manner, with continued reference to fig. 8, fig. 8 is a framework diagram of a software defect prediction method based on cost-sensitive width learning according to an embodiment of the present application, and an operation flow chart is shown in fig. 9, where specific steps are as follows:
And step 1, extracting the software module from the existing software history warehouse.
And 2, extracting static attributes of the program codes from the historical software module, designing effective software defect metric elements, and constructing a data set of software defect prediction.
And step 3, inputting the data set of the software defect prediction into a cost-sensitive width learning software defect prediction model for training.
And step 4, after software measurement and preprocessing are carried out on the new program module, the new program module is input into a trained cost-sensitive width learning software defect prediction model, and the cost-sensitive width learning software defect prediction model can be classified into a defective module or a non-defective module.
Those skilled in the art will appreciate that implementing all or part of the above-described embodiment methods may be accomplished by way of a computer program stored in a computer-readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With continued reference to fig. 10, the software defect prediction apparatus based on cost sensitive width learning according to the present embodiment includes: a software defect prediction data set acquisition module 101, a software defect prediction data set division module 102, a normalization processing module 103, an input pulse sequence acquisition module 104, an output pulse sequence acquisition module 105, an output pulse sequence reverse decoding module 106, a cost sensitive width learning software defect prediction model training module 107 and a software defect prediction module 108;
The software defect prediction data set obtaining module 101 constructs a software defect prediction data set in the following manner: extracting a historical software module from an existing software historical warehouse, extracting static properties of program codes from the historical software module, designing a software defect measuring element with strong correlation with similar software defects, and constructing a software defect prediction data set;
the software defect prediction data set dividing module 102 divides the software defect prediction data set into a defective example and a non-defective example by adopting a cost sensitive learning algorithm to obtain a first data set;
the normalization processing module 103 introduces a pulse neural network to perform normalization preprocessing on the software defect characteristics of the first data set to obtain a second data set;
an input pulse sequence acquisition module 104 for performing linear encoding on the second data set, and converting the second data set into an input pulse sequence;
the output pulse sequence acquisition module 105 inputs the input pulse sequence into a pulse neural network model, and acquires an output pulse sequence through calculation of a pulse neuron operation model;
the output pulse sequence reverse decoding module 106 is used for performing reverse decoding on the output pulse sequence to obtain continuous output values of the software defect characteristics, and the continuous output values of the software defect characteristics are used as third data;
The cost sensitive width learning software defect prediction model training module 107 inputs the third data into a cost sensitive width learning software defect prediction model, trains the cost sensitive width learning software defect prediction model, and acquires a trained cost sensitive width learning software defect prediction model;
the data set to be predicted, after being processed by the above modules as in steps S2-S6, is input into the trained cost sensitive width learning software defect prediction model, and a software defect prediction result is output.
In one possible implementation, a prototype tool frame diagram of the software defect prediction processing device based on cost sensitive width learning is shown in fig. 11, a tool device module relation diagram is shown in fig. 12, and the main modules are as follows:
module 1, data processing module. Extracting a software module from the existing software history warehouse, extracting static attributes of program codes from the history software module, designing an effective software defect metric element, constructing a data set of software defect prediction, and preprocessing data of the data set to solve the problem of data unbalance.
Module 2, training module. The processed data set is input into a software defect prediction model constructed based on width learning for training.
And 3, a prediction module. And inputting the software module to be predicted into a defect prediction model for prediction after data processing, and dividing the software module to be predicted into a defective module and a non-defective module.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 13, fig. 13 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 13 comprises a memory 13a, a processor 13b, and a network interface 13c, communicatively connected to one another via a system bus. It should be noted that only a computer device 13 having components 13a-13c is shown in the figures, but it should be understood that not all of the illustrated components need be implemented; more or fewer components may alternatively be implemented. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, whose hardware includes, but is not limited to, microprocessors, Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), embedded devices, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 13a includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 13a may be an internal storage unit of the computer device 13, such as a hard disk or memory of the computer device 13. In other embodiments, the memory 13a may also be an external storage device of the computer device 13, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 13. Of course, the memory 13a may also include both an internal storage unit and an external storage device of the computer device 13. In this embodiment, the memory 13a is generally used for storing the operating system and various application software installed on the computer device 13, such as the program code of the software defect prediction method and processing device based on cost sensitive width learning. Further, the memory 13a may be used to temporarily store various types of data that have been output or are to be output.
The processor 13b may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 13b is typically used to control the overall operation of the computer device 13. In this embodiment, the processor 13b is configured to execute the program code or the processing data stored in the memory 13a, for example, the program code of the software defect prediction method and the processing device based on the cost sensitive width learning.
The network interface 13c may comprise a wireless network interface or a wired network interface, which network interface 13c is typically used to establish a communication connection between the computer device 13 and other electronic devices.
The present application also provides another embodiment, namely a non-volatile computer-readable storage medium storing a program for the software defect prediction method based on cost sensitive width learning, the program being executable by at least one processor so that the at least one processor performs the steps of the software defect prediction method based on cost sensitive width learning described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application; the preferred embodiments of the present application are shown in the drawings, which do not limit the scope of the claims. This application may be embodied in many different forms; these embodiments are provided so that this disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described above, or equivalents may be substituted for some of their elements. All equivalent structures made based on the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of the application.

Claims (8)

1. The software defect prediction method based on cost sensitive width learning is characterized by comprising the following steps:
step S1, a software defect prediction data set is constructed, wherein the mode of constructing the software defect prediction data set is as follows: extracting a historical software module from an existing software historical warehouse, extracting static properties of program codes from the historical software module, designing a software defect measuring element with strong correlation with similar software defects, and constructing a software defect prediction data set;
s2, dividing a software defect prediction data set into a defective example and a non-defective example by adopting a cost sensitive learning algorithm, and acquiring a first data set;
step S3, introducing a pulse neural network, and carrying out normalization pretreatment on software defect characteristics of the first data set to obtain a second data set;
step S4, performing linear coding on the second data set, and converting the second data set into an input pulse sequence;
s5, inputting the input pulse sequence into a pulse neural network model, and calculating through a pulse neuron operation model to obtain an output pulse sequence;
s6, reversely decoding the output pulse sequence to obtain a continuous output value of the software defect characteristic, and taking the continuous output value of the software defect characteristic as third data;
Step S7, inputting the third data into a cost-sensitive width learning software defect prediction model, training the cost-sensitive width learning software defect prediction model, and obtaining a trained cost-sensitive width learning software defect prediction model;
and S8, inputting the data set to be predicted into a trained cost sensitive width learning software defect prediction model after processing in the steps S2-S6, and outputting a software defect prediction result.
2. The software defect prediction method based on cost-sensitive width learning according to claim 1, wherein the software defect prediction dataset is divided into a defective instance and a non-defective instance by using a cost-sensitive learning algorithm in step S2, which is specifically expressed as:
adopting a cost sensitive learning algorithm to allocate different misclassification costs to defective examples and non-defective examples in the software defect prediction data set, adding a cost sensitive matrix, giving different misclassification penalty forces to the non-defective examples and the defective examples in the software defect prediction data set, and using 0 and 1 to respectively refer to the non-defective examples and the defective examples;
suppose Cij represents the cost of classifying an instance of class i into class j; the larger the value of Cij, the greater the loss caused by that misclassification. Since correct classification brings no loss while misclassification does, the classification costs satisfy the relationship C10 > C01 > C00 = C11 = 0, where C00 means actually defect-free and predicted defect-free, C11 means actually defective and predicted defective, C10 means actually defective but predicted defect-free, and C01 means actually defect-free but predicted defective.
3. The software defect prediction method based on cost-sensitive width learning according to claim 2, wherein the normalizing pretreatment is performed on the first data set in step S3, and the normalizing pretreatment is performed on the first data set by using z-score.
4. The software defect prediction method based on cost sensitive width learning according to claim 1, wherein the step S5 is to input the input pulse sequence into a pulse neural network model, calculate through a pulse neuron operation model, and obtain an output pulse sequence, which is specifically expressed as:
the neuron input in the impulse response neuron model is a pulse sequence transmitted to the neuron along its synapses; each pulse, upon reaching the neuron, generates a postsynaptic potential, and the sum of these postsynaptic potentials, weighted by the corresponding synaptic weights, is the main component of the membrane potential value;
Assuming the neuron has N input synapses and the ith synapse carries G_i pulse inputs, whose arrival times at the neuron are recorded as the set F_i = {t_i^1, t_i^2, ..., t_i^{G_i}}, with t_i^g denoting the firing time of the g-th pulse of synapse i, the membrane potential V(t) of the neuron at time t can be expressed as:

V(t) = ρ(t − t_f) + Σ_{i=1}^{N} w_i Σ_{g=1}^{G_i} ε(t − t_i^g)

where w_i is the weight of the ith input synapse of the neuron, t_f denotes the firing time of the neuron's own f-th (most recent) output pulse, the refractory-period function ρ describes the resetting process of the membrane potential after that pulse, the impulse response function ε describes the response of a pulse on the postsynaptic neuron's membrane potential, and t denotes any given time.
5. The software defect prediction method based on cost-sensitive width learning according to claim 1, wherein the step S7 of inputting the third data into a cost-sensitive width learning software defect prediction model, and training the cost-sensitive width learning software defect prediction model is specifically shown as follows:
given a training set {X, Y} and M_i feature nodes per group, the ith feature node Z_i in the feature mapping layer can be expressed as:

Z_i = φ_i(XW_i + b_i), i = 1, 2, ..., p

where the weight W_i and the bias term b_i are random matrices of suitable dimensions, X is the input data, and φ_i is the selected activation function. The feature node groups are collected as Z^p = (Z_1, Z_2, ..., Z_p), which serves as the feature node group of the feature mapping layer; Z^p is further connected to the enhancement node group in the enhancement layer, where a similar transformation is performed on Z^p, the difference being that the activation is nonlinear. The output of the jth enhancement node can be expressed as:

H_j = ε_j(Z^p W_j + b_j), j = 1, 2, ..., q

where H_j denotes the nonlinear activation output of the enhancement layer, obtained from the feature node group Z^p through the nonlinear transformation of the activation function ε_j; the weight W_j and the bias term b_j are random matrices of suitable dimensions. The activation output of the enhancement layer can then be further represented as H^q = (H_1, H_2, ..., H_q).
In order to ensure the sparsity of data during training, a ridge regression algorithm is introduced to finely adjust the weights of a feature mapping layer and an enhancement layer, and finally, the output of the cost-sensitive width learning software defect prediction model adopts the following form:
Y = (Z_1, Z_2, ..., Z_p, H_1, H_2, ..., H_q)W = (Z^p, H^q)W

where Z^p = (Z_1, ..., Z_p) is the feature node group, H^q = (H_1, ..., H_q) is the enhancement node group, and W is the output weight matrix solved by the ridge-regression pseudo-inverse.
6. A software defect prediction processing device based on cost sensitive width learning, characterized in that the processing device comprises: a software defect prediction data set acquisition module, a software defect prediction data set division module, a normalization processing module, an input pulse sequence acquisition module, an output pulse sequence acquisition module, an output pulse sequence reverse decoding module, a cost sensitive width learning software defect prediction model training module and a software defect prediction module;
The software defect prediction data set acquisition module is used for constructing a software defect prediction data set in the following manner: extracting a historical software module from an existing software historical warehouse, extracting static properties of program codes from the historical software module, designing a software defect measuring element with strong correlation with similar software defects, and constructing a software defect prediction data set;
the software defect prediction data set dividing module divides the software defect prediction data set into a defective example and a non-defective example by adopting a cost sensitive learning algorithm, and acquires a first data set;
the normalization processing module is used for introducing a pulse neural network, performing normalization preprocessing on the software defect characteristics of the first data set, and acquiring a second data set;
the input pulse sequence acquisition module is used for carrying out linear coding on the second data set and converting the second data set into an input pulse sequence;
the output pulse sequence acquisition module is used for inputting the input pulse sequence into a pulse neural network model and acquiring an output pulse sequence through calculation of a pulse neuron operation model;
the output pulse sequence reverse decoding module is used for carrying out reverse decoding on the output pulse sequence, obtaining continuous output values of software defect characteristics and taking the continuous output values of the software defect characteristics as third data;
The cost sensitive width learning software defect prediction model training module inputs the third data into a cost sensitive width learning software defect prediction model, trains the cost sensitive width learning software defect prediction model, and acquires a trained cost sensitive width learning software defect prediction model;
the data set to be predicted, after being processed as in steps S2-S6, is input into the trained cost sensitive width learning software defect prediction model, and a software defect prediction result is output.
7. An electronic device, comprising:
one or more processors, memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the device, cause the device to perform the method of any of claims 1-5.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run on a computer, causes the computer to perform the method according to any of claims 1 to 5.
CN202310502274.XA 2023-05-06 2023-05-06 Software defect prediction method and processing device based on cost sensitive width learning Pending CN116662160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310502274.XA CN116662160A (en) 2023-05-06 2023-05-06 Software defect prediction method and processing device based on cost sensitive width learning

Publications (1)

Publication Number Publication Date
CN116662160A true CN116662160A (en) 2023-08-29

Family

ID=87714380

Country Status (1)

Country Link
CN (1) CN116662160A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination