CN117312138A - Software defect detection method, device, computer equipment, storage medium and product - Google Patents

Software defect detection method, device, computer equipment, storage medium and product

Info

Publication number
CN117312138A
Authority
CN
China
Prior art keywords
index
software
defect
combination
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311197320.6A
Other languages
Chinese (zh)
Inventor
鲍睿
蔡晓东
解媛媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202311197320.6A priority Critical patent/CN117312138A/en
Publication of CN117312138A publication Critical patent/CN117312138A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3692 Test management for test results analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to a software defect detection method, apparatus, computer device, storage medium, and product. The method comprises: determining a software project to be tested, and obtaining a defect number prediction model for the software project to be tested; obtaining M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories, and M and N are integer constants; determining, for the software project to be tested, N pieces of software project index data respectively corresponding to the N index categories; substituting the N pieces of software project index data into the corresponding index variables in each of the M index combination expressions to obtain M combination index values; and inputting the M combination index values into the defect number prediction model to obtain the number of software defects output by the defect number prediction model. The method can improve the reliability of test results.

Description

Software defect detection method, device, computer equipment, storage medium and product
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a computer device, a storage medium, and a product for detecting software defects.
Background
With the development of internet technology, the reliability of software product quality has become a major concern in the field of software engineering, and software testing is usually performed after software development is completed. Software testing is a process for verifying the correctness, integrity, security, and quality of software; specifically, it is the process of running a program under specified conditions to find program errors, measure software quality, and evaluate whether the software meets its design requirements.
However, conventional software testing depends on the tester's familiarity with the business and on accumulated experience, that is, on the tester's manual, experience-based judgment, so the test results can be unreliable.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a software defect detection method, apparatus, computer device, storage medium, and product that can improve the reliability of test results.
In a first aspect, the present application provides a software defect detection method. The method comprises the following steps:
determining a software project to be tested, and obtaining a defect number prediction model for the software project to be tested;
obtaining M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories, and M and N are integer constants;
determining, for the software project to be tested, N pieces of software project index data respectively corresponding to the N index categories;
substituting the N pieces of software project index data into the corresponding index variables in each of the M index combination expressions to obtain M combination index values; and
inputting the M combination index values into the defect number prediction model to obtain the number of software defects output by the defect number prediction model.
In a second aspect, the present application further provides a software defect detection apparatus. The apparatus comprises:
an acquisition module, configured to determine a software project to be tested and obtain a defect number prediction model for the software project to be tested; obtain M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories, and M and N are integer constants; and determine, for the software project to be tested, N pieces of software project index data respectively corresponding to the N index categories;
a substituting module, configured to substitute the N pieces of software project index data into the corresponding index variables in each of the M index combination expressions to obtain M combination index values; and
an output module, configured to input the M combination index values into the defect number prediction model to obtain the number of software defects output by the defect number prediction model.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, performs the following steps:
determining a software project to be tested, and obtaining a defect number prediction model for the software project to be tested;
obtaining M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories, and M and N are integer constants;
determining, for the software project to be tested, N pieces of software project index data respectively corresponding to the N index categories;
substituting the N pieces of software project index data into the corresponding index variables in each of the M index combination expressions to obtain M combination index values; and
inputting the M combination index values into the defect number prediction model to obtain the number of software defects output by the defect number prediction model.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has a computer program stored thereon which, when executed by a processor, performs the following steps:
determining a software project to be tested, and obtaining a defect number prediction model for the software project to be tested;
obtaining M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories, and M and N are integer constants;
determining, for the software project to be tested, N pieces of software project index data respectively corresponding to the N index categories;
substituting the N pieces of software project index data into the corresponding index variables in each of the M index combination expressions to obtain M combination index values; and
inputting the M combination index values into the defect number prediction model to obtain the number of software defects output by the defect number prediction model.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the following steps:
determining a software project to be tested, and obtaining a defect number prediction model for the software project to be tested;
obtaining M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories, and M and N are integer constants;
determining, for the software project to be tested, N pieces of software project index data respectively corresponding to the N index categories;
substituting the N pieces of software project index data into the corresponding index variables in each of the M index combination expressions to obtain M combination index values; and
inputting the M combination index values into the defect number prediction model to obtain the number of software defects output by the defect number prediction model.
According to the above software defect detection method, apparatus, computer device, storage medium, and product, M combination index values corresponding to the M index combination expressions are determined from the N pieces of software project index data of the software project to be tested, which respectively correspond to the N index categories. The M combination index values are then input into the defect number prediction model to obtain the number of software defects output by the model. The number of software defects of the software project to be tested can thus be predicted, and a tester can evaluate the test result of the software project with reference to the predicted number, which improves the reliability of the test result.
Drawings
FIG. 1 is a flow chart of a software defect detection method according to one embodiment;
FIG. 2 is a flow chart of a training step in one embodiment;
FIG. 3 is a block diagram of a software defect detection device in one embodiment;
FIG. 4 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a software defect detection method is provided, where the method is applied to a computer device, and the computer device may be a terminal or a server, and it is understood that the method may also be applied to a system including a terminal and a server, and implemented through interaction between the terminal and the server. In this embodiment, the method includes the steps of:
Step 102, determining a software project to be tested, and obtaining a defect number prediction model for the software project to be tested.
The software project to be tested is a software project whose number of software defects is to be predicted. A software project is a project in which software testing is performed on software. Software is a collection of computer data and instructions organized in a particular order; in the banking field, examples include mobile banking software, banking software, or others. The defect number prediction model is a model for predicting the number of software defects of a software project.
In one embodiment, in response to a defect number prediction event, the computer device may determine the software project to be tested that the event targets, and obtain the defect number prediction model for that software project. The defect number prediction event is an event that triggers prediction of the number of software defects of a software project. It may be an automatically triggered event, such as a timed trigger for the software project to be tested; it may also be a manually triggered operation, such as a click operation on, or a trigger operation on a function key of, the identifier of the software project to be tested displayed in the interface. The identifier of the software project to be tested may be its number, name, or icon.
Step 104, obtaining M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories; wherein M and N are integer constants.
An index category is a category of index. The index categories may include the number of test cases, the test duration, the software code scale, the number of test team members, or others. The software code scale is the scale of the software code tested in the software project, and may specifically be the number of lines of code. An index variable is a quantity that characterizes the corresponding index category and whose value can vary. For example, when the index category is the software code scale, the index variable corresponding to the software code scale takes different values for different software projects.
The linear combination of the index variables corresponding to the N index categories may be the algebraic sum of N terms, each formed by multiplying the index variable of one index category by a constant coefficient. M is the number of index combination expressions, and N is the number of index categories. M and N are positive integers with M < N; for example, N may be 4 and M may be 2. A constant coefficient is the coefficient of an index variable and takes a constant value. It is understood that the M index combination expressions differ from one another.
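As an illustrative sketch only, the M index combination expressions can be written in the following form. The symbols Z_j (the j-th combination), a_{ji} (the constant coefficients), and x_i (the index variables) are notation assumed here; they do not appear in the original text.

```latex
Z_j = a_{j1} x_1 + a_{j2} x_2 + \cdots + a_{jN} x_N
    = \sum_{i=1}^{N} a_{ji} x_i ,
\qquad j = 1, \dots, M, \quad M < N
```

With N = 4 and M = 2, this gives two expressions, Z_1 and Z_2, each over the same four index variables but with a different set of coefficients.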
In one embodiment, the computer device may directly obtain M index combination expressions associated with the defect number prediction model that are pre-stored for the defect number prediction model.
In one embodiment, the computer device may obtain M sets of constant coefficients associated with the defect number prediction model, and linearly combine each set of constant coefficients with the index variables corresponding to the N index categories, respectively, to obtain M index combination expressions. Wherein each constant coefficient in each set of constant coefficients has a corresponding index variable.
Step 106, determining, for the software project to be tested, N pieces of software project index data respectively corresponding to the N index categories.
The software project index data is the specific index value of a software project for the corresponding index category. For example, when the index category is the number of test cases, the corresponding software project index data may be a specific number of test cases, for example, 10.
In one embodiment, the computer device may obtain, from a project data table, the N pieces of software project index data that were counted in advance for the software project to be tested and that respectively correspond to the N index categories. The project data table is a data table that records, for each software project, N pieces of software project index data respectively corresponding to the N index categories.
In one embodiment, the computer device may determine, for the software project to be tested, the statistical object and the preset statistical algorithm corresponding to each of the N index categories, and apply the corresponding preset statistical algorithm to each statistical object to obtain the N pieces of software project index data respectively corresponding to the N index categories. A statistical object is the object of the software project over which the index data of the corresponding index category is counted. A preset statistical algorithm is the specific calculation method used to derive software project index data from a statistical object. For example, when the index category is the software code scale, the statistical object may be the software code tested in the software project, and the preset statistical algorithm may be counting the number of lines of that software code.
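The line-counting example above can be sketched as follows. This is a minimal illustration, not the applicant's implementation; the function name `count_code_lines` and the choice to skip blank lines are assumptions made here.

```python
def count_code_lines(source_text: str) -> int:
    """Count non-blank lines of a source text.

    Assumption: blank lines are excluded; the source only says the
    statistic is "the number of lines of code".
    """
    return sum(1 for line in source_text.splitlines() if line.strip())


# Hypothetical statistical object: the code tested in one software project.
sample_code = "def add(a, b):\n    return a + b\n\n\nprint(add(1, 2))\n"
code_scale = count_code_lines(sample_code)
print(code_scale)
```

Other index categories, such as the number of test cases, would pair a different statistical object with a different counting function in the same way.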
Step 108, substituting the N pieces of software project index data into the corresponding index variables in each of the M index combination expressions to obtain M combination index values.
A combination index value is the value obtained by substituting the N pieces of software project index data into the corresponding index variables in one index combination expression.
In one embodiment, for each of the M index combination expressions, the computer device may substitute the N pieces of software project index data into that expression to obtain the combination index value corresponding to it, thereby obtaining the M combination index values.
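The substitution step can be sketched as a set of dot products. The coefficient values, the category names, and the function name `combination_index_values` below are hypothetical and chosen only for illustration, with N = 4 and M = 2.

```python
# Hypothetical index categories (N = 4), in a fixed order.
INDEX_CATEGORIES = ["test_cases", "test_days", "code_lines", "team_size"]

# Hypothetical constant coefficients: one row per index combination expression (M = 2).
COEFFICIENTS = [
    [0.5, 0.1, 0.001, 0.2],
    [-0.3, 0.4, 0.002, 0.1],
]


def combination_index_values(coefficients, index_data):
    """Substitute N index values into each of the M linear expressions."""
    values = []
    for row in coefficients:
        if len(row) != len(index_data):
            raise ValueError("each expression needs one coefficient per index category")
        values.append(sum(c * x for c, x in zip(row, index_data)))
    return values


# Index data of a hypothetical project, ordered like INDEX_CATEGORIES.
project_index_data = [10, 20, 5000, 4]
combo_values = combination_index_values(COEFFICIENTS, project_index_data)
print(combo_values)
```

Keeping the coefficients as an M-by-N table makes the ordering requirement explicit: each column must line up with the same index category in the project data.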
Step 110, inputting the M combination index values into the defect number prediction model to obtain the number of software defects output by the defect number prediction model.
The defect number prediction model may specifically be a multiple linear regression model, and the independent variables of its model expression may be M index combination variables in one-to-one correspondence with the M index combination expressions.
A software defect is a problem, error, or hidden functional flaw in computer software that impairs its ability to run normally. A software defect may also be referred to as a bug. The number of software defects is the count of such defects. The output number of software defects is the number of defects that the defect number prediction model predicts for the software tested in the software project to be tested. Before the software project is closed, the tester can compare the predicted number of software defects with the actual number of defects found in testing to evaluate the reliability of the project's test result. When the difference between the predicted and actual numbers is small, the test result can be considered more reliable; when the difference is large, the software project can be adjusted in time and testing continued to obtain a more reliable test result.
In one embodiment, the computer device may substitute the M combination index values into the corresponding index combination variables in the model expression of the defect number prediction model to obtain the number of software defects output by the model. An index combination variable is a variable that stands for an index combination expression, and its value is the combination index value of that expression.
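Assuming the model is a multiple linear regression with an intercept, the prediction step reduces to evaluating one linear expression. The intercept, weights, and input values below are hypothetical, as is the function name `predict_defect_count`.

```python
def predict_defect_count(intercept, weights, combo_values):
    """Evaluate a multiple linear regression model expression.

    Assumption: the model has the form intercept + sum(weight_j * Z_j);
    the source names the model type but does not give its expression.
    """
    if len(weights) != len(combo_values):
        raise ValueError("one weight is required per combination index value")
    return intercept + sum(w * z for w, z in zip(weights, combo_values))


# Hypothetical trained parameters and M = 2 combination index values.
predicted_defects = predict_defect_count(2.0, [0.5, 1.0], [12.8, 15.4])
print(round(predicted_defects, 2))
```

A tester would then compare this predicted number with the number of defects actually found in testing.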
According to the above software defect detection method, M combination index values corresponding to the M index combination expressions are determined from the N pieces of software project index data of the software project to be tested, which respectively correspond to the N index categories. The M combination index values are then input into the defect number prediction model to obtain the number of software defects output by the model. The number of software defects of the software project to be tested can thus be predicted, and a tester can evaluate the test result of the software project with reference to the predicted number, which improves the reliability of the test result.
In one embodiment, as shown in FIG. 2, the defect number prediction model is generated through a training step that includes:
Step 202, obtaining a training sample for each sample software project in a sample set, wherein each training sample comprises N pieces of sample software project index data for the N index categories of one sample software project.
The sample set is the set formed by the training samples of S sample software projects. S is the number of sample software projects and is a positive integer; its value may be 100, 200, or other. A training sample is a sample used to train the defect number prediction model to be trained. A sample software project is the software project corresponding to a training sample. The sample software project index data is the specific index value of a sample software project for the corresponding index category.
Step 204, constructing, based on the sample software project index data in each training sample in the sample set, M index combination expressions, each expressed as a combination of the N index variables corresponding to the N index categories; wherein M < N.
In one embodiment, the computer device may construct a covariance matrix based on the sample software project index data in each training sample in the sample set, and construct, based on the eigenvectors of the covariance matrix, M index combination expressions, each expressed as a combination of the N index variables corresponding to the N index categories.
Step 206, determining M sample combination index values for each training sample according to the sample software project index data in that training sample and the M index combination expressions.
A sample combination index value is the value of an index combination expression, calculated by substituting the N pieces of sample software project index data of a training sample into the corresponding index variables of that expression.
In one embodiment, the computer device may substitute the N pieces of sample software project index data of each training sample in the sample set into the corresponding index variables in each of the M index combination expressions to obtain the M sample combination index values of that training sample.
Step 208, iteratively performing model training based on the M sample combination index values of each training sample to obtain the defect number prediction model.
In one embodiment, the computer device may divide the sample set into a test sample set and a training sample set, and iteratively train the defect number prediction model to be trained based on the M sample combination index values of each training sample in the training sample set until the function value of the loss function converges, obtaining the defect number prediction model at convergence.
The computer device may then use the model at convergence to predict the number of software defects for the sample software project corresponding to each test sample in the test sample set, determine the prediction accuracy of the model from the predicted numbers of software defects and the actual numbers of defects of those sample software projects, and, when the prediction accuracy is not lower than a preset accuracy, take the model at convergence as the final defect number prediction model.
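The hold-out check described above can be sketched as follows. The source does not define how prediction accuracy is computed, so the definition used here (the share of test samples whose prediction falls within a relative tolerance of the actual defect count) is an assumption, as are all function names, the split ratio, and the thresholds.

```python
import random


def split_samples(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and split them into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def prediction_accuracy(predicted, actual, tolerance=0.2):
    """Assumed metric: fraction of predictions within a relative tolerance."""
    hits = sum(
        1 for p, a in zip(predicted, actual)
        if abs(p - a) <= tolerance * max(abs(a), 1)
    )
    return hits / len(actual)


def accept_model(predicted, actual, preset_accuracy=0.8):
    """Keep the converged model only if it meets the preset accuracy."""
    return prediction_accuracy(predicted, actual) >= preset_accuracy


train_set, test_set = split_samples(range(8))
accuracy = prediction_accuracy([10, 21, 50], [10, 25, 30])
print(len(train_set), len(test_set), accept_model([10, 21, 50], [10, 25, 30]))
```

If the model is rejected, one option is to retrain with different settings or more samples; the source does not say what happens in that case.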
In one embodiment, the computer device may iteratively train the defect number prediction model to be trained based on the M sample combination index values of each training sample in the training sample set until the function value of the loss function converges, obtaining the defect number prediction model at convergence. In each training iteration, the M sample combination index values of each training sample in the training sample set are input into the model to be trained to obtain the number of software defects output by the model; the function value of the loss function is determined from the output number of software defects and the actual number of defects for each training sample; the model parameters are optimized accordingly; and the step of inputting the M sample combination index values of each training sample into the model to be trained is repeated until the function value of the loss function converges.
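One way to realize this loop is batch gradient descent on a mean squared error loss. The source names neither the loss function nor the optimizer, so both are assumptions here, along with the learning rate, the convergence tolerance, and the toy data.

```python
def train_linear_model(samples, targets, learning_rate=0.01, tol=1e-12, max_iters=100_000):
    """Fit intercept + sum(w_j * z_j) by gradient descent on mean squared error.

    Training stops when the loss changes by less than `tol` between iterations,
    which is the convergence test assumed for this sketch.
    """
    n, m = len(samples), len(samples[0])
    intercept, weights = 0.0, [0.0] * m
    previous_loss = loss = float("inf")
    for _ in range(max_iters):
        # Forward pass: model output for every training sample.
        errors = [
            intercept + sum(w * z for w, z in zip(weights, row)) - target
            for row, target in zip(samples, targets)
        ]
        loss = sum(e * e for e in errors) / n
        if abs(previous_loss - loss) < tol:
            break
        previous_loss = loss
        # Parameter update from the gradient of the loss.
        intercept -= learning_rate * 2 * sum(errors) / n
        for j in range(m):
            gradient_j = 2 * sum(e * row[j] for e, row in zip(errors, samples)) / n
            weights[j] -= learning_rate * gradient_j
    return intercept, weights, loss


# Toy data generated from defects = 1 + 2*z1 + 3*z2 (hypothetical).
combo_samples = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
defect_counts = [9.0, 8.0, 19.0, 18.0, 26.0]
fit_intercept, fit_weights, final_loss = train_linear_model(combo_samples, defect_counts)
print(round(fit_intercept, 2), [round(w, 2) for w in fit_weights])
```

For a plain multiple linear regression, a closed-form least-squares fit would give the same parameters without iteration; the loop is shown because the text describes iterative training.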
In this embodiment, the N index variables corresponding to the N index categories are combined into M index combination expressions, and model training is performed iteratively based on the M sample combination index values. Because M is less than N, the number of parameters to be trained is reduced; because the sample combination index values remain related to all N index categories of a sample software project, the accuracy of the model is preserved while training efficiency improves.
In one embodiment, step 204 includes: constructing a covariance matrix based on the sample software project index data in each training sample in the sample set; determining the N eigenvalues of the covariance matrix and the N corresponding eigenvectors; determining the contribution rate corresponding to each of the N eigenvalues; sorting the N eigenvalues in descending order of contribution rate to obtain an eigenvalue sequence; determining the first M consecutive eigenvalues of the eigenvalue sequence such that the sum of their contribution rates reaches a preset value while the sum of the contribution rates of the first M-1 consecutive eigenvalues is smaller than the preset value; and constructing, based on the eigenvectors respectively corresponding to the M eigenvalues, M index combination expressions, each expressed as a combination of the N index variables corresponding to the N index categories.
The covariance matrix is the matrix formed by the covariances between every pair of the N index vectors corresponding to the N index categories. It is an N×N matrix. With the N index vectors numbered 1 to N, each element of the covariance matrix is the covariance between the index vector whose number equals the element's row and the index vector whose number equals the element's column. For example, the element in row 2, column 1 is the covariance between index vector 2 and index vector 1. When an element's row and column are the same, the two index vectors coincide, and the element is the variance of that index vector.
Each of the N index vectors is a matrix with S rows and 1 column, that is, it contains S elements. The elements of an index vector are the sample software project index data of the S training samples in the sample set for the index category corresponding to that index vector. For example, when the index vector corresponds to the software code scale, its elements are the specific software code scale values of the S training samples.
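The construction of the covariance matrix can be sketched with NumPy. The sample values are hypothetical (S = 5 projects, N = 3 index categories), and the use of `numpy.cov` with its default normalization by S - 1 is an assumption, since the source does not state which normalization is used.

```python
import numpy as np

# Hypothetical sample set: one row per sample project (S = 5), one column per
# index category (N = 3): test cases, test days, lines of code.
sample_index_data = np.array([
    [10.0, 5.0, 1200.0],
    [12.0, 6.0, 1500.0],
    [8.0, 4.0, 900.0],
    [15.0, 9.0, 2100.0],
    [11.0, 5.0, 1300.0],
])

# Each column is one index vector of length S. rowvar=False tells NumPy that
# the variables are in columns, so the result is an N x N matrix.
covariance_matrix = np.cov(sample_index_data, rowvar=False)

print(covariance_matrix.shape)
```

Because the index categories are on very different scales in this toy data, the lines-of-code column dominates the covariances; whether the indexes are standardized first is not stated in the source.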
The N eigenvalues and the N eigenvectors are in one-to-one correspondence. The product of the covariance matrix and an eigenvector equals the product of that eigenvector and its corresponding eigenvalue. Each of the N eigenvectors is a matrix with N rows and 1 column; the element in each row corresponds to the index vector whose number equals the row number, and hence to the index category of that index vector. The contribution rate of an eigenvalue is the ratio of that eigenvalue to the sum of the N eigenvalues. The preset value is a preset cumulative contribution rate, such as 85%, 90%, or other. M-1 is one less than M; for example, M-1 is 4 when M is 5.
In this embodiment, a covariance matrix is constructed from the sample software project index data in each training sample in the sample set, its N eigenvalues and the corresponding N eigenvectors are determined, and the eigenvalues are sorted in descending order of contribution rate to find the M eigenvalues with the highest contribution rates whose cumulative contribution rate reaches the preset value. The M index combination expressions determined in this way reduce the index dimension while still representing the N index variables accurately, which can improve the accuracy of the subsequently trained defect number prediction model.
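The eigenvalue selection can be sketched as follows, assuming NumPy is used for the decomposition. The function name `select_components` and the diagonal example matrix are hypothetical; the selection rule follows the text: take the smallest M whose cumulative contribution rate reaches the preset value.

```python
import numpy as np


def select_components(covariance_matrix, preset=0.85):
    """Return the top-M eigenvalues, their eigenvectors, and contribution rates."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance_matrix)
    order = np.argsort(eigenvalues)[::-1]  # descending by eigenvalue
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]  # column k pairs with eigenvalue k

    contribution = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(contribution)
    # Smallest M whose cumulative contribution rate reaches the preset value.
    m = int(np.searchsorted(cumulative, preset)) + 1
    m = min(m, len(eigenvalues))  # guard against floating-point shortfall
    return eigenvalues[:m], eigenvectors[:, :m], contribution[:m]


# Hypothetical covariance matrix with eigenvalues 6, 3 and 1.
example_cov = np.diag([6.0, 3.0, 1.0])
top_values, top_vectors, top_rates = select_components(example_cov, preset=0.85)
print(len(top_values), top_rates)
```

In this example the first eigenvalue contributes 60% and the first two together contribute 90%, so M = 2 for a preset value of 85%.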
In one embodiment, the computer device may construct an index variable matrix from the N index variables corresponding to the N index categories, and multiply the index variable matrix by each of the eigenvectors corresponding to the M eigenvalues to obtain M index combination expressions, each expressed as a combination of the N index variables. The index variable matrix has 1 row and N columns, and the N index variables in it are ordered by the numbers of their corresponding index vectors.
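Multiplying the 1×N index variable matrix by an N×1 eigenvector yields a single linear expression. The sketch below renders such an expression as text and evaluates it; the eigenvector entries, the variable names x1 to x3, and both function names are hypothetical.

```python
def build_expression(eigenvector, variable_names):
    """Render the product of a 1 x N variable row and an N x 1 eigenvector."""
    return " ".join(
        f"{coefficient:+.3f}*{name}"
        for coefficient, name in zip(eigenvector, variable_names)
    )


def evaluate_expression(eigenvector, index_values):
    """Substitute concrete index values for the index variables."""
    return sum(c * x for c, x in zip(eigenvector, index_values))


# Hypothetical eigenvector for N = 3 index variables.
eigenvector = [0.5, -0.5, 0.707]
expression_text = build_expression(eigenvector, ["x1", "x2", "x3"])
expression_value = evaluate_expression(eigenvector, [2, 4, 10])
print(expression_text)
```

The eigenvector entries play the role of the constant coefficients, so storing the M eigenvectors is enough to reconstruct the M index combination expressions later.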
In one embodiment, the software defect detection method further includes the steps of: obtaining the software code corresponding to the software project to be tested; extracting code feature values from the software code; determining a combined feature value based on the code feature values and the number of software defects; and inputting the combined feature value into a defect detection model to obtain defect positioning information output by the defect detection model, wherein the number of pieces of output defect positioning information matches the number of software defects.
A code feature value is the value of a code feature used to predict defect positioning information, and may specifically be a text feature value of the software code text. The combined feature value is the feature value obtained by combining the code feature value with the number of software defects; it may be the concatenation of the two. The defect positioning information indicates the position of a defective code segment in the software code corresponding to the software project to be tested.
The defect detection model is a model for predicting the positions of code segments with software defects in the software code tested by the software project to be tested. The defect detection model may specifically be a natural language processing model, such as a GPT model (generating Pre-trained Transformer, pre-training language model based on a transducer architecture), a recurrent neural network (Recurrent Neural Network, RNN), a Long Short-Term Memory network (LSTM), and the like. The defect detection model may be obtained by training based on samples of software code composition tested by each of a plurality of software items, and each sample may include a software code tested by one software item, an actual number of defects in the software code, and a tag for whether a defect exists for each code segment mark in the software code.
In this embodiment, a code feature value is extracted from the software code corresponding to the software item to be tested and combined with the software defect number predicted by the defect number prediction model, and the combined feature value is input into the defect detection model to predict the defect positioning information. A tester can thus quickly locate the defective code segments, which improves software testing efficiency.
In one embodiment, the computer device may extract the code feature value from the code text of the software code through a text feature extraction model. The text feature extraction model is a model for extracting feature values of text, and may include TF-IDF (Term Frequency-Inverse Document Frequency, a weighting technique commonly used in information retrieval and data mining), a bag-of-words (BoW) model, a Word2Vec model, or others.
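As an illustration of one of the options named above, the following sketch computes a plain TF-IDF vector for code text and concatenates it with a predicted software defect number to form a combined feature value. It uses only the Python standard library. The tokenizer, the idf variant log(n / df), the function names, and the two code snippets are assumptions made for the example and are not specified in the application.

```python
import math
import re
from collections import Counter

def tokenize(code_text):
    """Split code text into lowercase identifier-like tokens (a simplification)."""
    return re.findall(r"[A-Za-z_]\w*", code_text.lower())

def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using tf = count / length, idf = log(n / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    n_docs = len(documents)
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append([
            (counts[tok] / total) * math.log(n_docs / doc_freq[tok])
            for tok in vocabulary
        ])
    return vocabulary, vectors

def combined_feature_value(code_feature_value, software_defect_number):
    """Concatenate the code feature value with the predicted defect number."""
    return list(code_feature_value) + [float(software_defect_number)]

# Hypothetical code texts and a hypothetical predicted defect number of 3.
corpus = ["def add(a, b): return a + b", "def div(a, b): return a / b"]
vocabulary, vectors = tfidf_vectors(corpus)
combined = combined_feature_value(vectors[0], 3)
```

Tokens that occur in every document (such as `def` here) receive a weight of zero under this idf variant, so only distinguishing tokens contribute to the code feature value.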
In one embodiment, the step of extracting the code feature value from the software code comprises: determining a first code segment in the software code that is marked with corresponding annotation text; determining a second code segment in the software code that is not marked with annotation text; extracting a first feature value after combining the first code segment with its corresponding annotation text; extracting a second feature value after combining the second code segment with an annotation-free mark; and determining the code feature value based on the first feature value and the second feature value.
The annotation text is text in the code (for example, a comment) that improves the readability of the code, and may describe the purpose, structure, function, meaning, or other aspects of the code. The first code segment is a code segment of the software code that is marked with annotation text. The second code segment is a code segment that is not marked with annotation text. A code segment may specifically be a segment obtained by dividing the software code into layers. The layering may specifically follow code function; for example, the code may be divided into a user interface layer, an application layer, a domain layer, a data access layer, an infrastructure layer, or others, and each layer may be further subdivided, for example by dividing the code in the application layer according to the application services it provides.
In this embodiment, the first feature value is extracted after the first code segment is combined with its corresponding annotation text. Because the annotation text is taken into account, the first feature value expresses the first code segment better, which can improve the accuracy of the code feature value of the software code and, in turn, the accuracy of the defect positioning information predicted subsequently.
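The handling of annotated and unannotated code segments can be sketched as follows. This is an assumption-laden illustration: a simple bag-of-words count stands in for the text feature extraction model, the placeholder token `<no_annotation>` stands in for the annotation-free mark, and the code feature value is taken to be the concatenation of the first and second feature values. None of these choices, nor the function names and sample segments, is fixed by the application.

```python
import re
from collections import Counter

NO_ANNOTATION_MARK = "<no_annotation>"  # hypothetical annotation-free mark

def bag_of_words(text, vocabulary):
    """Count occurrences of each vocabulary token in the text."""
    counts = Counter(re.findall(r"<\w+>|[A-Za-z_]\w*", text.lower()))
    return [counts[token] for token in vocabulary]

def code_feature_value(segments, vocabulary):
    """segments: list of (code, annotation_text or None) pairs."""
    first = [0] * len(vocabulary)    # accumulated over annotated segments
    second = [0] * len(vocabulary)   # accumulated over unannotated segments
    for code, annotation in segments:
        if annotation:
            vector = bag_of_words(annotation + "\n" + code, vocabulary)
            first = [a + b for a, b in zip(first, vector)]
        else:
            vector = bag_of_words(NO_ANNOTATION_MARK + "\n" + code, vocabulary)
            second = [a + b for a, b in zip(second, vector)]
    # Assumption: the code feature value concatenates the two feature values.
    return first + second

vocabulary = ["<no_annotation>", "balance", "check", "return", "transfer"]
segments = [
    ("def transfer(balance): return balance", "Check the balance before transfer"),
    ("def helper(): return 1", None),
]
feature = code_feature_value(segments, vocabulary)
```

In this sketch the annotation text contributes the token `check`, which never appears in the code itself, showing how annotation text can add information to the first feature value.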
In one embodiment, the software defect detection method further includes the following steps: obtaining W actual defect numbers corresponding to W historical test software items; determining the absolute value of the difference between each of the W actual defect numbers and the corresponding software defect number; determining the number of historical test software items for which the absolute value of the difference exceeds a preset threshold; determining the ratio of that number of items to W; and when the ratio exceeds a preset ratio, updating the N index categories and retraining the defect number prediction model based on the updated index categories.
A historical test software item is a software item in which software was tested in the past. W is the number of historical test software items, for example 10, 20, or another value. The actual defect number is the number of software defects actually found in the historical test software item, and specifically may be the number of software defects found from the beginning of testing to the end of testing. The software defect number of a historical test software item is obtained by treating the historical test software item as the software item to be tested and predicting through the defect number prediction model.
The absolute values of the differences are computed item by item: each of the W actual defect numbers is compared with the software defect number predicted for the same historical test software item. The preset threshold is a predetermined limit on this absolute difference, which may be 10, 15, or another value. The preset ratio is a predetermined limit on the ratio, which may be 20%, 30%, or another value.
In this embodiment, comparing the W actual defect numbers of the W historical test software items with the W software defect numbers predicted by the defect number prediction model makes it possible to estimate the accuracy of the defect number prediction model. A ratio exceeding the preset ratio indicates that the model accuracy is insufficient; the N index categories can then be updated in time, which can improve the prediction accuracy of the defect number prediction model.
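The accuracy check described in this embodiment amounts to a short calculation, sketched below. The function name `needs_index_update`, the default threshold and ratio (taken from the example values 10 and 20% mentioned above), and the defect numbers are illustrative assumptions.

```python
def needs_index_update(actual, predicted, preset_threshold=10, preset_ratio=0.2):
    """Return (ratio, update_required) for W historical test software items."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    # Number of items whose absolute prediction error exceeds the threshold.
    exceeding = sum(
        1 for a, p in zip(actual, predicted) if abs(a - p) > preset_threshold
    )
    ratio = exceeding / len(actual)
    return ratio, ratio > preset_ratio

# Invented data for W = 5 historical test software items.
actual_defects = [40, 12, 75, 30, 22]
predicted_defects = [38, 30, 60, 31, 20]
ratio, update_required = needs_index_update(actual_defects, predicted_defects)
```

Here two of the five items differ by more than 10, so the ratio is 0.4, which exceeds the preset ratio of 0.2 and would trigger an update of the N index categories.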
In one embodiment, when the ratio exceeds the preset ratio, the computer device may determine K index category combinations from the index category set, determine the sample set corresponding to each of the K index category combinations, and retrain based on the K sample sets to obtain K candidate defect number prediction models. For each of the K candidate defect number prediction models, the computer device determines the W software defect numbers that the candidate model outputs for the W historical test software items.
Based on the W software defect numbers output by each of the K candidate defect number prediction models, the computer device performs the steps of determining the absolute values of the differences between the W actual defect numbers and the corresponding software defect numbers, determining the number of historical test software items for which the absolute value of the difference exceeds the preset threshold, and determining the ratio of that number of items to W, thereby obtaining K ratios. The computer device then determines a target ratio among the K ratios, the target ratio being the smallest of the K ratios and not exceeding the preset ratio, and uses the candidate defect number prediction model corresponding to the target ratio as the updated defect number prediction model.
K is a positive integer. Each of the K index category combinations includes N candidate index categories. Different index category combinations may share candidate index categories, but the number of candidate index categories shared by different index category combinations is less than N. Each training sample in the sample set corresponding to an index category combination includes N pieces of sample software item index data, one for each of the N candidate index categories, for one sample software item. The N candidate index categories corresponding to the updated defect number prediction model serve as the updated N index categories.
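The selection among the K candidate defect number prediction models can be sketched as follows. The candidate names, the predictions, and the function name `select_updated_model` are hypothetical; the candidates' outputs are given directly rather than produced by trained models. Returning `(None, None)` when no candidate qualifies is also an assumption, since the text does not say what happens in that case.

```python
def select_updated_model(candidate_predictions, actual,
                         preset_threshold=10, preset_ratio=0.2):
    """Pick the candidate with the smallest ratio that does not exceed preset_ratio.

    candidate_predictions: dict mapping a candidate name to its W predictions.
    Returns (name, ratio), or (None, None) if no candidate qualifies.
    """
    best_name, best_ratio = None, None
    for name, predicted in candidate_predictions.items():
        exceeding = sum(
            1 for a, p in zip(actual, predicted) if abs(a - p) > preset_threshold
        )
        ratio = exceeding / len(actual)
        if ratio <= preset_ratio and (best_ratio is None or ratio < best_ratio):
            best_name, best_ratio = name, ratio
    return best_name, best_ratio

# Invented data: W = 5 historical items, K = 3 candidate models.
actual_defects = [40, 12, 75, 30, 22]
candidates = {
    "combination_a": [38, 30, 60, 31, 20],   # ratio 0.4
    "combination_b": [41, 14, 70, 29, 40],   # ratio 0.2
    "combination_c": [39, 13, 74, 30, 23],   # ratio 0.0
}
selected = select_updated_model(candidates, actual_defects)
```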
In one embodiment, in a specific application scenario, the software defect detection method specifically includes the following steps.
The computer device may obtain training samples for each sample software item in the sample set, each training sample comprising N sample software item index data for N index categories for one sample software item; constructing M index combination expressions expressed in N index variable combinations corresponding to N index categories based on sample software item index data in each training sample in the sample set; determining M sample combination index values of each training sample according to sample software project index data and M index combination expressions in each training sample in the sample set; and performing model training iteratively based on M sample combination index values of each training sample to obtain a defect number prediction model.
The computer device may determine a software item to be tested and acquire a defect number prediction model for the software item to be tested; obtain M index combination expressions associated with the defect number prediction model; determine, for the software item to be tested, N pieces of software item index data respectively corresponding to the N index categories; substitute the N pieces of software item index data into the corresponding index variables in the M index combination expressions to obtain M combined index values; and input the M combined index values into the defect number prediction model to obtain the software defect number output by the defect number prediction model.
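The prediction step can be sketched as follows. Each index combination expression is represented by its N coefficients, and substitution is a sum of products. The coefficients, the index data, and the stand-in model (a fixed linear function) are invented for illustration; in practice the coefficients would come from the training step and the model would be the trained defect number prediction model.

```python
def combined_index_values(expressions, index_data):
    """Substitute N index data values into M linear index combination expressions.

    expressions: M lists, each holding the N coefficients of one expression.
    index_data: the N pieces of software item index data.
    """
    values = []
    for coefficients in expressions:
        if len(coefficients) != len(index_data):
            raise ValueError("each expression needs exactly N coefficients")
        values.append(sum(c * x for c, x in zip(coefficients, index_data)))
    return values

def stand_in_model(values):
    """Hypothetical placeholder for a trained defect number prediction model."""
    return round(0.1 * values[0] + 0.5 * values[1])

# Invented example with N = 3 index categories and M = 2 expressions.
expressions = [[0.5, 0.5, 0.0], [0.0, 1.0, -1.0]]
index_data = [100, 20, 4]
values = combined_index_values(expressions, index_data)
software_defect_number = stand_in_model(values)
```

Only the M combined index values, not the N raw index data values, are passed to the model, which reflects the dimension reduction from N to M.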
The computer device may acquire the software code corresponding to the software item to be tested; extract a code feature value from the software code; determine a combined feature value based on the code feature value and the software defect number; and input the combined feature value into a defect detection model to obtain defect positioning information output by the defect detection model, wherein the quantity of the output defect positioning information matches the software defect number.
The computer device may acquire W actual defect numbers corresponding to W historical test software items, where W is a positive integer; determine the absolute value of the difference between each of the W actual defect numbers and the corresponding software defect number; determine the number of historical test software items for which the absolute value of the difference exceeds a preset threshold; determine the ratio of that number of items to W; and when the ratio exceeds a preset ratio, update the N index categories and retrain the defect number prediction model based on the updated index categories.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a software defect detection device for implementing the software defect detection method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation of one or more embodiments of the software defect detection device provided below may refer to the limitation of the software defect detection method hereinabove, and will not be repeated herein.
In one embodiment, as shown in FIG. 3, a software defect detection apparatus 300 is provided, comprising: an acquisition module 310, a substitution module 320, and an output module 330, wherein:
an acquisition module 310, configured to determine a software item to be tested and acquire a defect number prediction model for the software item to be tested; obtain M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories, and M and N are integer constants; and determine, for the software item to be tested, N pieces of software item index data respectively corresponding to the N index categories;
a substitution module 320, configured to substitute the N pieces of software item index data into the corresponding index variables in the M index combination expressions to obtain M combined index values; and
an output module 330, configured to input the M combined index values into the defect number prediction model to obtain the software defect number output by the defect number prediction model.
In one embodiment, the software defect detection apparatus 300 further includes a construction module and a training module. The acquisition module 310 is further configured to obtain training samples of each sample software item in the sample set, where each training sample includes N pieces of sample software item index data for the N index categories of one sample software item. The construction module is configured to construct, based on the sample software item index data in each training sample in the sample set, M index combination expressions expressed as combinations of the N index variables corresponding to the N index categories, where M is less than N. The substitution module 320 is further configured to determine M sample combination index values of each training sample according to the sample software item index data in that training sample and the M index combination expressions. The training module is configured to iteratively perform model training based on the M sample combination index values of each training sample to obtain the defect number prediction model.
In one embodiment, the construction module is further configured to: construct a covariance matrix based on the sample software item index data in each training sample in the sample set; determine N eigenvalues of the covariance matrix and the N corresponding eigenvectors; determine the contribution rate corresponding to each of the N eigenvalues; sort the N eigenvalues in descending order of contribution rate to obtain an eigenvalue sequence; determine M consecutive eigenvalues from the head of the eigenvalue sequence, where the sum of the contribution rates of the M eigenvalues reaches a preset value and the sum of the contribution rates of the first M-1 consecutive eigenvalues is smaller than the preset value; and construct, based on the eigenvectors respectively corresponding to the M eigenvalues, M index combination expressions expressed as combinations of the N index variables corresponding to the N index categories.
In one embodiment, the software defect detection apparatus 300 further includes a defect detection module, where the defect detection module is configured to obtain a software code corresponding to a software item to be detected; extracting code characteristic values from the software codes; determining a combined feature value based on the code feature value and the number of software defects; and inputting the combined characteristic values into a defect detection model to obtain defect positioning information output by the defect detection model, wherein the quantity of the output defect positioning information is matched with the quantity of software defects.
In one embodiment, the defect detection module is further configured to: determine a first code segment in the software code that is marked with corresponding annotation text; determine a second code segment in the software code that is not marked with annotation text; extract a first feature value after combining the first code segment with its corresponding annotation text; extract a second feature value after combining the second code segment with an annotation-free mark; and determine the code feature value based on the first feature value and the second feature value.
In one embodiment, the software defect detection apparatus 300 further includes an index category update module, where the index category update module is configured to: obtain W actual defect numbers corresponding to W historical test software items, where W is a positive integer; determine the absolute value of the difference between each of the W actual defect numbers and the corresponding software defect number; determine the number of historical test software items for which the absolute value of the difference exceeds a preset threshold; determine the ratio of that number of items to W; and when the ratio exceeds a preset ratio, update the N index categories and retrain the defect number prediction model based on the updated index categories.
The modules in the above software defect detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data to be stored for performing software defect detection. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of software defect detection.
Those skilled in the art will appreciate that the structure shown in FIG. 4 is only a block diagram of the part of the structure related to the present solution and does not limit the computer device to which the present solution is applied; a particular computer device may include more or fewer components than those shown, may combine some of the components, or may have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the computer program, when executed, may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method for detecting software defects, the method comprising:
determining a software project to be tested, and acquiring a defect number prediction model aiming at the software project to be tested;
obtaining M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories; wherein M and N are integer constants;
determining N pieces of software item index data corresponding to the N index categories respectively aiming at the software item to be tested;
substituting the N pieces of software item index data into corresponding index variables in the M index combination expressions respectively to obtain M combination index values;
and inputting the M combined index values into the defect number prediction model to obtain the software defect number output by the defect number prediction model.
2. The method of claim 1, wherein the defect number prediction model is generated via a training step comprising:
acquiring training samples of each sample software item in a sample set, wherein each training sample comprises N sample software item index data of N index categories of one sample software item;
constructing M index combination expressions expressed in N index variable combinations corresponding to the N index categories based on sample software item index data in each training sample in the sample set; wherein M is less than N;
determining M sample combination index values of each training sample according to sample software item index data and the M index combination expressions in each training sample in the sample set;
and performing model training iteratively based on M sample combination index values of each training sample to obtain a defect number prediction model.
3. The method of claim 2, wherein constructing M index combination expressions expressed in N index variable combinations corresponding to the N index categories based on sample software item index data in each training sample in the sample set comprises:
constructing a covariance matrix based on sample software project index data in each training sample in the sample set;
determining N eigenvalues of the covariance matrix and N eigenvectors corresponding to the N eigenvalues;
determining a contribution rate corresponding to each of the N eigenvalues;
sorting the N eigenvalues in descending order according to the corresponding contribution rate to obtain an eigenvalue sequence;
determining M consecutive eigenvalues from the first position of the eigenvalue sequence, wherein the sum of contribution rates corresponding to the M eigenvalues reaches a preset value, and the sum of contribution rates corresponding to M-1 consecutive eigenvalues from the first position is smaller than the preset value;
and constructing, based on the eigenvectors corresponding to the M eigenvalues, M index combination expressions expressed in N index variable combinations corresponding to the N index categories.
4. The method according to claim 1, wherein the method further comprises:
acquiring a software code corresponding to the software item to be tested;
extracting code characteristic values from the software codes;
determining a combined feature value based on the code feature value and the software defect number;
and inputting the combined characteristic values into a defect detection model to obtain defect positioning information output by the defect detection model, wherein the quantity of the output defect positioning information is matched with the quantity of the software defects.
5. The method of claim 4, wherein extracting code feature values from the software code comprises:
determining a first code segment in the software code, which is correspondingly marked with annotation text;
determining a second code segment of the software code not marked with annotated text;
extracting a first characteristic value after combining the first code segment and the annotation text of the corresponding mark;
combining the second code segment and the annotation-free mark to extract a second characteristic value;
a code feature value is determined based on the first feature value and the second feature value.
6. The method according to any one of claims 1 to 5, further comprising:
obtaining W actual defect numbers corresponding to W historical test software items; W is a positive integer;
determining absolute difference values of the W actual defect numbers and the corresponding software defect numbers respectively;
determining the number of historical test software items of which the absolute value of the difference exceeds a preset threshold;
determining the ratio of the number of items to the W historical test software items;
and when the ratio exceeds a preset ratio, updating the N index categories, and retraining the defect number prediction model based on the updated index categories.
7. A software defect prediction apparatus, the apparatus comprising:
an acquisition module, configured to determine a software item to be tested and acquire a defect number prediction model for the software item to be tested; obtain M index combination expressions associated with the defect number prediction model, wherein each index combination expression is a linear combination of index variables corresponding to N index categories, and M and N are integer constants; and determine, for the software item to be tested, N pieces of software item index data respectively corresponding to the N index categories;
a substitution module, configured to substitute the N pieces of software item index data into corresponding index variables in the M index combination expressions respectively to obtain M combined index values;
and an output module, configured to input the M combined index values into the defect number prediction model to obtain the software defect number output by the defect number prediction model.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202311197320.6A 2023-09-15 2023-09-15 Software defect detection method, device, computer equipment, storage medium and product Pending CN117312138A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311197320.6A CN117312138A (en) 2023-09-15 2023-09-15 Software defect detection method, device, computer equipment, storage medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311197320.6A CN117312138A (en) 2023-09-15 2023-09-15 Software defect detection method, device, computer equipment, storage medium and product

Publications (1)

Publication Number Publication Date
CN117312138A true CN117312138A (en) 2023-12-29

Family

ID=89249108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311197320.6A Pending CN117312138A (en) 2023-09-15 2023-09-15 Software defect detection method, device, computer equipment, storage medium and product

Country Status (1)

Country Link
CN (1) CN117312138A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118277264A (en) * 2024-04-08 2024-07-02 北京精微致合测试技术有限公司 Software defect static identification method based on machine learning

Similar Documents

Publication Publication Date Title
CN111881983B (en) Data processing method and device based on classification model, electronic equipment and medium
US20210149896A1 (en) Inferring joins for data sets
Galli Python feature engineering cookbook
CN110442516B (en) Information processing method, apparatus, and computer-readable storage medium
CN111639193A (en) Product risk assessment method and device, electronic equipment and storage medium
Molin Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python
CN113011191A (en) Knowledge joint extraction model training method
CN109767308A (en) Time and cost feature selection method, equipment, medium in financial fraud detection
CN117312138A (en) Software defect detection method, device, computer equipment, storage medium and product
CN109376741A (en) Recognition methods, device, computer equipment and the storage medium of trademark infringement
CN112818162A (en) Image retrieval method, image retrieval device, storage medium and electronic equipment
CN112084330A (en) Incremental relation extraction method based on course planning meta-learning
CN116414815A (en) Data quality detection method, device, computer equipment and storage medium
CN116795977A (en) Data processing method, apparatus, device and computer readable storage medium
US11416712B1 (en) Tabular data generation with attention for machine learning model training system
CN111709225A (en) Event cause and effect relationship judging method and device and computer readable storage medium
US11809980B1 (en) Automatic classification of data sensitivity through machine learning
CN117251777A (en) Data processing method, device, computer equipment and storage medium
CN116501979A (en) Information recommendation method, information recommendation device, computer equipment and computer readable storage medium
CN115907954A (en) Account identification method and device, computer equipment and storage medium
US11797961B2 (en) Vectorization of transactions
CN113821571A (en) Food safety relation extraction method based on BERT and improved PCNN
CN110647630A (en) Method and device for detecting same-style commodities
Rafatirad et al. Machine learning for computer scientists and data analysts
US11816427B1 (en) Automated data classification error correction through spatial analysis using machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination