CN114490396A - Software test requirement mining method and system - Google Patents
- Publication number
- CN114490396A; CN202210103297.9A
- Authority
- CN
- China
- Prior art keywords
- software
- test
- fault
- tested
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3684—Test management for test design, e.g. generating new test cases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/10—Requirements analysis; Specification techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method and a system for mining software test requirements. The method comprises the following steps: acquiring fault description information of different types of software and, based on the fault description information, establishing a software fault mode knowledge base by a fault tree analysis method based on mean shift clustering, wherein the software fault mode knowledge base comprises a software type, a software function, a fault mode and a test point; according to the software type and the software function of the software to be tested, searching the software fault mode knowledge base for the fault modes and test points corresponding to the software function to be tested; for each such test point, judging by a similarity matching algorithm whether the test point exists in the test requirement text of the software function to be tested; and, if the test point does not exist, pushing the test point and the corresponding fault mode to a tester.
Description
Technical Field
The invention relates to the technical field of software testing, in particular to a method and a system for mining software testing requirements.
Background
Software testing is of great significance for maintaining software security and reliability. A third-party software evaluation organization carries out its evaluation work according to the software development task book, the software requirement specification and the software design document. The purpose of the test is not only to verify the correctness of the software, but also to provide a basis for the evaluation of the software. Therefore, third-party evaluation of the quality of delivered software places certain requirements on the documents accompanying the software.
The degree to which the software test requirements cover the software functions determines the quality of the software test. At present, test requirements are still derived by manually decomposing the software requirement specification, further decomposing the software requirements into test requirements and test function items, and compiling the test outline. The degree to which the traditional test requirement items and sub-items cover the software functions therefore depends largely on the experience of the software testers, so that testing efficiency and testing accuracy are low.
Disclosure of Invention
In view of the foregoing analysis, embodiments of the present invention provide a method and system for mining software testing requirements, so as to solve the problem of low testing efficiency and accuracy in the prior art.
In one aspect, an embodiment of the present invention provides a method for mining software test requirements, including:
acquiring fault description information of different types of software, and establishing a software fault mode knowledge base by adopting a fault tree analysis method based on mean shift clustering based on the fault description information, wherein the software fault mode knowledge base comprises a software type, a software function, a fault mode and a test point;
according to the software type and the software function of the software to be tested, a fault mode and a test point corresponding to the software function to be tested are searched in a software fault mode knowledge base, for each test point corresponding to the software function to be tested, whether the test point exists in a test requirement text of the software function to be tested is judged by adopting a similarity matching algorithm, and if the test point does not exist, the test point and the corresponding fault mode are pushed to a tester.
Based on the further improvement of the technical scheme, the method for establishing the software fault mode knowledge base by adopting a fault tree analysis method based on mean shift clustering based on the fault description information comprises the following steps:
for each software function of each type of software, converting all corresponding software fault description information into vector representation;
clustering fault description information represented by the vectors by adopting a mean shift clustering algorithm to obtain a software fault cluster;
performing fault tree analysis on the software fault cluster to form a software fault mode; establishing a test point corresponding to the fault mode according to the class cluster corresponding to the software fault mode;
and constructing a software failure mode knowledge base based on the failure mode and the test point corresponding to each software function of each type of software.
Further, clustering the fault description information represented by the vectors by using a mean shift clustering algorithm to obtain a software fault cluster, including:
S121, selecting any data point in the vector space of the fault description information as the central point;
S122, calculating the centroid of all data points in the high-dimensional sphere with the central point as the sphere center and h as the radius;
S123, if the distance between the centroid and the central point is smaller than a second threshold, or the maximum number of iterations is reached, ending the iteration, taking all data points within the current high-dimensional sphere as one cluster, and proceeding to step S124; otherwise, taking the centroid as the central point and returning to step S122;
S124, if all data points are classified, finishing clustering; otherwise, randomly selecting any unclassified data point as the central point and returning to step S122.
Further, for each test point corresponding to the function of the software to be tested, judging whether the test point exists in a test requirement text of the function of the software to be tested by adopting a similarity matching algorithm, wherein the method comprises the following steps:
performing word segmentation on a test requirement text of a software function to be tested, and calculating a TF-IDF value of each word segmentation on the test requirement text;
performing word segmentation on all test points corresponding to the functions of the software to be tested, and calculating TF-IDF values of each word segmentation on the test points;
calculating the semantic structure similarity and the spatial structure similarity of each test item in the test requirement text and the test point corresponding to the function of the software to be tested based on the TF-IDF value;
calculating the comprehensive text similarity based on the semantic structure similarity and the spatial structure similarity; if the test requirement text contains a test item whose comprehensive text similarity with the test point is greater than or equal to the first threshold, judging that the test point exists in the test requirement text of the software function to be tested; otherwise, judging that the test point does not exist in the test requirement text of the software function to be tested.
Further, for each test point corresponding to the function of the software to be tested, calculating the semantic structure similarity and the spatial structure similarity of the test point and each test item in the test requirement text based on the TF-IDF value, including:
calculating the semantic structure similarity according to the formula, wherein $\alpha_i$ represents the TF-IDF value, with respect to the test point, of the common participle $x_i$ of the test point and the test item; $\beta_i$ represents the TF-IDF value of the common participle $x_i$ with respect to the test requirement document;
constructing the vector representation $X'_1$ of the test item according to the TF-IDF value of each participle in the test item with respect to the test requirement text;
constructing the vector representation $X'_2$ of the test point according to the TF-IDF value of each participle in the test point with respect to the test point;
Further, calculating the comprehensive text similarity based on the semantic structure similarity and the spatial structure similarity includes:
calculating the comprehensive text similarity according to the formula, wherein $\mu$ represents the weight of the semantic structure similarity, and the remaining coefficient represents the weight of the spatial structure similarity.
Further, the first threshold is obtained by:
respectively obtaining a positive sample and a negative sample, wherein the positive sample is two similar texts, and the negative sample is two dissimilar texts;
respectively performing word segmentation on the positive sample and the negative sample, and calculating a TF-IDF value of each word segmentation;
respectively calculating the comprehensive text similarity of the positive sample and the negative sample based on the TF-IDF value;
and determining the first threshold value according to the comprehensive text similarity of the positive samples and the comprehensive text similarity of the negative samples.
Further, for each test point, judging whether the test point exists in a test requirement text of the function of the software to be tested by adopting a similarity matching algorithm, including:
respectively obtaining a word vector of each word in the test point and the test requirement text by adopting a word vector training model;
respectively constructing matrixes of the test points and each test item in the test requirement text based on the word vectors;
inputting the test points and each test item into a pre-trained neural network model to judge whether the test points and the test items are similar;
if the test point is not similar to each test item in the test requirement text, judging that the test point does not exist in the test requirement text of the software function to be tested, otherwise, judging that the test point exists in the test requirement text of the software function to be tested.
Further, the pre-trained neural network model is a convolutional neural network model; the convolutional neural network model includes:
the input layer is used for inputting a matrix of the test points and the test items;
the convolution layer is used for carrying out feature extraction and comprises 1 convolution kernel with the size of 2 x 2;
the pooling layer is used for reducing the dimension of the features extracted from the convolutional layer;
and the output layer is used for judging the similarity according to the characteristics subjected to the dimensionality reduction.
Compared with the prior art, the software test requirement mining method provided by this embodiment establishes the software failure mode knowledge base on the basis of the prior test experience base. For the same function of similar software, it uses similarity calculation to detect whether the prior test points are present in the test requirement document of the software function to be tested and, if they are not, pushes the test points and the corresponding failure modes to the testers. Prior experience thus guides the current test process, repeated software testing problems are avoided, software test efficiency and accuracy are improved, and the reliability and safety of the software are guaranteed.
On the other hand, the embodiment of the invention provides a software test requirement mining system, which comprises the following modules:
the fault mode knowledge base building module is used for acquiring fault description information of different types of software, and building a software fault mode knowledge base by adopting a fault tree analysis method based on mean shift clustering based on the fault description information, wherein the software fault mode knowledge base comprises software types, software functions, fault modes and test points;
and the test requirement mining module is used for searching a fault pattern and a test point corresponding to the function of the software to be tested in the software fault pattern knowledge base according to the software type and the software function of the software to be tested, judging whether the test point exists in a test requirement text of the function of the software to be tested or not by adopting a similarity matching algorithm for each test point corresponding to the function of the software to be tested, and pushing the test point and the corresponding fault pattern to a tester if the test point does not exist.
In the invention, the technical schemes can be combined with each other to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a flowchart of a software test requirement mining method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a software test requirement mining system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a fault tree according to an embodiment of the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
The specific embodiment of the invention discloses a software test requirement mining method, as shown in fig. 1, comprising the following steps:
S1, acquiring fault description information of different types of software, and establishing a software fault mode knowledge base by adopting a fault tree analysis method based on mean shift clustering based on the fault description information, wherein the software fault mode knowledge base comprises software types, software functions, fault modes and test points;
S2, according to the software type and the software function of the software to be tested, searching the software fault mode knowledge base for the fault mode and the test point corresponding to the software function to be tested; for each test point corresponding to the software function to be tested, judging through a similarity matching algorithm whether the test point exists in the test requirement text of the software function to be tested; and, if the test point does not exist, pushing the test point and the corresponding fault mode to a tester.
Compared with the prior art, the software test requirement mining method provided by this embodiment establishes the software failure mode knowledge base on the basis of the prior test experience base. For the same function of similar software, it uses similarity calculation to detect whether the prior test points are present in the test requirement document of the software function to be tested and, if they are not, pushes the test points and the corresponding failure modes to the testers. Prior experience thus guides the current test process, repeated software testing problems are avoided, software test efficiency and accuracy are improved, and the reliability and safety of the software are guaranteed.
Specifically, the failure mode of the software refers to the expression form of the software failure, and is a specific description of the external failure phenomenon of the software.
During implementation, a comprehensive failure mode knowledge base is constructed by acquiring software failure description information of different types of software. The test requirement text of the software function to be tested can be extracted from the software requirement specification.
Specifically, a software fault mode knowledge base is established by adopting a mean shift clustering-based fault tree analysis method based on the fault description information, and the method comprises the following steps of S11-S14:
S11, for each software function of each type of software, converting all corresponding software fault description information into vector representation.
In implementation, because the test points differ for software of different types and functions, the software fault mode knowledge base is established separately for software of different types and functions. For example, for the data transmission function of navigation software, all software fault description information corresponding to the data transmission function is collected into a text library, and the text library is segmented into words. Each participle can be encoded by one-hot coding, and the vector of a piece of fault description information is constructed according to which participles it contains. For example, if the number of participles is d and the first piece of fault description information contains participle 1, participle 5, participle 7, participle 10 and participle 11, the corresponding vector is a d-dimensional vector whose 1st, 5th, 7th, 10th and 11th positions are 1 and whose remaining positions are 0. All fault descriptions of the data transmission function are thus mapped to a group of data points in the d-dimensional space.
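The encoding described above can be sketched as follows. This is a minimal illustration, assuming the fault descriptions have already been segmented into participles; the function names `build_vocabulary` and `encode_description` and the sample descriptions are hypothetical and do not come from the patent.

```python
def build_vocabulary(segmented_descriptions):
    """Collect the distinct participles of the text library in first-seen order."""
    vocabulary = []
    seen = set()
    for words in segmented_descriptions:
        for word in words:
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary


def encode_description(words, vocabulary):
    """Return a d-dimensional 0/1 vector: position j is 1 if participle j occurs."""
    present = set(words)
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical, already-segmented fault descriptions for one software function.
descriptions = [
    ["start", "bit", "not", "checked"],
    ["baud", "rate", "not", "set"],
]
vocabulary = build_vocabulary(descriptions)
vectors = [encode_description(words, vocabulary) for words in descriptions]
```

Here d is the vocabulary size (7 in this toy library), and each fault description becomes one data point in the d-dimensional space that the clustering step operates on.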
S12, clustering the fault description information represented by the vectors by adopting a mean shift clustering algorithm to obtain a software fault cluster;
Because software faults are numerous and of complex types, it is inconvenient to analyze them directly; the fault information is first clustered, and fault tree analysis is then carried out on the clusters, which improves the analysis efficiency.
In the implementation process, the software types are different, the software functions are different, and the clustering results are different, namely the number of clusters is different.
Specifically, step S12 clusters the fault description information represented by the vector by using a mean shift clustering algorithm to obtain a software fault cluster, including:
S121, selecting any data point in the vector space of the fault description information as the central point;
S122, calculating the centroid of all data points in the high-dimensional sphere with the central point as the sphere center and h as the radius;
Mean shift clustering is a sliding-window algorithm: a sliding window is established with the central point as the sphere center and h as the sphere radius, and the centroid of all data points in the sliding window is calculated.
In practice, for n data points $x_i$, $i = 1, 2, \ldots, n$, in the d-dimensional space $R^d$, suppose k data points lie within the high-dimensional sphere with central point c and radius h. The centroid of the high-dimensional sphere is calculated by:
calculating the mean shift vector according to the formula $M_h = \frac{1}{k}\sum_{x_i \in S_h}(x_i - c)$;
calculating the centroid e according to the formula $e = c + M_h$,
where $S_h$ represents the set of data points whose distance from the central point c is smaller than the sphere radius h.
S123, if the distance between the centroid and the central point is smaller than a second threshold value or reaches the maximum iteration frequency, ending the iteration, and entering the step S124, wherein all data points in the current high-dimensional sphere range are in a cluster; otherwise, the centroid is taken as the central point, and the step S122 is returned;
That is, if the distance between the centroid and the central point is smaller than the second threshold, or the maximum number of iterations is reached, the iteration ends; otherwise, the central point continues to shift until the iteration ending condition is reached, thereby forming a cluster.
In implementation, if the distance between the center point of the current cluster and the center of the other existing cluster is smaller than the third threshold, the two clusters are merged and classified into the same class. Otherwise, the current cluster is used as a new cluster, and one cluster is added.
In practice, the high-dimensional sphere radius, the second threshold and the third threshold may be set according to the clustering accuracy, for example, to 0.75, 7.5e-4, and 0.375, respectively.
S124, if all data points are classified, finishing clustering; otherwise, randomly selecting any data point which is not classified as a central point, and returning to the step S122.
If all the data points are classified, the clustering is finished, otherwise, any data point which is not classified is randomly selected as a central point, and the steps S122-S124 are carried out again until all the data points are classified.
After clustering is finished, if a data point belongs to several clusters, its cluster is determined by the number of times it was visited by each cluster. For example, if data point $x_i$ was classified into a first cluster 5 times and into a second cluster 3 times, $x_i$ is assigned to the first cluster.
By adopting the mean shift clustering algorithm, the data points can be classified when the number of clusters is unknown, the inaccuracy caused by manually specifying the clusters is eliminated, and the result is more accurate.
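Steps S121 to S124, together with the merging and voting rules, can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name `mean_shift` and its parameter names are hypothetical, the next central point is taken as the first unclassified point rather than a random one (to keep the example deterministic), and a point is counted as visited on every iteration in which it falls inside the window.

```python
import math


def mean_shift(points, h, shift_tol, merge_dist, max_iter=100):
    """Flat-window mean shift following steps S121-S124; returns (centers, labels)."""
    centers = []   # one converged center per cluster
    votes = []     # votes[c][i]: times point i fell inside a window of cluster c
    visited = [False] * len(points)

    for start in range(len(points)):
        if visited[start]:
            continue
        center = list(points[start])
        hits = [0] * len(points)
        for _ in range(max_iter):
            inside = [i for i, p in enumerate(points) if math.dist(p, center) < h]
            for i in inside:
                hits[i] += 1
                visited[i] = True
            # Centroid e = c + M_h, i.e. the mean of the points inside the sphere.
            centroid = [sum(points[i][d] for i in inside) / len(inside)
                        for d in range(len(center))]
            shift = math.dist(centroid, center)
            center = centroid
            if shift < shift_tol:  # second threshold reached
                break
        # Merge with an existing cluster whose center is closer than merge_dist
        # (third threshold); otherwise register a new cluster.
        for c, existing in enumerate(centers):
            if math.dist(existing, center) < merge_dist:
                votes[c] = [a + b for a, b in zip(votes[c], hits)]
                break
        else:
            centers.append(center)
            votes.append(hits)

    # A point seen by several clusters goes to the one that visited it most often.
    labels = [max(range(len(centers)), key=lambda c: votes[c][i])
              for i in range(len(points))]
    return centers, labels


# Toy 2-D data with the example parameter values quoted in the text.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, labels = mean_shift(data, h=0.75, shift_tol=7.5e-4, merge_dist=0.375)
```

On this toy data the two well-separated groups end up in two clusters without the number of clusters being supplied in advance, which is the property the text relies on.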
S13, performing fault tree analysis on the software fault cluster to form a software fault mode; establishing a test point corresponding to the fault mode according to the class cluster corresponding to the software fault mode;
After the fault description information is clustered into fault clusters, fault tree analysis is performed on the fault clusters.
The fault tree analysis method is used as a reliability and safety analysis technology, plays a great role in safety guarantee engineering and reliability measurement engineering, relates to the fields of aerospace, nuclear safety, chemical enterprises, large-scale manufacturing industry and the like, and aims to reduce accident risks or confirm the probability of occurrence of a certain safety accident or a specific dangerous failure event.
The conventional fault tree analysis method is a reasoning deduction failure analysis method from a top layer to a bottom layer, and is used for performing layer-by-layer tracking analysis from top to bottom aiming at a certain fault event (top event) and describing a fault logic causal relationship graphically. The method adopts a reverse fault tree analysis method, namely analysis is carried out from bottom to top, and the obtained cluster is used as a bottom event of the fault tree for fault tree analysis.
Taking the serial communication function of the navigation software as an example, after the fault description of the serial communication is subjected to a clustering algorithm, part of clusters are shown in table 1.
Table 1: Partial clusters for the abnormal serial communication function
The 4 clusters are analyzed as bottom events. Cluster 1 and cluster 2 are both errors related to start bit checking, and either can cause abnormal checking of the serial data start bit, so cluster 1 and cluster 2 are connected by an OR gate to obtain the failure mode "serial port data start bit is not verified". Similarly, cluster 3 and cluster 4 are both errors related to data transmission parameters, so they are connected by an OR gate to obtain the failure mode "correct data transmission parameters are not set". If a failure mode requires two clusters to occur simultaneously (or in some other logical relation), the clusters are connected by an AND gate (or the corresponding logic symbol). The resulting fault tree is shown in FIG. 3. The analysis thus yields the failure modes "serial port data start bit is not verified" and "correct data transmission parameters are not set".
For the failure mode "serial port data start bit is not verified", the corresponding fault clusters are cluster 1 and cluster 2, and test points are established from these clusters, each cluster corresponding to one test point. Specifically, a test point can be established by converting the negative description of the fault cluster into its positive form. For example, if the description of the fault cluster is "the start bit is not determined, so it cannot be determined whether reception of the current data has started, and the current frame data is received incorrectly", the corresponding test point may be "the start bit of the data is determined".
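One possible way to represent the bottom-up fault tree described above is sketched below. The class names `FaultCluster` and `FailureMode` are hypothetical, and because Table 1 is not reproduced in this text, the description and test point of cluster 2 are placeholders rather than data from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class FaultCluster:
    """Bottom event of the fault tree: one cluster of fault descriptions."""
    name: str
    description: str
    test_point: str


@dataclass
class FailureMode:
    """Intermediate event obtained by joining bottom events with a logic gate."""
    name: str
    gate: str  # "OR" or "AND"
    clusters: list = field(default_factory=list)

    def occurs(self, observed):
        """Evaluate the gate given the set of cluster names observed to occur."""
        flags = [c.name in observed for c in self.clusters]
        return any(flags) if self.gate == "OR" else all(flags)

    def test_points(self):
        """Each cluster contributes exactly one test point."""
        return [c.test_point for c in self.clusters]


cluster_1 = FaultCluster(
    name="cluster 1",
    description=("the start bit is not determined, so it cannot be determined "
                 "whether reception of the current data has started"),
    test_point="the start bit of the data is determined",
)
cluster_2 = FaultCluster(
    name="cluster 2",
    description="placeholder: second start-bit fault description",  # hypothetical
    test_point="placeholder: second start-bit test point",          # hypothetical
)
start_bit_mode = FailureMode(
    name="serial port data start bit is not verified",
    gate="OR",
    clusters=[cluster_1, cluster_2],
)
```

With an OR gate, the failure mode is triggered by either bottom event; switching `gate` to "AND" models the case where the clusters must occur simultaneously.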
S14, constructing a software failure mode knowledge base based on the failure mode and the test point corresponding to each software function of each type of software.
And summarizing the fault modes and the test points corresponding to different software functions of different types of software to form a software fault mode knowledge base. The software failure mode knowledge base comprises software types, software functions, failure modes and test points.
For the software function to be tested, according to the software type and the software function, a fault mode and a test point corresponding to the software function to be tested are searched in a software fault mode knowledge base, whether the test point exists in a test requirement text of the software function to be tested or not is judged for each test point corresponding to the software function to be tested by adopting a similarity matching algorithm, if not, the test point and the corresponding fault mode are pushed to a tester, so that the prior experience is used for guiding the current test, the software fault caused by missing of the test point is avoided, and the software test coverage rate and the test efficiency are improved.
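The lookup-and-push flow of this paragraph can be sketched as follows. The dictionary layout, the function name `mine_missing_test_points` and the second test point are assumptions made for illustration; the `is_covered` callback stands in for the similarity matching algorithm, and the substring check used in the example is only a placeholder for it.

```python
# Hypothetical knowledge base keyed by (software type, software function).
knowledge_base = {
    ("navigation software", "serial communication"): [
        {"failure_mode": "serial port data start bit is not verified",
         "test_points": ["the start bit of the data is determined"]},
        {"failure_mode": "correct data transmission parameters are not set",
         "test_points": ["the baud rate is set correctly"]},  # hypothetical
    ],
}


def mine_missing_test_points(kb, software_type, software_function, is_covered):
    """Return (test point, failure mode) pairs absent from the test requirements."""
    missing = []
    for entry in kb.get((software_type, software_function), []):
        for test_point in entry["test_points"]:
            if not is_covered(test_point):
                missing.append((test_point, entry["failure_mode"]))
    return missing


requirement_text = ("verify that the start bit of the data is determined "
                    "before a frame is accepted")
missing = mine_missing_test_points(
    knowledge_base, "navigation software", "serial communication",
    is_covered=lambda test_point: test_point in requirement_text,
)
```

In this example only the parameter-related test point is returned, so it and its failure mode would be the items pushed to the tester.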
Specifically, the step of judging whether the test point exists in a test requirement text of the function of the software to be tested by adopting a similarity matching algorithm comprises the following steps:
S21, performing word segmentation on the test requirement text of the software function to be tested, and calculating the TF-IDF value of each participle with respect to the test requirement text;
In implementation, word segmentation can be performed with existing tools; for example, the jieba word segmentation tool is used to segment the text, and stop words are removed according to a stop word dictionary, yielding the participle set of the test requirement text.
The TF-IDF value of each word with respect to the test requirement text is calculated according to the TF-IDF formulas:
TF (term frequency) = number of times the word appears in the test requirement text / total number of words in the test requirement text
IDF (inverse document frequency) = log(total number of documents in the corpus / (1 + number of documents in which the word appears))
TF-IDF = TF × IDF
S22, performing word segmentation on all test points corresponding to the functions of the software to be tested, and calculating TF-IDF values of each word segmentation on the test points;
Word segmentation is performed on the test points by the same method as in step S21 to obtain the participle set of the test points.
The TF-IDF value of each word with respect to the test point is calculated according to the TF-IDF formulas:
TF (term frequency) = number of times the word appears in the test point / total number of words in the test point
IDF (inverse document frequency) = log(total number of documents in the corpus / (1 + number of documents in which the word appears))
TF-IDF = TF × IDF
Step S21 and step S22 select the same corpus.
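The TF-IDF formulas of steps S21 and S22 can be sketched as a single helper applied to either text. The function name `tf_idf` and the toy corpus are hypothetical, and the natural logarithm is an assumption, since the text does not state the base of the logarithm.

```python
import math
from collections import Counter


def tf_idf(words, corpus):
    """TF-IDF of each distinct word in `words`, with IDF taken over `corpus`.

    words  : list of participles of one text (test requirement text or test point)
    corpus : list of documents, each a list of participles
    """
    counts = Counter(words)
    total = len(words)
    scores = {}
    for word, count in counts.items():
        tf = count / total
        containing = sum(1 for doc in corpus if word in doc)
        # IDF with the "1 +" smoothing term in the denominator, as in the text.
        idf = math.log(len(corpus) / (1 + containing))
        scores[word] = tf * idf
    return scores


# Hypothetical corpus shared by both steps, as the text requires.
corpus = [["start", "bit", "check"], ["bit", "rate"],
          ["bit", "frame"], ["frame", "length"]]
scores = tf_idf(["start", "bit", "start"], corpus)
```

Note that with this smoothing a word occurring in almost every document gets an IDF of zero or below, so very common words contribute little or nothing to the later similarity values.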
S23, calculating the semantic structure similarity and the spatial structure similarity of each test item in the test requirement text and the test point corresponding to the function of the software to be tested based on the TF-IDF value;
Specifically, step S23 includes:
S231, calculating the semantic structure similarity according to the formula, wherein $\alpha_i$ represents the TF-IDF value, with respect to the test point, of the common participle $x_i$ of the test point and the test item; $\beta_i$ represents the TF-IDF value of the common participle $x_i$ with respect to the test requirement document.
That is, the participles common to the test point and each test item of the test requirement text are taken, and the semantic structure similarity of the test point and each test item is calculated from the TF-IDF values of the common participles with respect to the test point and with respect to the test requirement text.
S232, constructing the vector representation X'_1 of the test item according to the TF-IDF value, with respect to the test requirement text, of each segmented word in the test item;
In implementation, the segmented-word set of the test requirement text and the segmented-word sets of the test points are merged and sorted to obtain an ordered word list.
For each test item, a vector representation is generated from the words that make up the test item. For example, if the word list contains m words, the vector for the test item has dimension m; if the test item contains the i-th word of the word list, the i-th component of the vector is the TF-IDF value of that word with respect to the test requirement text, and otherwise the i-th component is 0. This yields the vector representation X'_1 of the test item.
S233, constructing the vector representation X'_2 of the test point according to the TF-IDF value, with respect to the test point, of each segmented word in the test point;
For each test point, a vector representation is generated from the words that make up the test point. For example, if the word list contains m words, the vector for the test point has dimension m; if the test point contains the i-th word of the word list, the i-th component of the vector is the TF-IDF value of that word with respect to the test point, and otherwise the i-th component is 0. This yields the vector representation X'_2 of the test point.
The similarity of the test point and the test item in vector-space structure can then be calculated according to the formula [formula not reproduced].
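The vector construction of steps S232 and S233 can be sketched as below. The spatial similarity formula itself is not reproduced in this text, so the sketch uses cosine similarity purely as an assumption; it is a common choice for vector-space models but is not confirmed by the source. All function names are hypothetical.

```python
import math


def build_word_list(item_tokens, point_tokens):
    """Merge both segmented-word sets and sort them into one ordered word list."""
    return sorted(set(item_tokens) | set(point_tokens))


def to_vector(tokens, tfidf, word_list):
    """i-th component is the word's TF-IDF value if the text contains the i-th word, else 0."""
    present = set(tokens)
    return [tfidf.get(word, 0.0) if word in present else 0.0 for word in word_list]


def cosine_similarity(u, v):
    """Assumed stand-in for the spatial structure similarity formula."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector shares nothing with any text
    return dot / (norm_u * norm_v)
```

Here `tfidf` is the word-to-value mapping computed for the test requirement text (for X'_1) or for the test point (for X'_2), so both vectors have dimension m and are directly comparable.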
S24, calculating the comprehensive text similarity based on the semantic structure similarity and the spatial structure similarity; if the test requirement text contains a test item whose comprehensive text similarity with the test point is greater than or equal to the first threshold, judging that the test point exists in the test requirement text of the software function to be tested; otherwise, judging that the test point does not exist in the test requirement text of the software function to be tested.
Specifically, the comprehensive text similarity is calculated according to the formula [formula not reproduced], wherein μ represents the weight of the semantic structure similarity, and [symbol not reproduced] represents the weight of the spatial structure similarity.
For a test point, if no test item in the test requirement text is similar to it, that is, every comprehensive text similarity is smaller than the first threshold, the test point is judged not to be contained in the test requirement text; the test point and its corresponding fault mode are then pushed to a tester for requirement mining, so that test requirement coverage becomes more comprehensive and accurate.
Combining the semantic structure similarity with the spatial structure similarity gives a more accurate similarity measure, avoids recommending test points that already exist in the test requirement text, and makes test requirement mining more accurate.
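The decision of step S24 can be sketched as follows. The combining formula is not reproduced in this text; the sketch assumes a weighted sum in which the spatial weight is 1 − μ, which is only one plausible reading. The function names and the default μ = 0.5 are hypothetical.

```python
def combined_similarity(semantic_sim, spatial_sim, mu=0.5):
    """Weighted sum of the two similarities (assumed form; the spatial weight is taken as 1 - mu)."""
    return mu * semantic_sim + (1.0 - mu) * spatial_sim


def is_covered(pair_scores, first_threshold, mu=0.5):
    """Return True if some test item reaches the first threshold for this test point.

    pair_scores: one (semantic_sim, spatial_sim) tuple per test item.
    """
    return any(
        combined_similarity(sem, spa, mu) >= first_threshold
        for sem, spa in pair_scores
    )
```

A test point for which `is_covered` returns False is one that would be pushed to the tester together with its fault mode.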
Specifically, the first threshold is obtained by:
obtaining positive samples and negative samples, wherein each positive sample is a pair of similar texts and each negative sample is a pair of dissimilar texts;
respectively performing word segmentation on the positive sample and the negative sample, and calculating a TF-IDF value of each word segmentation;
respectively calculating the comprehensive text similarity of the positive sample and the negative sample based on the TF-IDF value;
and determining the first threshold value according to the comprehensive text similarity of the positive samples and the comprehensive text similarity of the negative samples.
In implementation, to obtain the first threshold more accurately, it may be calculated from positive and negative sample data. A positive sample is a pair of texts labeled as similar, and a negative sample is a pair of texts labeled as dissimilar.
The comprehensive similarity of the positive and negative samples is calculated by the same method as in steps S21 to S24: each sample is segmented and the TF-IDF value of each segmented word is calculated, using the same corpus as in step S21;
the comprehensive text similarity of each sample is then calculated according to the methods of steps S23 and S24.
The first threshold is selected according to the calculated comprehensive similarity values of the positive and negative samples, such that the threshold clearly separates the positive samples from the negative samples.
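One way to pick a threshold that separates the two sample groups is sketched below. The text does not say how the threshold is chosen, so this selection rule (try the midpoints between neighbouring scores and keep the one that classifies the most samples correctly) is an assumption, and `select_first_threshold` is a hypothetical name.

```python
def select_first_threshold(positive_scores, negative_scores):
    """Pick the candidate threshold that best separates positive from negative samples.

    positive_scores: comprehensive text similarities of pairs labeled similar.
    negative_scores: comprehensive text similarities of pairs labeled dissimilar.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both positive and negative samples are required")
    scores = sorted(set(positive_scores) | set(negative_scores))
    # Candidate thresholds: midpoints between neighbouring distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(scores, scores[1:])] or scores

    def correctly_classified(threshold):
        # A positive pair is correct at or above the threshold, a negative pair below it.
        return (sum(s >= threshold for s in positive_scores)
                + sum(s < threshold for s in negative_scores))

    return max(candidates, key=correctly_classified)
```

When the two groups do not overlap, the returned value lies between the highest negative score and the lowest positive score.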
In a specific embodiment of the present invention, for each test point, a similarity matching algorithm is used to judge whether the test point exists in the test requirement text of the software function to be tested, through the following steps:
S25, obtaining, by means of a word vector training model, a word vector for each word in the test point and in the test requirement text;
In implementation, an existing word segmentation tool can be used to segment the test points and the test requirement text, and a Word2vec word vector training model is trained to obtain a word vector for each segmented word in the test points and the test requirement text. The specific implementation may follow the prior art and is not described here.
S26, constructing, based on the word vectors, a matrix for the test point and a matrix for each test item in the test requirement text.
After the word vector of each word is obtained, a two-dimensional matrix representation of the test point can be constructed from the segmented words it contains. Similarly, a two-dimensional matrix representation of each test item can be constructed from the segmented words contained in that test item.
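The matrix construction of step S26 can be sketched as follows, assuming the word vectors are already available as a plain mapping from word to vector (for instance exported from a trained Word2vec model). The function name, the zero row for unknown words, and the optional padding to a fixed number of rows are all assumptions made for illustration; the text does not specify them.

```python
def build_matrix(tokens, word_vectors, dim, max_len=None):
    """Stack the word vectors of a segmented text into a two-dimensional matrix.

    tokens: segmented words of one test point or one test item.
    word_vectors: mapping word -> list of floats of length dim.
    max_len: if given, truncate or zero-pad to exactly this many rows (assumption).
    """
    # One row per segmented word; words without a vector get a zero row (assumption).
    rows = [list(word_vectors.get(word, [0.0] * dim)) for word in tokens]
    if max_len is not None:
        rows = rows[:max_len]
        rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows
```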
S27, inputting the test point and each test item into a pre-trained neural network model to judge whether the test point and the test item are similar;
The two-dimensional matrices of the test point and the test item are input into the trained neural network model, which judges whether the test point and the test item are similar.
Specifically, the pre-trained neural network model is a convolutional neural network model, which includes:
an input layer for inputting the matrices of the test point and the test item;
a convolutional layer for feature extraction, comprising 1 convolution kernel of size 2 × 2;
a pooling layer for reducing the dimensionality of the features extracted by the convolutional layer;
and an output layer for judging similarity from the dimension-reduced features.
In practice, the convolution kernel size is set to 2 × 2 so that the stride of the receptive field does not carry it past the matrix boundary, and a single convolution kernel is used to improve computational efficiency.
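The layer sequence described above (one 2 × 2 convolution kernel, pooling, output) can be illustrated with a plain-Python forward pass. This is only a sketch under several assumptions that the text does not confirm: stride 1 without padding, a ReLU activation, non-overlapping 2 × 2 max pooling, a sigmoid output, and a single input matrix (how the test point matrix and the test item matrix are combined before the input layer is not specified). All names and weights are hypothetical and untrained.

```python
import math


def conv2d_single_kernel(matrix, kernel, bias=0.0):
    """Slide one 2 x 2 kernel over the matrix (stride 1, no padding), then apply ReLU."""
    rows, cols = len(matrix), len(matrix[0])
    feature_map = []
    for i in range(rows - 1):
        out_row = []
        for j in range(cols - 1):
            s = (matrix[i][j] * kernel[0][0] + matrix[i][j + 1] * kernel[0][1]
                 + matrix[i + 1][j] * kernel[1][0] + matrix[i + 1][j + 1] * kernel[1][1])
            out_row.append(max(0.0, s + bias))
        feature_map.append(out_row)
    return feature_map


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a window are dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, cols - size + 1, size)]
        for i in range(0, rows - size + 1, size)
    ]


def similarity_probability(pooled, weights, bias=0.0):
    """Output layer: sigmoid of a weighted sum of the flattened pooled features."""
    flat = [value for row in pooled for value in row]
    z = sum(w * v for w, v in zip(weights, flat)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

With a 2 × 2 kernel and stride 1, an n × m input yields an (n − 1) × (m − 1) feature map, so the window never extends past the matrix boundary.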
S28, if the test point is not similar to any test item in the test requirement text, judging that the test point does not exist in the test requirement text of the software function to be tested; otherwise, judging that the test point exists in the test requirement text of the software function to be tested.
Using the neural network model to judge the similarity between the test points and the test items in the test requirement text makes the similarity judgment faster and more accurate, avoids recommending test points that already exist in the test requirement text, and makes test requirement mining more accurate.
A specific embodiment of the present invention discloses a software test requirement mining system which, as shown in Fig. 2, includes the following modules:
a fault mode knowledge base construction module, configured to acquire fault description information of different types of software and, based on the fault description information, to establish a software fault mode knowledge base by a fault tree analysis method based on mean shift clustering, wherein the software fault mode knowledge base comprises software types, software functions, fault modes and test points;
and a test requirement mining module, configured to search the software fault mode knowledge base, according to the software type and software function of the software to be tested, for the fault modes and test points corresponding to the software function to be tested; to judge, for each such test point, by a similarity matching algorithm, whether the test point exists in the test requirement text of the software function to be tested; and, if the test point does not exist, to push the test point and its corresponding fault mode to a tester.
The method embodiment and the system embodiment are based on the same principle; related parts may be cross-referenced, and the same technical effect can be achieved. For the specific implementation process, refer to the foregoing embodiments; details are not repeated here.
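The lookup-and-push behaviour of the test requirement mining module can be sketched as follows. The data layout (a dictionary keyed by software type and software function) and the names `KnowledgeEntry` and `mine_requirements` are hypothetical; `is_similar` stands for whichever similarity matching algorithm is used and is passed in as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    """One fault mode and its test point, as stored in the knowledge base (assumed layout)."""
    fault_mode: str
    test_point: str


def mine_requirements(knowledge_base, software_type, software_function,
                      test_items, is_similar):
    """Return the entries whose test point matches no test item in the requirement text.

    knowledge_base: dict mapping (software_type, software_function) -> list of KnowledgeEntry.
    test_items: test items extracted from the test requirement text.
    is_similar: callable (test_point, test_item) -> bool, the similarity matcher.
    """
    entries = knowledge_base.get((software_type, software_function), [])
    # An entry is pushed to the tester only if no test item is similar to its test point.
    return [
        entry for entry in entries
        if not any(is_similar(entry.test_point, item) for item in test_items)
    ]
```

The returned entries are the test points, with their fault modes, that would be pushed to the tester.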
Those skilled in the art will appreciate that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium, to instruct related hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
Claims (10)
1. A software test requirement mining method is characterized by comprising the following steps:
acquiring fault description information of different types of software, and establishing a software fault mode knowledge base by adopting a fault tree analysis method based on mean shift clustering based on the fault description information, wherein the software fault mode knowledge base comprises a software type, a software function, a fault mode and a test point;
according to the software type and the software function of the software to be tested, a fault mode and a test point corresponding to the software function to be tested are searched in a software fault mode knowledge base, for each test point corresponding to the software function to be tested, whether the test point exists in a test requirement text of the software function to be tested is judged by adopting a similarity matching algorithm, and if the test point does not exist, the test point and the corresponding fault mode are pushed to a tester.
2. The method of claim 1, wherein establishing a software fault mode knowledge base based on the fault description information by a fault tree analysis method based on mean shift clustering comprises:
for each software function of each type of software, converting all software fault description information corresponding to each type of software into vector representation;
clustering fault description information represented by the vectors by adopting a mean shift clustering algorithm to obtain a software fault cluster;
performing fault tree analysis on the software fault cluster to form a software fault mode; establishing a test point corresponding to the fault mode according to the class cluster corresponding to the software fault mode;
and constructing a software fault mode knowledge base based on the fault mode and the test point corresponding to each software function of each type of software.
3. The software test requirement mining method of claim 2, wherein clustering the fault description information represented by the vectors by a mean shift clustering algorithm to obtain software fault clusters comprises:
S121, selecting any data point in the vector space of the fault description information as a central point;
S122, calculating the centroid of all data points within the high-dimensional sphere centered on the central point with radius h;
S123, if the distance between the centroid and the central point is smaller than a second threshold, or the maximum number of iterations is reached, ending the iteration and proceeding to step S124, all data points within the current high-dimensional sphere forming one cluster; otherwise, taking the centroid as the new central point and returning to step S122;
S124, if all data points have been classified, ending the clustering; otherwise, randomly selecting an unclassified data point as the central point and returning to step S122.
4. The method for mining software testing requirements according to claim 1, wherein for each test point corresponding to a software function to be tested, judging whether the test point exists in a testing requirement text of the software function to be tested by adopting a similarity matching algorithm comprises:
performing word segmentation on a test requirement text of a software function to be tested, and calculating a TF-IDF value of each word segmentation on the test requirement text;
performing word segmentation on all test points corresponding to the functions of the software to be tested, and calculating TF-IDF values of each word segmentation on the test points;
calculating the semantic structure similarity and the spatial structure similarity of each test item in the test requirement text and the test point corresponding to the function of the software to be tested based on the TF-IDF value;
calculating the comprehensive text similarity based on the semantic structure similarity and the spatial structure similarity; if the test requirement text contains a test item whose comprehensive text similarity with the test point is greater than or equal to the first threshold, judging that the test point exists in the test requirement text of the software function to be tested; otherwise, judging that the test point does not exist in the test requirement text of the software function to be tested.
5. The method according to claim 4, wherein calculating, based on the TF-IDF values, the semantic structure similarity and the spatial structure similarity between each test item in the test requirement text and each test point corresponding to the software function to be tested comprises:
calculating the semantic structure similarity according to the formula [formula not reproduced], wherein α_i represents the TF-IDF value, with respect to the test point, of the segmented word x_i common to the test point and the test item; β_i represents the TF-IDF value of the common segmented word x_i with respect to the test requirement text;
constructing the vector representation X'_1 of the test item according to the TF-IDF value, with respect to the test requirement text, of each segmented word in the test item;
constructing the vector representation X'_2 of the test point according to the TF-IDF value, with respect to the test point, of each segmented word in the test point;
6. The software test requirement mining method of claim 4, wherein calculating the comprehensive text similarity based on the semantic structure similarity and the spatial structure similarity comprises:
7. The software test requirement mining method according to claim 4, wherein the first threshold is obtained by:
respectively obtaining a positive sample and a negative sample, wherein the positive sample is two similar texts, and the negative sample is two dissimilar texts;
respectively performing word segmentation on the positive sample and the negative sample, and calculating a TF-IDF value of each word segmentation;
respectively calculating the comprehensive text similarity of the positive sample and the negative sample based on the TF-IDF value;
and determining the first threshold value according to the comprehensive text similarity of the positive samples and the comprehensive text similarity of the negative samples.
8. The method of claim 1, wherein for each test point, determining whether the test point exists in a test requirement text of a software function to be tested by using a similarity matching algorithm comprises:
respectively obtaining a word vector of each word in the test point and the test requirement text by adopting a word vector training model;
respectively constructing matrixes of the test points and each test item in the test requirement text based on the word vectors;
inputting the test points and each test item into a pre-trained neural network model to judge whether the test points and the test items are similar;
if the test point is not similar to any test item in the test requirement text, judging that the test point does not exist in the test requirement text of the software function to be tested; otherwise, judging that the test point exists in the test requirement text of the software function to be tested.
9. The software test requirement mining method of claim 1,
the pre-trained neural network model is a convolutional neural network model; the convolutional neural network model includes:
the input layer is used for inputting a matrix of the test points and the test items;
the convolution layer is used for carrying out feature extraction and comprises 1 convolution kernel with the size of 2 x 2;
the pooling layer is used for reducing the dimension of the features extracted from the convolutional layer;
and the output layer is used for judging the similarity according to the characteristics subjected to the dimensionality reduction.
10. A software testing requirement mining system is characterized by comprising the following modules:
a fault mode knowledge base construction module, configured to acquire fault description information of different types of software and, based on the fault description information, to establish a software fault mode knowledge base by a fault tree analysis method based on mean shift clustering, wherein the software fault mode knowledge base comprises software types, software functions, fault modes and test points;
and a test requirement mining module, configured to search the software fault mode knowledge base, according to the software type and software function of the software to be tested, for the fault modes and test points corresponding to the software function to be tested; to judge, for each such test point, by a similarity matching algorithm, whether the test point exists in the test requirement text of the software function to be tested; and, if the test point does not exist, to push the test point and its corresponding fault mode to a tester.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210103297.9A CN114490396B (en) | 2022-01-27 | 2022-01-27 | Software test requirement mining method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114490396A true CN114490396A (en) | 2022-05-13 |
CN114490396B CN114490396B (en) | 2023-05-05 |
Family
ID=81476413
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210103297.9A Active CN114490396B (en) | 2022-01-27 | 2022-01-27 | Software test requirement mining method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114490396B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105653562A (en) * | 2014-12-02 | 2016-06-08 | 阿里巴巴集团控股有限公司 | Calculation method and apparatus for correlation between text content and query request |
WO2017084267A1 (en) * | 2015-11-18 | 2017-05-26 | 乐视控股(北京)有限公司 | Method and device for keyphrase extraction |
CN108536677A (en) * | 2018-04-09 | 2018-09-14 | 北京信息科技大学 | A kind of patent text similarity calculating method |
CN109299462A (en) * | 2018-09-20 | 2019-02-01 | 武汉理工大学 | Short text similarity calculating method based on multidimensional convolution feature |
CN109582578A (en) * | 2018-11-29 | 2019-04-05 | 泰康保险集团股份有限公司 | System, method, computer-readable medium and the electronic equipment of software test case |
CN109948036A (en) * | 2017-11-15 | 2019-06-28 | 腾讯科技(深圳)有限公司 | A kind of calculation method and device segmenting lexical item weight |
CN111651666A (en) * | 2020-04-28 | 2020-09-11 | 中国平安财产保险股份有限公司 | User theme recommendation method and device, computer equipment and storage medium |
CN112463641A (en) * | 2020-12-16 | 2021-03-09 | 北京京航计算通讯研究所 | Fault mode set construction method and system for software defect checking |
CN113778894A (en) * | 2021-09-18 | 2021-12-10 | 平安国际智慧城市科技股份有限公司 | Test case construction method, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
冯高磊; 高嵩峰: "Text similarity algorithm based on the vector space model combined with semantics" (基于向量空间模型结合语义的文本相似度算法) *
Also Published As
Publication number | Publication date |
---|---|
CN114490396B (en) | 2023-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111598214A (en) | Cross-modal retrieval method based on graph convolution neural network | |
CN110928764A (en) | Automated mobile application crowdsourcing test report evaluation method and computer storage medium | |
CN113806482A (en) | Cross-modal retrieval method and device for video text, storage medium and equipment | |
CN110222347A (en) | A kind of detection method that digresses from the subject of writing a composition | |
CN114936158B (en) | Software defect positioning method based on graph convolution neural network | |
CN110633371A (en) | Log classification method and system | |
CN111506728B (en) | Hierarchical structure text automatic classification method based on HD-MSCNN | |
CN111581092A (en) | Method for generating simulation test data, computer device and storage medium | |
CN111427775A (en) | Method level defect positioning method based on Bert model | |
CN114154570A (en) | Sample screening method and system and neural network model training method | |
KR20160149050A (en) | Apparatus and method for selecting a pure play company by using text mining | |
Gupta et al. | Unsupervised self-training for sentiment analysis of code-switched data | |
CN115687609A (en) | Zero sample relation extraction method based on Prompt multi-template fusion | |
CN113672508B (en) | Simulink testing method based on risk strategy and diversity strategy | |
CN114706986A (en) | Multi-category emotion classification method and device and computer storage medium | |
CN107577738A (en) | A kind of FMECA method by SVM text mining processing datas | |
Chen et al. | An effective crowdsourced test report clustering model based on sentence embedding | |
CN114169439A (en) | Abnormal communication number identification method and device, electronic equipment and readable medium | |
Aman et al. | A comparative study of vectorization-based static test case prioritization methods | |
CN109783586B (en) | Water army comment detection method based on clustering resampling | |
CN114490396B (en) | Software test requirement mining method and system | |
CN113792141B (en) | Feature selection method based on covariance measurement factor | |
CN117077680A (en) | Question and answer intention recognition method and device | |
CN115098674A (en) | Method for generating confrontation network generation data based on cloud ERP supply chain ecosphere | |
CN114610882A (en) | Abnormal equipment code detection method and system based on electric power short text classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||