CN109144879B - Test analysis method and device - Google Patents

Test analysis method and device Download PDF

Info

Publication number
CN109144879B
CN109144879B CN201811018314.9A
Authority
CN
China
Prior art keywords
sample
keyword
code
test
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811018314.9A
Other languages
Chinese (zh)
Other versions
CN109144879A (en
Inventor
涂润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201811018314.9A priority Critical patent/CN109144879B/en
Publication of CN109144879A publication Critical patent/CN109144879A/en
Application granted granted Critical
Publication of CN109144879B publication Critical patent/CN109144879B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a test analysis method and a test analysis device. The test analysis method comprises the following steps: acquiring a code object to be analyzed; performing word segmentation processing on the code object, and extracting a keyword sequence; and determining whether the code object needs to be subjected to code testing or not by utilizing the trained test analysis model based on the keyword sequence.

Description

Test analysis method and device
Technical Field
The present application relates to the field of software testing technologies, and in particular, to a test analysis method and apparatus.
Background
With the development of the internet, various kinds of software are widely used in various fields. Software testing is an important stage of software development and related processes. Before performing a test operation such as a unit test or an interface test, existing test schemes require a tester to manually determine, from the code, whether a function needs to be tested.
Disclosure of Invention
The application provides a test analysis scheme, which can improve the analysis efficiency of code objects.
According to an aspect of the present application, there is provided a test analysis method including: acquiring a code object to be analyzed; performing word segmentation processing on the code object, and extracting a keyword sequence; and determining whether the code object needs to be subjected to code testing or not by utilizing the trained test analysis model based on the keyword sequence.
According to an aspect of the present application, there is provided a test analysis apparatus including: an object acquisition unit that acquires a code object to be analyzed; a keyword acquisition unit that performs word segmentation processing on the code object and extracts a keyword sequence; and an analysis unit that determines, based on the keyword sequence and by utilizing the trained test analysis model, whether the code object needs to be subjected to code testing.
According to an aspect of the application, there is provided a computing device comprising: one or more processors, memory, and one or more programs. One or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the test analysis methods of the present application.
According to an aspect of the present application, there is provided a storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the test analysis method of the present application.
In conclusion, according to the technical scheme of the application, the trouble of manually analyzing the code object can be avoided, and whether the code object is subjected to code test or not can be automatically analyzed through the test analysis model, so that the analysis efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1A illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 1B illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 2 illustrates a flow diagram of a test analysis method 200 according to some embodiments of the present application;
FIG. 3 illustrates a flow diagram of a test analysis method 300 according to some embodiments of the present application;
FIG. 4A illustrates a flow diagram for obtaining a first set of code objects according to some embodiments of the present application;
FIG. 4B illustrates an example of code for testing a code resource according to some embodiments of the present application;
FIG. 4C illustrates an example of code according to some embodiments of the present application;
FIG. 5 illustrates a flow diagram for obtaining a second set of code objects according to some embodiments of the present application;
FIG. 6A illustrates a flow diagram for obtaining a first feature matrix according to some embodiments of the present application;
FIG. 6B illustrates a schematic diagram of a first feature matrix according to some embodiments of the present application;
FIG. 7 illustrates a flow diagram for training a test analysis model according to some embodiments of the present application;
FIG. 8A illustrates a flow diagram for obtaining feature extraction results according to some embodiments of the present application;
FIG. 8B illustrates a schematic diagram of a second feature matrix according to some embodiments of the present application;
FIG. 8C illustrates a schematic diagram of a computational process of a recurrent neural network layer, according to some embodiments of the present application;
FIG. 9 shows a schematic diagram of a test analysis device 900 according to some embodiments of the present application;
FIG. 10 shows a schematic view of a test analysis device 1000 according to some embodiments of the present application; and
FIG. 11 illustrates a block diagram of the components of a computing device.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In some embodiments, before performing a software test, a tester needs to analyze each code object in the software code to screen out code objects to be tested and code objects not to be tested. Here, a code object refers to a testable functional module in software. In software code of different programming languages, code objects may be different types of code entities. For example, in software code of the C language, one code object may be one function. In software code of the Java language, a code object may be a class. In C++ software code, a code object may be a class or a function. For simplicity of description, specific types of code objects are not distinguished below.
FIG. 1A illustrates a schematic diagram of an application scenario 100a, according to some embodiments of the present application.
As shown in FIG. 1A, computing device 102 may include software source code 104, test analysis model 106, and application 108. In some embodiments, the computing device 102 may include, but is not limited to, a palmtop computer, a wearable computing device, a Personal Digital Assistant (PDA), a tablet computer, a laptop computer, a desktop computer, a mobile phone, a smartphone, an Enhanced General Packet Radio Service (EGPRS) mobile phone, or a combination of any two or more of these or other data processing devices. In some embodiments, the computing device 102 may be an independent hardware server device, a virtual server, or the like.
Software source code 104 may be program code written in one or more programming languages. The software source code 104 may include a program file that executes business logic and a test program file that tests a program entity that executes business logic. The application 108 may extract the code objects to be tested from the test program file. In addition, the application 108 may also extract a set of code objects in a program file that executes business logic. In this way, the application 108 may take the code objects that need to be tested as positive samples and may also take the code objects that do not need to be tested as negative samples. The set of positive and negative samples may be referred to as a sample set. On this basis, the application 108 may train the test analysis model with the sample set. In this way, the trained test analysis model is able to determine whether the code object to be analyzed requires a code test. Here, the test analysis model 106 can analyze the code object automatically, and can avoid the trouble of analyzing the code object by a human and improve the analysis efficiency.
FIG. 1B illustrates a schematic diagram of an application scenario 100B, according to some embodiments of the present application.
As shown in fig. 1B, computing device 102 may include a test analysis model 106, an application 108, and a sample set 110. Each sample in the sample set 110 includes a code object and a label indicating whether a code test is required for the code object. Application 108 may train test analysis model 106 using sample set 110. In this way, the trained test analysis model 106 may automatically analyze the code object.
Additionally, in some application scenarios, the functionality of the application 108 described above may be performed in a computing cluster. Here, a computing cluster may include a plurality of computing nodes. In this way, the operations performed by the application 108 may be performed in multiple compute nodes, thereby enabling efficient completion of the training process for the test analysis model 106 when the data volume of the sample set is large.
FIG. 2 shows a schematic diagram of a test analysis method 200 according to some embodiments of the present application. Here, the test analysis method 200 may be performed, for example, in the application 108. The application 108 may reside, for example, in the computing device 102 or in a computing cluster, which is not limited in this application.
As shown in fig. 2, in step S201, a code object to be analyzed is acquired.
In step S202, word segmentation processing is performed on the code object, and a keyword sequence is extracted. In some embodiments, the keyword sequence is extracted in a manner similar to the extraction of a sample's keyword sequence in step S302 of the method 300 below, which is not repeated here.
In step S203, it is determined whether the code object needs to be subjected to a code test using the trained test analysis model based on the keyword sequence. Here, the test analysis model may be various models capable of determining whether a code test is required, for example, the test analysis model trained in steps S301 to S303 in the method 300, which is not described herein again.
In summary, the method 200 can avoid the trouble of manually analyzing the code object, and can automatically analyze whether the code object is subjected to the code test through the test analysis model, thereby improving the analysis efficiency.
FIG. 3 shows a schematic diagram of a test analysis method 300 according to some embodiments of the present application. Here, the test analysis method 300 may be performed, for example, in the application 108. The application 108 may reside, for example, in the computing device 102 or in a computing cluster, which is not limited in this application.
As shown in fig. 3, in step S301, a sample set is acquired. Here, each sample in the sample set includes a code object and a label for describing whether to perform a code test on the code object.
In some embodiments, step S301 may be implemented as step S3011 and step S3012.
In step S3011, a first code object set that needs to be subjected to a code test in software is obtained, a first label indicating that the code test needs to be performed is added to each code object in the first code object set, and the first code object set to which the first label is added is used as a positive sample subset in the sample set. In some embodiments, step S3011 may obtain the first set of code objects by method 400.
As shown in fig. 4A, in step S401, the test code resources of the software are acquired. The following description takes software containing C++ code as an example. First, step S401 may filter the code resources of the software to remove all non-C++ file resources. For example, step S401 may retain only the files with the suffixes .cpp and .h. Step S401 may then query the test code resources among the remaining C++ files. For example, step S401 may acquire the files whose file names include an identification character string "unity", and use the acquired files as test code resources. It should be understood that, depending on the source of the test code resources, step S401 may also acquire the test code resources in an acquisition manner corresponding to that source, which is not described here again.
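As a concrete illustration of step S401, the following Python sketch filters a source tree down to C++ files and then picks out test code resources by file name. The directory traversal, suffix list, and the `test_marker` parameter are assumptions chosen for readability, not values fixed by this application.

```python
# Minimal sketch of step S401; the suffix list and test-marker convention are assumptions.
from pathlib import Path

def collect_test_code_resources(source_root: str, test_marker: str):
    """Keep only C++ file resources, then treat files whose name carries the
    project's test marker (e.g. an identification character string) as test code resources."""
    cpp_files = [p for p in Path(source_root).rglob("*") if p.suffix in (".cpp", ".h")]
    return [p for p in cpp_files if test_marker in p.name.lower()]
```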
In step S402, a test case in the test code resource is determined according to the feature pattern of the test case. Illustratively, a test code resource may include one or more test cases. The name of a test case typically follows certain specifications. The feature pattern of a test case is, for example, that the name of the test case's main function includes the character string "Test". For example, step S402 may search the test code resource for function entities matching the "Testxxxx()" feature pattern. Each function entity that conforms to the feature pattern may be a test case. It should be noted that, when the test case set of a piece of software has multiple feature patterns, step S402 may acquire the test cases based on these multiple feature patterns. The manner in which test cases are determined is illustrated more intuitively below in conjunction with FIG. 4B, which shows an example of part of the code of a test code resource. "TestAtomicIncrement()" in the dashed box 401 in fig. 4B conforms to the "Testxxxx()" feature pattern. Therefore, step S402 can take the code entity of "TestAtomicIncrement()" as one test case.
In step S403, code objects are extracted from the determined test cases, and the extracted code objects are taken as the first code object set. In some embodiments, step S403 may analyze the execution steps of a test case line by line and query the functions called by the test case according to a matching rule for function names. Here, a function called by a test case may be a code object. For example, "base::subtle::NoBarrier_AtomicIncrement()" in the dashed box 402 of FIG. 4C conforms to the matching rule for function names. Step S403 may extract a code object based on the function name in the dashed box 402. In one embodiment, a code object may include a function name. In yet another embodiment, a code object may include a function name and the function implementation code of that function.
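A rough Python sketch of steps S402 and S403 is given below. The regular expressions are illustrative assumptions (a real implementation would follow the project's actual naming conventions and parse the C++ source more carefully), not the exact matching rules of this application.

```python
# Illustrative sketch of steps S402/S403; the regular expressions are assumptions.
import re

TEST_CASE_PATTERN = re.compile(r"\b(Test\w+)\s*\(")            # feature pattern "Testxxxx()"
CALLED_FUNC_PATTERN = re.compile(r"\b((?:\w+::)*\w+)\s*\(")    # e.g. base::subtle::NoBarrier_AtomicIncrement(

def find_test_cases(test_code: str):
    """Return the names of function entities that match the test-case feature pattern."""
    return TEST_CASE_PATTERN.findall(test_code)

def called_functions(test_case_body: str):
    """Analyze the execution steps line by line and collect the called function names."""
    names = []
    for line in test_case_body.splitlines():
        names.extend(CALLED_FUNC_PATTERN.findall(line))
    return names
```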
In step S3012, a second set of code objects that do not need to be subjected to code testing in the software is obtained, a second label indicating that no code testing is needed is added to each code object in the second set of code objects, and the second set of code objects to which the second label is added is used as a negative sample subset in the sample set. In some embodiments, step S3012 may obtain the second set of code objects by method 500.
As shown in fig. 5, in step S501, code objects are extracted from the software, and a set of code objects is acquired. Here, step S501 may extract the code objects from the source code resources of the software. Taking software containing C++ code as an example, step S501 may query all function names, according to the function-name matching rule, in the code resources from which non-C++ files have been filtered out, so as to determine the set of code objects.
In step S502, a subset that does not belong to the first set of code objects is screened from the set of code objects and is taken as the second set of code objects.
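Assuming the first code object set (method 400) and the full set of code objects (step S501) are already available as sets of names, the screening in step S502 and the labelling in steps S3011/S3012 amount to a set difference plus label assignments, as in the following sketch (the label values 1 and 0, and the function and parameter names, are assumptions):

```python
# Sketch of steps S3011/S3012 and S502; the label encoding is an assumption.
def build_sample_set(all_code_objects: set, tested_code_objects: set):
    positives = [(obj, 1) for obj in tested_code_objects]                     # first label: code test needed
    negatives = [(obj, 0) for obj in all_code_objects - tested_code_objects]  # second label: no code test needed
    return positives + negatives
```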
It should be noted that, in order to improve the accuracy of the subsequent test analysis model, step S301 may perform a further filtering operation on the sample set obtained through steps S3011 and S3012. For example, the filtering operation mainly removes overloaded functions and functions with duplicate names from the sample set, so as to reduce sample noise.
In step S302, for any sample in the sample set, word segmentation processing is performed on the code object in the sample, and a keyword sequence corresponding to the sample is extracted.
In some embodiments, the code object in each sample includes a function name. Step S302 may perform word segmentation processing on the function name of the code object in each sample and obtain the keyword sequence corresponding to the sample. Here, the function name is typically in camel-case (hump) format, and as a whole the character string has no clear meaning. Through the word segmentation process, step S302 may extract meaningful words (e.g., English words) from the function name as keywords.
In some embodiments, step S302 may be implemented by steps S3021 and S3022.
In step S3021, the object name and the function implementation code of the code object in the sample are subjected to word segmentation processing, respectively, to obtain word segmentation results. In some embodiments, step S3021 may perform the word segmentation operation with the Natural Language Toolkit (NLTK). In some embodiments, in order to segment the character strings more accurately, step S3021 may perform the word segmentation processing based on both NLTK and a supplementary thesaurus. The supplementary thesaurus is, for example, a collection of proper nouns extracted from a programming grammar specification, such as the C++ grammar specification. When performing word segmentation on the function implementation code (for example, a function body), step S3021 may first split the function implementation code into a plurality of tokens at special symbols (for example, spaces, semicolons, and the like), and then continue to perform word segmentation on each token.
In some embodiments, the word segmentation result of the function name may be expressed as: S_fname = {t1, t2, ..., tn}, where S_fname represents the word segmentation result of the function name and tn is the nth word in the result.
Similarly, the word segmentation result of the function body may be expressed as: S_fbody = {t1, t2, ..., tm}, where S_fbody represents the word segmentation result of the function body and tm is the mth word in the result.
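The following sketch shows one possible way to perform such segmentation in Python. Splitting identifiers on case changes, digits, and underscores is an assumption; the NLTK-based step and the supplementary thesaurus mentioned above are omitted for brevity.

```python
# Hypothetical word-segmentation sketch for step S3021; the splitting rules are assumptions.
import re

def segment_identifier(name: str):
    """Split a hump-format identifier such as 'NoBarrier_AtomicIncrement'
    into lower-case words: ['no', 'barrier', 'atomic', 'increment']."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name.replace("_", " "))
    return [p.lower() for p in parts]

def segment_function_body(body: str):
    """Cut the function body into tokens at special symbols, then segment each token."""
    tokens = re.split(r"[\s;(){}\[\].,=+\-*/<>!&|]+", body)
    return [word for tok in tokens if tok for word in segment_identifier(tok)]
```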
In step S3022, a keyword sequence is extracted from the word segmentation result according to the feature pattern of the keyword. Here, the feature pattern of the keyword may be determined based on a programming syntax specification, such as a regular expression pattern determined according to the programming syntax specification, and the like, which will not be described herein. Step S3022 may represent the characteristics of the code object by a keyword sequence.
In step S303, a test analysis model is trained using the keyword sequence of each sample in the sample set and the label of each sample. Here, the test analysis model is used to determine whether the code object it analyzes requires a code test. It should be noted that, in step S303, various machine learning models capable of being trained by using the keyword sequence may be adopted, which is not limited in the present application. In some embodiments, step S303 may be implemented by steps S3031 and S3032.
In step S3031, for each keyword sequence of any sample in the sample set, a first feature matrix corresponding to each keyword in the keyword sequence is generated. In some embodiments, step S3031 may be implemented as method 600.
As shown in fig. 6A, in step S601, each character in each keyword in the keyword sequence of the sample is mapped to one feature vector. For example, step S601 may map (embed) each character in a keyword into a 64-dimensional feature vector. Here, by mapping characters to feature vectors, step S601 can avoid the Out-Of-Vocabulary (OOV) problem.
In step S602, a first feature matrix corresponding to each keyword is generated by using the feature vectors corresponding to all characters in each keyword in the keyword sequence of the sample. For example, if a keyword has 15 characters and each character maps to a 64-dimensional feature vector, the first feature matrix of the keyword has a size of 64 × 15. The process of obtaining the first feature matrix is illustrated more intuitively below with reference to fig. 6B. The keyword sequence of a sample may include, for example, "check constructor result". As shown in FIG. 6B, reference numeral 601 represents a mapping table of characters to feature vectors. In step S601, the method 600 may determine the feature vector of each character in "check constructor result" by querying the mapping table 601. In step S602, the method 600 may generate, from the feature vectors of the characters, a first feature matrix 602 corresponding to the keyword "check", a first feature matrix 603 corresponding to the keyword "constructor", and a first feature matrix 604 corresponding to the keyword "result".
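A possible realization of steps S601 and S602 is sketched below, assuming PyTorch, a small lower-case character vocabulary, and the 64-dimensional embedding mentioned above; the vocabulary and the handling of unknown characters are assumptions, not choices made by this application.

```python
# Sketch of steps S601/S602 under a PyTorch assumption; the vocabulary is illustrative.
import torch
import torch.nn as nn

CHAR_VOCAB = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz_")}  # 0 = unknown/padding
char_embedding = nn.Embedding(num_embeddings=len(CHAR_VOCAB) + 1, embedding_dim=64, padding_idx=0)

def first_feature_matrix(keyword: str) -> torch.Tensor:
    """Map every character to a 64-dimensional vector and stack the vectors into a
    (64, len(keyword)) matrix - e.g. a 15-character keyword yields a 64 x 15 matrix."""
    char_ids = torch.tensor([CHAR_VOCAB.get(c, 0) for c in keyword.lower()])
    return char_embedding(char_ids).T   # (len, 64) -> (64, len)
```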
In summary, the method 600 may determine the first feature matrix of each keyword by mapping characters into feature vectors. Thus, by converting the keyword into the first feature matrix, the method 600 can avoid the problem that the topological similarity is difficult to calculate between different keyword sequences, and is very convenient to further extract feature information from the first feature matrix (i.e., the operation of step S3032) and train the test analysis model.
In step S3032, a Deep Neural Network (DNN) model is trained using the first feature matrix corresponding to each keyword in the keyword sequence of each sample in the sample set and the label of each sample, and the trained deep neural network model is used as the test analysis model. Here, the deep neural network model is, for example, a Recurrent Neural Network (RNN) model, but is not limited thereto.
In some embodiments, step S3032 may be implemented as method 700.
As shown in fig. 7, in step S701, for the first feature matrix corresponding to all the keywords in the keyword sequence of any sample in the sample set, a feature extraction network of the deep neural network model is used to perform a feature extraction operation on the first feature matrix corresponding to all the keywords in the keyword sequence of the sample, so as to obtain a feature extraction result. Here, the feature extraction network may include a single-layer or multi-layer neural network, and may be configured to perform feature extraction on the first feature matrix of each keyword in the keyword sequence. In some embodiments, the feature extraction network may include, but is not limited to, a convolutional layer, a pooling layer, a high-speed network (highway network) layer, a recurrent neural network layer, and a fully-connected layer.
In some embodiments, step S701 may be implemented as method 800.
As shown in fig. 8A, in step S801, for the first feature matrix corresponding to any keyword in the keyword sequence of the sample, a convolution operation is performed on that first feature matrix, using a plurality of convolution kernels, at the convolution layer of the feature extraction network, and a second feature matrix corresponding to each convolution kernel is obtained. Here, the size of each convolution kernel may be configured as needed and is not described in detail here. The process of acquiring the second feature matrices is described below with reference to fig. 8B. As shown in fig. 8B, for the first feature matrix 602 corresponding to "check", step S801 may perform feature extraction on the first feature matrix 602 by using convolution kernels 801, 802, and 803, respectively, to obtain second feature matrices 804, 805, and 806.
In step S802, in a pooling layer of the feature extraction network, a pooling operation is performed on the second feature matrix corresponding to each convolution kernel of the plurality of convolution kernels, so as to obtain a third feature matrix corresponding to the keyword. Here, the pooling operation may employ various pooling means. In some embodiments, the pooling operation employs a max-pooling (max-pooling) approach. When the maximum pooling mode is adopted, the third feature matrix is a feature vector. Taking the keyword "check" in fig. 8B as an example, step S802 may pool all the second feature matrices of the keyword "check" into one feature vector 807. Here, the dimension of the feature vector coincides with the number of the second feature matrices. In short, step S802 may perform a pooling operation on all the second feature matrices of the keywords, and then splice all the pooled results into a third feature matrix.
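The convolution and max-pooling of steps S801 and S802 might be sketched as below, still under the PyTorch assumption; the kernel widths and the number of kernels are illustrative. Each output channel plays the role of one second feature matrix, and global max-pooling reduces it to a single value, so the resulting third feature matrix is a vector whose dimension equals the number of convolution kernels.

```python
# Sketch of steps S801/S802 (PyTorch assumed; kernel sizes and counts are illustrative).
import torch
import torch.nn as nn

conv_layers = nn.ModuleList(
    [nn.Conv1d(in_channels=64, out_channels=32, kernel_size=k) for k in (2, 3, 4)]
)

def third_feature_matrix(first_matrix: torch.Tensor) -> torch.Tensor:
    """first_matrix: (64, num_chars), num_chars >= 4. Returns a vector of size 3 * 32 = 96,
    one max-pooled value per convolution kernel."""
    x = first_matrix.unsqueeze(0)                                # (1, 64, num_chars)
    pooled = [conv(x).amax(dim=-1) for conv in conv_layers]      # max over character positions
    return torch.cat(pooled, dim=-1).squeeze(0)                  # (96,)
```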
In step S803, the third feature matrix of the keyword is processed by the high-speed network layer of the feature extraction network to obtain a fourth feature matrix corresponding to the keyword. Specifically, at the high-speed network layer, step S803 may acquire the fourth feature matrix according to the following formula.
z = t ⊙ g(W_H · y + b_H) + (1 − t) ⊙ y
where z denotes the fourth feature matrix, t denotes a trainable parameter threshold, W_H denotes a parameter matrix, g() denotes an activation function, b_H denotes an offset (bias), and y denotes the third feature matrix. In this way, step S803 may determine, through the parameter threshold t, whether the third feature matrix is used as a direct input to the subsequent recurrent neural network layer. For example, when t equals 1, (1 − t) ⊙ y is zero, indicating that the third feature matrix is not used as a direct input to the recurrent neural network layer. Conversely, when t equals 0, the third feature matrix can be used directly as the input of the recurrent neural network layer, which accelerates forward propagation in the deep learning network model.
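One way to render this high-speed (highway) layer in PyTorch is sketched below. Here the gate t is computed from y by a learned linear transform followed by a sigmoid, which is a common highway-network choice but only an assumption — the formula above merely requires t to be trainable; g() is taken to be ReLU for illustration.

```python
# Sketch of the step S803 formula z = t ⊙ g(W_H·y + b_H) + (1 − t) ⊙ y (PyTorch assumed).
import torch
import torch.nn as nn

class HighSpeedLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # W_H and b_H
        self.gate = nn.Linear(dim, dim)        # produces the trainable gate t from y

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        t = torch.sigmoid(self.gate(y))                          # gate values in (0, 1)
        return t * torch.relu(self.transform(y)) + (1 - t) * y   # g() taken to be ReLU
```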
In step S804, in the recurrent neural network layer of the feature extraction network, feature extraction is sequentially performed on the fourth feature matrix corresponding to each keyword in the keyword sequence of the sample according to the order of the keywords in the keyword sequence of the sample, so as to obtain a fifth feature matrix corresponding to each keyword. In some embodiments, the neurons of the recurrent neural network layer are Gated Recurrent Units (GRUs). In some embodiments, the neurons of the recurrent neural network layer are Long Short-Term Memory (LSTM) networks. Step S804 may sequentially perform feature extraction on the fourth feature matrix corresponding to each keyword in the keyword sequence of the sample based on the long short-term memory network to obtain the fifth feature matrix corresponding to each keyword. For example, fig. 8C illustrates a computational process diagram of a recurrent neural network layer, according to some embodiments of the present application. Taking the three keywords "check", "constructor", and "result" in the keyword sequence as an example, step S804 may sequentially calculate the outputs (i.e., the fifth feature matrices) corresponding to the three keywords. The fourth feature matrices of these three keywords are 808, 809, and 810, in order. Matrix 808 may serve as an input to the neurons of the RNN, which may output the fifth feature matrix 811 of "check". Similarly, the neurons of the RNN may output a fifth feature matrix 812 corresponding to "constructor" by taking matrix 809 as input. With matrix 810 as input, the neurons of the RNN can output a fifth feature matrix 813 corresponding to "result".
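Under the same PyTorch assumption, step S804 can be sketched by feeding the keywords' fourth feature matrices to an LSTM in sequence order; the input and hidden sizes below are illustrative and would have to match the output size of the preceding layer.

```python
# Sketch of step S804 with an LSTM recurrent layer (PyTorch assumed; sizes illustrative).
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=96, hidden_size=128, batch_first=True)

def fifth_feature_matrices(fourth_matrices: torch.Tensor) -> torch.Tensor:
    """fourth_matrices: (num_keywords, 96) in keyword order.
    Returns (num_keywords, 128): the output at each step is that keyword's fifth feature matrix."""
    outputs, _ = lstm(fourth_matrices.unsqueeze(0))   # add a batch dimension
    return outputs.squeeze(0)
```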
In step S805, feature extraction is performed on the fifth feature matrix corresponding to all the keywords in the keyword sequence of the sample at the full connection layer of the feature extraction network, so as to obtain a feature extraction result. In some embodiments, step S805 may be implemented by steps S8051 and S8052.
In step S8051, a ratio of the serial number corresponding to each keyword in the keyword sequence of the sample to the total number of words in the keyword sequence of the sample is calculated. For example, step S8051 may calculate the ratio according to the following manner:
k = level_number / all_level_number, where level_number indicates the serial number of the keyword (consistent with the index of the corresponding sub-layer in the recurrent neural network layer), and all_level_number represents the total number of words in the keyword sequence (consistent with the total number of sub-layers in the recurrent neural network layer).
In step S8052, the product of the fifth feature matrix of each keyword in the keyword sequence of the sample and the ratio corresponding to each keyword is accumulated, and the accumulated result is used as a feature extraction result.
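Steps S8051 and S8052 thus reduce the per-keyword outputs to a single feature extraction result; a compact sketch, assuming the keyword serial numbers run from 1 to the total word count, is:

```python
# Sketch of steps S8051/S8052: position-weighted sum of the fifth feature matrices.
import torch

def feature_extraction_result(fifth_matrices: torch.Tensor) -> torch.Tensor:
    """fifth_matrices: (num_keywords, hidden_size) -> (hidden_size,)."""
    n = fifth_matrices.size(0)
    ratios = torch.arange(1, n + 1, dtype=fifth_matrices.dtype) / n   # k = level_number / all_level_number
    return (ratios.unsqueeze(1) * fifth_matrices).sum(dim=0)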
In step S702, the feature extraction result is input into the classification layer of the deep neural network model to obtain an analysis result. Here, the classification layer may perform the classification operation using, for example, a softmax classifier.
In step S703, parameters of the deep neural network model are updated according to the label of the sample and the analysis result. In some embodiments, step S703 may be implemented by steps S7031 and S7032.
In step S7031, an error of the analysis result is determined based on the label of the sample and the analysis result.
In step S7032, parameters of the deep neural network model are updated according to the error and the loss function of the classification layer. Here, the loss function is, for example, a loss function based on cross entropy, but is not limited thereto. Here, step S7032 may update the parameters of the model (i.e., train the test analysis model) based on a back propagation approach.
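Putting steps S702 and S703 together, one possible training step is sketched below (PyTorch assumed). In practice the optimizer would also cover the parameters of the feature extraction network; the two-class linear layer, the optimizer choice, and the learning rate are assumptions.

```python
# Sketch of steps S702/S703: softmax classification, cross-entropy error, back-propagation.
import torch
import torch.nn as nn

classifier = nn.Linear(128, 2)                    # two classes: needs code test / no code test
criterion = nn.CrossEntropyLoss()                 # cross-entropy loss; softmax applied internally
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

def train_step(features: torch.Tensor, label: int) -> float:
    logits = classifier(features.unsqueeze(0))       # (1, 2)
    loss = criterion(logits, torch.tensor([label]))  # label: 1 = needs test, 0 = no test
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```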
In step S304, a code object to be analyzed is acquired.
In step S305, word segmentation processing is performed on the code object, and a keyword sequence is extracted. In some embodiments, the keyword sequence is extracted in a manner similar to the extraction of a sample's keyword sequence in step S302, and is not repeated here.
In step S306, it is determined whether the code object needs to be subjected to a code test using the trained test analysis model based on the keyword sequence. In some embodiments, step S306 may generate a first feature matrix corresponding to each keyword in the keyword sequence respectively. Here, the manner of generating the first feature matrix in step S306 is similar to the manner of generating the first feature matrix corresponding to the sample in the foregoing, and is not described herein again. On the basis, step S306 may input the first feature matrix corresponding to each keyword into the trained test analysis model to output an analysis result indicating whether to perform the code test.
In step S307, when it is determined that the code object to be analyzed needs to perform a code test, a test case set of the corresponding function type is queried according to the function type of the code object to be analyzed, and the queried test case set is used as an alternative test case set. Here, the function type is used to classify code objects; function types may include, for example, a network call type, a business logic type, and so on. Thus, through step S307, the method 300 may recommend alternative test cases to the user for testing the code object.
In summary, the method 300 may train the test analysis model using the sample set. On this basis, the method 300 may analyze the code object to be analyzed by using the trained model, so as to automatically analyze whether the code object needs to be tested, thereby improving the analysis efficiency.
Fig. 9 illustrates a schematic diagram of a test analysis device 900 according to some embodiments of the present application. The test analysis apparatus 900 may reside in the computing device 102 or in a computing cluster, for example, and is not limited in this application.
As shown in fig. 9, the test analysis apparatus 900 may include an object acquisition unit 901, a keyword acquisition unit 902, and an analysis unit 903. The object acquisition unit 901 may acquire a code object to be analyzed. The keyword acquisition unit 902 may perform word segmentation processing on the code object and extract a keyword sequence.
The analysis unit 903 may determine whether the code object needs to be subjected to a code test using the trained test analysis model based on the keyword sequence. In some embodiments, the analysis unit 903 may generate a first feature matrix corresponding to each keyword in the keyword sequence respectively. On this basis, the analysis unit 903 may input the first feature matrix corresponding to each keyword into the trained test analysis model to output an analysis result indicating whether to perform a code test. More specific implementations of the apparatus 900 are consistent with the method 200 and will not be described here.
Fig. 10 illustrates a schematic diagram of a test analysis device 1000 according to some embodiments of the present application. The test analysis apparatus 1000 may reside in the computing device 102 or in a computing cluster, for example, and the application is not limited thereto.
As shown in fig. 10, the test analysis apparatus 1000 may include an object acquisition unit 1001, a keyword acquisition unit 1002, an analysis unit 1003, a sample acquisition unit 1004, a model training unit 1005, and a query unit 1006. In some embodiments, the object obtaining unit 1001, the keyword obtaining unit 1002, and the analyzing unit 1003 may implement operations consistent with the object obtaining unit 901, the keyword obtaining unit 902, and the analyzing unit 903, which are not described herein again.
In some embodiments, the sample acquisition unit 1004 may acquire a set of samples. Each sample in the sample set includes a code object and a label describing whether to code test the code object.
In some embodiments, the sample acquisition unit 1004 acquires a first code object set requiring code testing in software, adds a first label indicating that code testing is required to each code object in the first code object set, and takes the first code object set to which the first label is added as a positive sample subset in the sample set. In some embodiments, to obtain the first set of code objects in the software that require code testing, the sample acquisition unit 1004 may acquire the test code resources of the software. According to the feature pattern of the test case, the sample acquisition unit 1004 may determine the test cases in the test code resources. The sample acquisition unit 1004 may extract code objects from the determined test cases and take the extracted code objects as the first code object set.
In addition, the sample acquisition unit 1004 may acquire a second code object set that does not need to be subjected to code testing in the software, add a second label indicating that code testing is not needed to each code object in the second code object set, and use the second code object set to which the second label is added as a negative sample subset in the sample set. In some embodiments, to obtain the second set of code objects in the software that do not require code testing, the sample acquisition unit 1004 may extract code objects from the software and obtain a set of code objects. On this basis, the sample acquisition unit 1004 may screen out, from the set of code objects, a subset that does not belong to the first set of code objects and treat the subset as the second set of code objects.
For any sample in the sample set, the keyword obtaining unit 1002 may further perform word segmentation on the code object in the sample, and extract a keyword sequence corresponding to the sample. In some embodiments, for any sample in the sample set, the keyword obtaining unit 1002 may perform word segmentation on the object name and the function implementation code of the code object in the sample, respectively, to obtain word segmentation results. The keyword acquisition unit 1002 may extract a keyword sequence from the word segmentation result according to the feature pattern of the keyword.
The model training unit 1005 may train the test analysis model using the keyword sequence of each sample in the sample set and the label of each sample. In some embodiments, for a keyword sequence of any sample in the sample set, the model training unit 1005 may generate a first feature matrix corresponding to each keyword of the keyword sequence, respectively. By using the first feature matrix corresponding to each keyword in the keyword sequence of each sample in the sample set and the label of each sample, the model training unit 1005 may train the deep neural network model, and use the trained deep neural network model as a test analysis model.
In some embodiments, to generate the first feature matrix corresponding to the sample, the model training unit 1005 may map each character in each keyword in the keyword sequence of the sample to one feature vector. And generating a first feature matrix corresponding to each keyword by using the feature vectors corresponding to all characters in each keyword in the keyword sequence of the sample.
In some embodiments, for any sample in the sample set, the model training unit 1005 may perform, by using a feature extraction network of the deep neural network model, a feature extraction operation on the first feature matrix corresponding to all keywords in the keyword sequence of the sample to obtain a feature extraction result.
In some embodiments, in order to obtain the feature extraction result, for a first feature matrix corresponding to any keyword in the keyword sequence of the sample, the model training unit 1005 may perform a convolution operation on the convolution layer of the feature extraction network by using a plurality of convolution kernels to obtain a second feature matrix corresponding to each convolution kernel.
In a pooling layer of the feature extraction network, the model training unit 1005 may perform pooling operation on the second feature matrix corresponding to each of the plurality of convolution kernels to obtain a third feature matrix corresponding to the keyword.
The model training unit 1005 processes the third feature matrix using the high-speed network layer of the feature extraction network to obtain a fourth feature matrix corresponding to the keyword.
In a recurrent neural network layer of the feature extraction network, the model training unit 1005 sequentially performs feature extraction on the fourth feature matrix corresponding to each keyword in the keyword sequence of the sample according to the sequence of the keywords in the keyword sequence of the sample to obtain a fifth feature matrix corresponding to each keyword. In some embodiments, the neurons of the recurrent neural network layer are long-short term memory networks. Based on the long and short term memory network, the model training unit 1005 may sequentially perform feature extraction on the fourth feature matrix corresponding to each keyword in the keyword sequence of the sample to obtain a fifth feature matrix corresponding to each keyword.
In the fully connected layer of the feature extraction network, the model training unit 1005 performs feature extraction on the fifth feature matrices corresponding to all the keywords in the keyword sequence of the sample to obtain a feature extraction result. In some embodiments, to obtain the feature extraction result, the model training unit 1005 may calculate a ratio of a sequence number corresponding to each keyword in the keyword sequence of the sample to the total number of words in the keyword sequence of the sample. And accumulating the product of the fifth feature matrix of each keyword in the keyword sequence of the sample and the ratio corresponding to each keyword, and taking the accumulated result as a feature extraction result.
In addition, the model training unit 1005 may input the feature extraction result to the classification layer of the deep neural network model to obtain an analysis result.
In addition, the model training unit 1005 may update the parameters of the deep neural network model according to the label of the sample and the analysis result. In some embodiments, based on the labels of the samples and the analysis results, model training unit 1005 may determine an error in the analysis results.
Based on the loss functions of the error and classification layers, model training unit 1005 may update parameters of the deep neural network model.
When the analysis unit 1003 determines that the code object to be analyzed needs to perform a code test, the query unit 1006 queries a test case set of the function type according to the function type of the code object to be analyzed, and takes the queried test case set as an alternative test case set. Here, more specific implementations of the apparatus 1000 are consistent with the method 300 and will not be described here.
FIG. 11 illustrates a block diagram of the components of a computing device. As shown in fig. 11, the computing device includes one or more processors (CPUs) 1102, a communications module 1104, a memory 1106, a user interface 1110, and a communications bus 1108 for interconnecting these components.
The processor 1102 may receive and transmit data via the communication module 1104 to enable network communications and/or local communications.
The user interface 1110 includes one or more output devices 1112, including one or more speakers and/or one or more visual displays. The user interface 1110 also includes one or more input devices 1114. The user interface 1110 may receive, for example, instructions from a remote controller, but is not limited thereto.
Memory 1106 may be high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; or non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 1106 stores a set of instructions executable by the processor 1102, including:
an operating system 1116, including programs for handling various basic system services and for performing hardware-related tasks;
the application 1118, including various programs for implementing the test analysis method described above, may include, for example, the test analysis apparatus 900 shown in fig. 9 or the test analysis apparatus 1000 shown in fig. 10.
In addition, each of the embodiments of the present application can be realized by a data processing program executed by a data processing apparatus such as a computer. It is clear that a data processing program constitutes the present application.
Further, the data processing program, which is generally stored in one storage medium, is executed by directly reading the program out of the storage medium or by installing or copying the program into a storage device (such as a hard disk and/or a memory) of the data processing device. Such a storage medium therefore also constitutes the present application. The storage medium may use any type of recording means, such as a paper storage medium (e.g., paper tape, etc.), a magnetic storage medium (e.g., a flexible disk, a hard disk, a flash memory, etc.), an optical storage medium (e.g., a CD-ROM, etc.), a magneto-optical storage medium (e.g., an MO, etc.), and the like.
The present application therefore also discloses a non-volatile storage medium in which a data processing program is stored, the data processing program being adapted to perform any one of the embodiments of the test analysis method described above in the present application.
In addition, the method steps described in this application may be implemented by hardware, for example, logic gates, switches, Application Specific Integrated Circuits (ASICs), programmable logic controllers, embedded microcontrollers, and the like, in addition to data processing programs. Such hardware capable of implementing the methods described herein may also constitute the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the scope of the present application.

Claims (17)

1. A method of test analysis, comprising:
acquiring a code object to be analyzed;
performing word segmentation processing on the code object, and extracting a keyword sequence; and
determining whether the code object needs to be subjected to code testing or not by utilizing a trained test analysis model based on the keyword sequence;
wherein the method further comprises:
obtaining a sample set, wherein each sample in the sample set comprises a code object and a label for describing whether to perform code test on the code object;
for any sample in the sample set, performing word segmentation processing on a code object in the sample, and extracting a keyword sequence corresponding to the sample; mapping each character in each keyword in the keyword sequence of the sample into a feature vector respectively; generating a first feature matrix corresponding to each keyword by using the feature vectors corresponding to all characters in each keyword in the keyword sequence of the sample;
and training a deep neural network model by using the first feature matrix corresponding to each keyword in the keyword sequence of each sample in the sample set and the label of each sample, and taking the trained deep neural network model as the test analysis model.
2. The method of claim 1, wherein said determining whether the code object requires code testing using a trained test analysis model based on the keyword sequence comprises:
respectively generating a first feature matrix corresponding to each keyword in the keyword sequence;
and inputting the first feature matrix corresponding to each keyword into the trained test analysis model to output an analysis result indicating whether the code test is performed or not.
3. The method of claim 1, wherein the code object refers to a functional module testable in software.
4. The method of claim 1, wherein said obtaining a set of samples comprises:
acquiring a first code object set which needs to be subjected to code testing in software, adding a first label which represents that the code testing needs to be performed to each code object in the first code object set, and taking the first code object set added with the first label as a positive sample subset in the sample set;
and acquiring a second code object set which does not need to be subjected to code testing in the software, adding a second label representing that the code testing does not need to be performed to each code object in the second code object set, and taking the second code object set added with the second label as a negative sample subset in the sample set.
5. The method of claim 4, wherein,
the acquiring a first code object set which needs to be subjected to code testing in software comprises the following steps:
acquiring a test code resource of the software;
determining a test case in the test code resource according to the characteristic mode of the test case;
extracting code objects from the determined test cases, and taking the extracted code objects as the first code object set;
the acquiring a second code object set which does not need to be subjected to code testing in the software comprises the following steps:
extracting code objects from the software to obtain a set of code objects;
screening, from the set of code objects, a subset that does not belong to the first set of code objects, and taking the subset as the second set of code objects.
6. The method of claim 1, wherein for any sample in the sample set, performing word segmentation on the code object in the sample and extracting a keyword sequence corresponding to the sample comprises:
respectively performing word segmentation on the object name and the function realization code of the code object in the sample to obtain word segmentation results;
and extracting the keyword sequence from the word segmentation result according to the characteristic mode of the keyword.
7. The method of claim 5, wherein prior to said determining test cases in said test code resources according to their characteristic patterns, the method further comprises:
and filtering the test code resources.
8. The method of claim 5, wherein said extracting code objects from said determined test cases comprises:
and analyzing the execution steps of the test case line by line, and inquiring the function called by the test case according to the matching rule of the function name.
9. The method of claim 1, wherein the training of the deep neural network model using the first feature matrix corresponding to each keyword in the keyword sequence of each sample in the sample set and the label of each sample comprises:
for any sample in the sample set, utilizing the feature extraction network of the deep neural network model to perform feature extraction operation on the first feature matrix corresponding to all keywords in the keyword sequence of the sample to obtain a feature extraction result;
inputting the feature extraction result into a classification layer of the deep neural network model to obtain an analysis result;
and updating the parameters of the deep neural network model according to the label of the sample and the analysis result.
10. The method of claim 9, wherein said updating parameters of said deep neural network model based on said sample's label and said analysis results comprises:
determining an error of the analysis result according to the label of the sample and the analysis result;
and updating parameters of the deep neural network model according to the error and the loss function of the classification layer.
11. The method of claim 9, wherein the performing, by using the feature extraction network of the deep neural network model, a feature extraction operation on the first feature matrix corresponding to all keywords in the keyword sequence of the sample to obtain a feature extraction result for any sample in the sample set comprises:
performing, at the convolution layer of the feature extraction network, a convolution operation on the first feature matrix corresponding to any keyword in the keyword sequence of the sample by using a plurality of convolution kernels to obtain a second feature matrix corresponding to each convolution kernel;
performing pooling operation on the second feature matrix corresponding to each convolution kernel in the plurality of convolution kernels in a pooling layer of the feature extraction network to obtain a third feature matrix corresponding to the keyword;
processing the third feature matrix by utilizing a high-speed network layer of the feature extraction network to obtain a fourth feature matrix corresponding to the keyword;
in a recurrent neural network layer of the feature extraction network, sequentially performing feature extraction on the fourth feature matrix corresponding to each keyword in the keyword sequence of the sample according to the sequence of the keywords in the keyword sequence of the sample to obtain a fifth feature matrix corresponding to each keyword;
and performing feature extraction on the fifth feature matrix corresponding to all the keywords in the keyword sequence of the sample at a full connection layer of the feature extraction network to obtain a feature extraction result.
12. The method of claim 11, wherein the neurons of the recurrent neural network layer are long-short term memory networks; the method for extracting the features of the sample keyword sequence includes the following steps that in a recurrent neural network layer of the feature extraction network, according to the sequence of the keywords in the sample keyword sequence, feature extraction is sequentially performed on a fourth feature matrix corresponding to each keyword in the sample keyword sequence to obtain a fifth feature matrix corresponding to each keyword, and the method includes the following steps:
and sequentially performing feature extraction on the fourth feature matrix corresponding to each keyword in the keyword sequence of the sample based on the long short-term memory network, to obtain the fifth feature matrix corresponding to each keyword.
13. The method of claim 11, wherein the performing, at the fully connected layer of the feature extraction network, feature extraction on the fifth feature matrices corresponding to all keywords in the keyword sequence of the sample to obtain the feature extraction result comprises:
calculating, for each keyword in the keyword sequence of the sample, the ratio of the sequence number of the keyword to the total number of keywords in the keyword sequence of the sample;
and accumulating the products of the fifth feature matrix of each keyword in the keyword sequence of the sample and the ratio corresponding to that keyword, and taking the accumulated sum as the feature extraction result.
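In other words, if the keywords are numbered 1 through n, the fifth feature matrix of the i-th keyword is weighted by i/n and the weighted matrices are summed. A minimal sketch, assuming the numbering starts at 1:

def weighted_feature_sum(fifth_matrices):
    """Weight the i-th keyword's fifth feature matrix by i / n and
    accumulate the products, as recited in claim 13."""
    n = len(fifth_matrices)
    result = None
    for i, matrix in enumerate(fifth_matrices, start=1):
        weighted = matrix * (i / n)        # ratio of sequence number to keyword count
        result = weighted if result is None else result + weighted
    return result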
14. The method of claim 1, further comprising:
when it is determined that the code object to be analyzed needs to be subjected to a code test, querying, according to the function type of the code object to be analyzed, a test case set of the function type, and taking the queried test case set as a candidate test case set.
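A trivial sketch of this lookup; the TEST_CASE_SETS mapping and the function-type strings are hypothetical stand-ins for whatever repository the test cases are actually stored in:

# Hypothetical repository mapping function types to their test case sets.
TEST_CASE_SETS = {
    "login": ["tc_login_001", "tc_login_002"],
    "payment": ["tc_pay_001"],
}

def candidate_test_cases(function_type, needs_testing):
    """Return the test case set of the code object's function type as the
    candidate set, but only when the analysis decided a test is needed."""
    if not needs_testing:
        return []
    return TEST_CASE_SETS.get(function_type, [])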
15. A test analysis device, comprising:
an object acquisition unit, configured to acquire a code object to be analyzed;
a keyword acquisition unit, configured to perform word segmentation processing on the code object and extract a keyword sequence; and
an analysis unit, configured to determine, based on the keyword sequence and by using a trained test analysis model, whether the code object needs to be subjected to a code test;
the analysis unit is further configured to obtain a sample set, where each sample in the sample set includes a code object and a label for describing whether to perform a code test on the code object; for any sample in the sample set, performing word segmentation processing on a code object in the sample, and extracting a keyword sequence corresponding to the sample; mapping each character in each keyword in the keyword sequence of the sample into a feature vector respectively; generating a first feature matrix corresponding to each keyword by using the feature vectors corresponding to all characters in each keyword in the keyword sequence of the sample; and training a deep neural network model by using the first feature matrix corresponding to each keyword in the keyword sequence of each sample in the sample set and the label of each sample, and taking the trained deep neural network model as the test analysis model.
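For illustration, the character-to-feature-vector mapping and the assembly of a keyword's first feature matrix mentioned above could look like the following NumPy sketch; the vector length, the fixed number of character rows, and the use of random vectors in place of learned character embeddings are all assumptions:

import numpy as np

CHAR_DIM = 32      # assumed length of each character's feature vector
MAX_CHARS = 16     # assumed fixed number of character rows per keyword

rng = np.random.default_rng(0)
char_vectors = {}  # character -> feature vector, built lazily

def first_feature_matrix(keyword):
    """Map each character of a keyword to a feature vector and stack the
    vectors into the keyword's first feature matrix (MAX_CHARS x CHAR_DIM)."""
    matrix = np.zeros((MAX_CHARS, CHAR_DIM), dtype=np.float32)
    for row, ch in enumerate(keyword[:MAX_CHARS]):
        if ch not in char_vectors:
            char_vectors[ch] = rng.standard_normal(CHAR_DIM).astype(np.float32)
        matrix[row] = char_vectors[ch]
    return matrix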
16. A computing device comprising a memory and a processor, the memory having stored therein computer-readable instructions that, when executed by the processor, implement the method of any of claims 1 to 14.
17. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by at least one processor, implement the method of any one of claims 1 to 14.
CN201811018314.9A 2018-09-03 2018-09-03 Test analysis method and device Active CN109144879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811018314.9A CN109144879B (en) 2018-09-03 2018-09-03 Test analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811018314.9A CN109144879B (en) 2018-09-03 2018-09-03 Test analysis method and device

Publications (2)

Publication Number Publication Date
CN109144879A CN109144879A (en) 2019-01-04
CN109144879B true CN109144879B (en) 2020-12-18

Family

ID=64826263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811018314.9A Active CN109144879B (en) 2018-09-03 2018-09-03 Test analysis method and device

Country Status (1)

Country Link
CN (1) CN109144879B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11853196B1 (en) 2019-09-27 2023-12-26 Allstate Insurance Company Artificial intelligence driven testing
CN113760690A (en) * 2020-06-05 2021-12-07 腾讯科技(深圳)有限公司 Method and device for analyzing program interface and computer equipment
CN112416782A (en) * 2020-11-25 2021-02-26 上海信联信息发展股份有限公司 Test result verification method and device and electronic equipment
CN116578500B (en) * 2023-07-14 2023-09-26 安徽华云安科技有限公司 Method, device and equipment for testing codes based on reinforcement learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224463B (en) * 2015-10-28 2018-02-02 南京大学 A kind of software defect Code location method based on crash stack data
US9721097B1 (en) * 2016-07-21 2017-08-01 Cylance Inc. Neural attention mechanisms for malware analysis
CN107256357B (en) * 2017-04-18 2020-05-15 北京交通大学 Detection and analysis method for android malicious application based on deep learning
CN107885999B (en) * 2017-11-08 2019-12-24 华中科技大学 Vulnerability detection method and system based on deep learning

Also Published As

Publication number Publication date
CN109144879A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN109697162B (en) Software defect automatic detection method based on open source code library
US10417350B1 (en) Artificial intelligence system for automated adaptation of text-based classification models for multiple languages
CN109408526B (en) SQL sentence generation method, device, computer equipment and storage medium
CN109144879B (en) Test analysis method and device
US10705795B2 (en) Duplicate and similar bug report detection and retrieval using neural networks
US11966389B2 (en) Natural language to structured query generation via paraphrasing
US20220100963A1 (en) Event extraction from documents with co-reference
EP4006909A1 (en) Method, apparatus and device for quality control and storage medium
CN111507086A (en) Automatic discovery of translation text location in localized applications
CN110874536B (en) Corpus quality evaluation model generation method and double-sentence pair inter-translation quality evaluation method
CN111194401B (en) Abstraction and portability of intent recognition
CN112036162A (en) Text error correction adaptation method and device, electronic equipment and storage medium
US20220100772A1 (en) Context-sensitive linking of entities to private databases
CN114385780B (en) Program interface information recommendation method and device, electronic equipment and readable medium
CN110162771A (en) The recognition methods of event trigger word, device, electronic equipment
CN112307048B (en) Semantic matching model training method, matching method, device, equipment and storage medium
CN113778864A (en) Test case generation method and device, electronic equipment and storage medium
CN113821616A (en) Domain-adaptive slot filling method, device, equipment and storage medium
CN110968664A (en) Document retrieval method, device, equipment and medium
WO2022072237A1 (en) Lifecycle management for customized natural language processing
JP2022003544A (en) Method for increasing field text, related device, and computer program product
US20220100967A1 (en) Lifecycle management for customized natural language processing
CN111783425B (en) Intention identification method based on syntactic analysis model and related device
US20230185778A1 (en) Method and system for scalable acceleration of data processing pipeline.
CN111460137A (en) Micro-service focus identification method, device and medium based on topic model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant