Disclosure of Invention
To overcome the problems in the related art, this specification provides a test case generation method, a test method, and corresponding apparatuses, devices, and systems.
According to a first aspect of embodiments of the present specification, there is provided a test case generation method, including:
acquiring a traffic data set, wherein each piece of traffic data in the set comprises running data generated during actual operation of a released version of a service module, each piece of traffic data comprises a plurality of data pairs, and each data pair consists of an attribute name and an attribute value;
converting each piece of traffic data into a feature vector by using a word vectorization algorithm, wherein each data pair is treated as a word during the conversion, so that each piece of traffic data is represented as a word set formed by ordered words;
clustering the feature vectors of the traffic data in the set to obtain a preset number of classes;
selecting a specified number of pieces of traffic data from each class, and constructing test cases for the service module from the selected traffic data, wherein the test cases are used to test an unreleased version of the service module.
In one embodiment, determining the feature vector comprises:
converting each piece of traffic data into a word set formed by ordered words, wherein each data pair in the traffic data is treated as a word;
inputting the word set corresponding to each piece of traffic data, together with the number of occurrences in the traffic data set of each data pair treated as a word, into a Word2Vector algorithm, and constructing training samples and training a model based on the Word2Vector algorithm to obtain the word vector of each data pair output by the trained model;
for each piece of traffic data, summing and averaging, dimension by dimension, the word vectors of all data pairs contained in that piece of traffic data, and taking the resulting vector as the feature vector of the traffic data.
In one embodiment, each data pair in the word set input to the Word2Vector algorithm is represented by a data pair number, and the numbering satisfies the following rules: the relationship between the numbers reflects the positional relationship of the data pairs in the traffic data, different data pairs have different numbers, and identical data pairs have the same number.
In one embodiment, the method further comprises:
collecting traffic data related to the released version of the service module while the service module executes services;
storing the traffic data on a file server in JSON format;
the process of determining the feature vector further comprises:
extracting data pairs, each consisting of an attribute name and an attribute value, from the traffic data in JSON format.
In one embodiment, the traffic data includes a service request and the related data resulting from processing of the service request by the released version of the service module, and the method further comprises:
taking the service request in a test case as input, and acquiring the related data obtained when the unreleased version of the service module processes the service request;
comparing the related data obtained when the unreleased version and the released version of the same service module each process the service request, and using the comparison result as a basis for evaluating the unreleased version of the service module.
According to a second aspect of embodiments herein, there is provided a testing method, the method comprising:
performing regression testing on an unreleased version of a service module by using a test case;
wherein the test case is determined by:
acquiring a traffic data set, wherein each piece of traffic data in the set comprises running data generated during actual operation of a released version of the service module, each piece of traffic data comprises a plurality of data pairs, and each data pair consists of an attribute name and an attribute value;
converting each piece of traffic data into a feature vector by using a word vectorization algorithm, wherein each data pair is treated as a word during the conversion, so that each piece of traffic data is represented as a word set formed by ordered words;
clustering the feature vectors of the traffic data in the set to obtain a preset number of classes;
selecting a specified number of pieces of traffic data from each class, and constructing test cases for the service module from the selected traffic data, wherein the test cases are used to test the unreleased version of the service module.
In one embodiment, determining the feature vector comprises:
converting each piece of traffic data into a word set formed by ordered words, wherein each data pair in the traffic data is treated as a word;
inputting the word set corresponding to each piece of traffic data, together with the number of occurrences in the traffic data set of each data pair treated as a word, into a Word2Vector algorithm, and constructing training samples and training a model based on the Word2Vector algorithm to obtain the word vector of each data pair output by the trained model;
for each piece of traffic data, summing and averaging, dimension by dimension, the word vectors of all data pairs contained in that piece of traffic data, and taking the resulting vector as the feature vector of the traffic data.
In one embodiment, each data pair in the word set input to the Word2Vector algorithm is represented by a data pair number, and the numbering satisfies the following rules: the relationship between the numbers reflects the positional relationship of the data pairs in the traffic data, different data pairs have different numbers, and identical data pairs have the same number.
According to a third aspect of embodiments of the present specification, there is provided a test case generation apparatus, including:
a data acquisition module configured to acquire a traffic data set, wherein each piece of traffic data in the set comprises running data generated during actual operation of a released version of a service module, each piece of traffic data comprises a plurality of data pairs, and each data pair consists of an attribute name and an attribute value;
a vector conversion module configured to convert each piece of traffic data into a feature vector by using a word vectorization algorithm, wherein each data pair is treated as a word during the conversion, so that each piece of traffic data is represented as a word set formed by ordered words;
a data clustering module configured to cluster the feature vectors of the traffic data in the set to obtain a preset number of classes;
a test case determination module configured to select a specified number of pieces of traffic data from each class and construct test cases for the service module from the selected traffic data, wherein the test cases are used to test an unreleased version of the service module.
According to a fourth aspect of embodiments herein, there is provided a test apparatus, the apparatus comprising:
a regression testing module configured to perform regression testing on an unreleased version of a service module by using test cases;
a test case construction module configured to: acquire a traffic data set, wherein each piece of traffic data in the set comprises running data generated during actual operation of a released version of the service module, each piece of traffic data comprises a plurality of data pairs, and each data pair consists of an attribute name and an attribute value; convert each piece of traffic data into a feature vector by using a word vectorization algorithm, wherein each data pair is treated as a word during the conversion, so that each piece of traffic data is represented as a word set formed by ordered words; cluster the feature vectors of the traffic data in the set to obtain a preset number of classes; and select a specified number of pieces of traffic data from each class and construct test cases for the service module from the selected traffic data, wherein the test cases are used to test the unreleased version of the service module.
According to a fifth aspect of the embodiments of this specification, there is provided a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any one of the test case generation methods or test methods described above.
According to a sixth aspect of the embodiments of this specification, there is provided a test system, comprising a traffic collection module, a data analysis module, and a regression testing module;
the traffic collection module is configured to collect traffic data, wherein each piece of traffic data comprises running data generated during actual operation of a released version of a service module, each piece of traffic data comprises a plurality of data pairs, and different data pairs correspond to different attribute names and attribute values;
the data analysis module is configured to: acquire a traffic data set; convert each piece of traffic data into a feature vector by using a word vectorization algorithm, wherein each data pair is treated as a word during the conversion, so that each piece of traffic data is represented as a word set formed by ordered words; cluster the feature vectors of the traffic data in the set to obtain a preset number of classes; and select a specified number of pieces of traffic data from each class and construct test cases for the service module from the selected traffic data;
the regression testing module is configured to perform regression testing on the unreleased version of the service module by using the test cases.
The technical solutions provided by the embodiments of this specification can have the following beneficial effects:
In the embodiments of this specification, a traffic data set can be acquired, in which each piece of traffic data comprises running data generated during actual operation of a released version of a service module and comprises a plurality of data pairs, each consisting of an attribute name and an attribute value. Each piece of traffic data can be converted into a feature vector by using a word vectorization algorithm, the feature vectors are clustered, and representative traffic data is selected from each class of the clustering result to construct test cases. This reduces the number of test cases, so that test efficiency and test quality can be greatly improved when the unreleased version of the service module is tested by replaying the test cases.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
In a computer programming language, a method may be an ordered combination of code designed to solve a problem, and one or more methods in combination may provide a class of services. A service entry method may be the outermost method in the call relationships of the source code, that is, a method that is not called by any other method in the project and that may be used by a service initiator. A service entry method can provide services externally in two ways: as a front-end method that external systems call synchronously, or as a message listener through which external systems trigger tasks asynchronously via messages. A service module may be a module that provides a class of services and may also be referred to as a business module. A class of services may be implemented by one or more methods that have call relationships with one another. For example, an application may provide one or more classes of services, each of which may be implemented by a service module. A service module may implement a single service, or a plurality of sub-services distinguished by different input parameters. For example, a service module that provides a transfer service may subdivide it, according to the input parameters, into sub-services such as transfer to account balance and transfer to bank card.
Based on various requirements, the existing functions of a service module may be modified, new functions may be added, old functions may be deleted, and so on. For example, when the service module corresponds to a certain service in an application, the version number of the service module may be the version number of the application. A service module with an old version number can be updated to one with a new version number to improve its functions. A service module with a new version number that has not yet been released may be called the unreleased version of the service module, and the released service module with the old version number may be called the released version of the service module. Before the unreleased version is released, real traffic data may be diverted to a test system for regression testing. Regression testing refers to re-testing after old code is modified, to confirm that the modification neither introduces new errors nor causes errors in other code. Traffic data may be running data generated during actual operation of the released version of the service module, and may also be referred to as online traffic or online traffic data. For example, the running data of the method calls corresponding to one online business operation (such as placing an order for a commodity), including the input, output, and error stack of the service entry method, the content of database read and write operations, and the requests to and return values from methods of peripheral systems, may constitute one piece of traffic data.
The number of attributes in the traffic data affects the accuracy of the subsequent screening: the more attributes there are (that is, the more data types), the higher the screening accuracy, but the larger the amount of computation. The number of attributes can therefore be set as required.
Diverting traffic data to the test system for regression testing may also be referred to as traffic replay. Traffic replay may refer to re-running the corresponding traffic data on a machine on which the new code to be released is deployed; the system may then compare whether the running data of the new code is consistent with the recorded traffic data, and a difference may indicate a defect in the new code.
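The comparison step of traffic replay can be sketched as follows. This is a minimal Python illustration, not the implementation of this specification: it assumes each recorded piece of traffic data has been reduced to a hypothetical record with an `input` (the service request) and an `output` (the related data returned by the released version), and that the unreleased version is available as a callable. The function name `replay` is hypothetical.

```python
from typing import Any, Callable, Dict, List


def replay(records: List[Dict[str, Any]],
           new_version: Callable[[Any], Any]) -> List[Dict[str, Any]]:
    """Re-run each recorded request on the new code and collect mismatches."""
    diffs = []
    for record in records:
        try:
            actual = new_version(record["input"])
        except Exception as exc:  # a crash in the new code also counts as a difference
            actual = {"error": repr(exc)}
        if actual != record["output"]:
            diffs.append({"input": record["input"],
                          "expected": record["output"],
                          "actual": actual})
    return diffs
```

Plain equality is the simplest consistency check; in practice, fields that legitimately differ between runs, such as timestamps or generated IDs, would likely need to be excluded before comparison.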
However, the applicant has found that for some service modules, such as a transfer service module or a payment service module, the large user base and high frequency of use mean that the released version generates a very large amount of traffic data every day. Diverting and testing all real traffic data before the unreleased version of the service module is released would consume enormous resources and time.
In view of this, the embodiments of this specification provide a test case generation scheme and a test scheme. A traffic data set may first be acquired, in which each piece of traffic data comprises running data generated during actual operation of a released version of a service module and comprises a plurality of data pairs, each consisting of an attribute name and an attribute value. Each piece of traffic data may be converted into a feature vector by using a word vectorization algorithm, the feature vectors are clustered, and representative traffic data is selected from each class of the clustering result to construct test cases. This reduces the number of test cases, so that replay efficiency can be greatly improved when the unreleased version of the service module is tested by replaying the test cases. At the same time, the scheme avoids omitting test cases for some scenarios, which easily happens when test cases are designed manually, and it also avoids the incomplete scenario coverage that results from replaying traffic while it is still being collected.
The embodiments of the present specification will be described below by way of example with reference to the accompanying drawings.
The test case generation method/test method provided in this embodiment may be implemented by software, by hardware, or by a combination of software and hardware, where the related hardware may consist of one physical entity or of two or more physical entities. The method of this embodiment can be applied to an electronic device with processing capability. Fig. 1A is a schematic structural diagram of a test system according to an exemplary embodiment of this specification. The system may include a traffic collection module 120, a data analysis module 140, and a regression testing module 160. The traffic collection module 120 is configured to collect traffic data, where each piece of traffic data comprises running data generated during actual operation of a released version of a service module and comprises a plurality of data pairs, and different data pairs correspond to different attribute names and attribute values. The data analysis module 140 is configured to acquire a traffic data set and construct test cases for the service module. The regression testing module 160 is configured to perform regression testing on the unreleased version of the service module by using the test cases.
Fig. 1B is a diagram of an application scenario of the test case generation method/test method according to an exemplary embodiment of this specification. The released version of the service module may be deployed on an electronic device, and fig. 1A exemplifies user terminals such as a smartphone, a desktop computer, and a tablet computer. The traffic collection module is configured to collect traffic data. The data analysis module is configured to acquire the traffic data of the released version of the service module and construct test cases for the service module. The regression testing module is configured to perform regression testing on the unreleased version of the service module by using the test cases.
The traffic collection module, the data analysis module, and the regression testing module may be deployed in the same device or in different devices; fig. 1A illustrates an example in which they are deployed on different servers. The traffic collection module may be a traffic collection client installed on an online server, and it may obtain the online traffic data of each method, which may include, for example, the method input, method output, error stack, and sub-calls.
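A traffic collection client of this kind could, for example, wrap service methods so that each call is written out as one JSON record. The sketch below is hypothetical: the decorator name `record_traffic`, the record fields, and the in-memory `TRAFFIC_LOG` list (standing in for upload to a file server) are assumptions, and it captures only the method input, output, and error stack, not sub-calls.

```python
import functools
import json
import time
import traceback
from typing import Any, Callable, List

# Hypothetical in-memory sink standing in for the file server.
TRAFFIC_LOG: List[str] = []


def record_traffic(method: Callable) -> Callable:
    """Record the input, output and error stack of each call as one JSON traffic record."""
    @functools.wraps(method)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        record = {"method": method.__qualname__,
                  "timestamp": time.time(),
                  "input": {"args": list(args), "kwargs": kwargs}}
        try:
            result = method(*args, **kwargs)
            record["output"] = result
            return result
        except Exception:
            record["error_stack"] = traceback.format_exc()
            raise  # the service still sees the original exception
        finally:
            # default=str keeps non-JSON-serialisable values from breaking collection
            TRAFFIC_LOG.append(json.dumps(record, default=str))
    return wrapper
```

Applied to a service entry method, every successful or failed call then yields one JSON string, which matches the JSON storage format described later for the file server.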
Next, a test case generation method will be explained as an example.
Fig. 2 is a flowchart of a test case generation method according to an exemplary embodiment of this specification. The method includes:
in step 202, a traffic data set is acquired, wherein each piece of traffic data in the set comprises running data generated during actual operation of a released version of a service module, each piece of traffic data comprises a plurality of data pairs, and each data pair consists of an attribute name and an attribute value;
in step 204, each piece of traffic data is converted into a feature vector by using a word vectorization algorithm, wherein each data pair is treated as a word during the conversion, so that each piece of traffic data is represented as a word set formed by ordered words;
in step 206, the feature vectors of the traffic data in the set are clustered to obtain a preset number of classes;
in step 208, a specified number of pieces of traffic data are selected from each class, and test cases for the service module are constructed from the selected traffic data, wherein the test cases are used to test an unreleased version of the service module.
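Steps 206 and 208 can be illustrated with a small sketch. This specification does not name a clustering algorithm or a selection criterion at this point, so the sketch assumes k-means (with k equal to the preset number of classes) and keeps, from each class, the pieces of traffic data whose feature vectors lie closest to the class centroid; both choices are illustrative assumptions, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means: returns one class label per vector and the k centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_representatives(vectors: List[Vector], labels: List[int],
                           centroids: List[List[float]], per_class: int) -> List[int]:
    """Return indices of the `per_class` vectors closest to each class centroid."""
    chosen: List[int] = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: _dist2(vectors[i], centroid))
        chosen.extend(members[:per_class])
    return chosen
```

The returned indices identify the pieces of traffic data from which test cases would be built. Choosing records near the centroid favours typical traffic; other criteria, such as random sampling within each class, would fit the same step.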
The traffic data set comprises a plurality of pieces of traffic data, and each piece of traffic data comprises running data generated during actual operation of the released version of the service module. Executing a service with the released version may involve a great deal of running data, and which running data is used as traffic data can be set as required. For example, traffic data may include input data, output data, error stacks, the content of database read and write operations, and the requests to and return values from methods of peripheral systems. The backend can record running data related to the service execution of the released version of the service module to assist subsequent traffic screening. Each piece of traffic data includes a plurality of data pairs, and each data pair may consist of an attribute name and an attribute value.
Traffic data may also be collected before the traffic data set is acquired. The traffic data in the set may be collected during a specified period. In one embodiment, traffic data collection may be implemented with tools such as dual engines and echo walls. After collection, the traffic data may be stored; for example, it may be stored on an OSS file server by default, and the amount of traffic data for each method often exceeds 100,000 pieces. In one example, the traffic data may be saved on a file server in JSON format. Traffic data in JSON format may be as follows:
In this example, the data pairs may be: "xx": "xxxxxxx", "xxxxx": "xxxxxxxxxxx", "xxxx": "xxxxxx", "xxxxx": "xxxxxxx", "xxxxx": "xxxxxxxxx", "xxx": "xxxxxxxxxxx", and "xx": "xxxx". In practical applications each x stands for specific content; x is used here only for simplicity. The data before the colon is the attribute name, and the data after the colon is the attribute value. Because traffic data often contains information other than data pairs, in one embodiment the data pairs may be extracted from the traffic data, and because data pairs can be extracted quickly from JSON, the traffic data may be stored on the file server in JSON format. The process of determining the feature vector then further comprises: extracting data pairs, each consisting of an attribute name and an attribute value, from the traffic data in JSON format. For example, the JSON traffic data is exported from the file server and imported into an offline computation table, and the data pairs are extracted from the traffic data in the offline computation table. If the export fails, the procedure may end. As one way of extracting the data pairs, the JSON traffic data in the offline computation table may be parsed, and each piece of traffic data traversed as a tree to extract the data pairs corresponding to its leaf nodes. Each data pair may, for example, be stored as a key-value pair consisting of the attribute name and the attribute value, and the pairs may be joined with a fixed separator to form a word string. In this way, each piece of traffic data is converted from its JSON form into a word string, that is, a word set formed by ordered words in which each data pair is treated as a word.
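The tree traversal described above can be sketched in Python as follows. The sketch assumes that the attribute name of a leaf is its full path from the root (so that identically named keys in different subtrees stay distinct), that the attribute value is the JSON-encoded leaf, and that `|` is the fixed separator; all three are assumptions, as are the function names.

```python
import json
from typing import Any, List

SEPARATOR = "|"  # hypothetical fixed separator between words


def extract_data_pairs(node: Any, prefix: str = "") -> List[str]:
    """Depth-first traversal; each leaf becomes one 'attribute name=attribute value' word.

    Empty objects and arrays have no leaves and therefore produce no words.
    """
    pairs: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            pairs.extend(extract_data_pairs(value, path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            pairs.extend(extract_data_pairs(item, f"{prefix}[{i}]"))
    else:
        pairs.append(f"{prefix}={json.dumps(node)}")
    return pairs


def to_word_string(raw_json: str) -> str:
    """Convert one JSON traffic record into an ordered word string."""
    return SEPARATOR.join(extract_data_pairs(json.loads(raw_json)))
```

Because `json.loads` preserves key order, the order of the words follows the order of the leaves in the original record.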
After the traffic data set is acquired, each piece of traffic data may be converted into a feature vector by using a word vectorization algorithm, where each data pair is treated as a word during the conversion and each piece of traffic data is a word set formed by ordered words.
A word vectorization algorithm converts words into word vectors. Because traffic data can be regarded as a set of ordered words, once the word vectors of the words are obtained, the feature vector of each piece of traffic data can be determined from them.
In one embodiment, each piece of traffic data may be vectorized according to whether the same data pair occurs in different pieces of traffic data. For example, the TF-IDF algorithm may be used to convert each piece of traffic data into a feature vector. TF-IDF (term frequency-inverse document frequency) evaluates how important a word is to one document in a document collection or corpus: the importance of a word increases in proportion to the number of times it appears in the document but decreases as the word appears more frequently across the corpus. The more often a word occurs in one document and the less often it occurs in the other documents, the more representative the word is of that document's content. During the conversion, each data pair can be treated as a word and traffic data consisting of multiple data pairs as a text of ordered words, so each piece of traffic data can be converted into a feature vector.
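A minimal TF-IDF sketch over data-pair words follows. It uses the plain formulation tf(w, d) * log(N / df(w)), where N is the number of pieces of traffic data and df(w) is the number of pieces containing data pair w; libraries commonly use smoothed variants, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(records: List[List[str]]) -> List[Dict[str, float]]:
    """records: each piece of traffic data as its ordered list of data-pair words.

    Returns one sparse vector (word -> weight) per record.
    """
    n = len(records)
    # Document frequency: the number of records in which each data pair occurs.
    df = Counter(word for words in records for word in set(words))
    vectors: List[Dict[str, float]] = []
    for words in records:
        tf = Counter(words)
        total = len(words) or 1  # avoid dividing by zero for an empty record
        vectors.append({w: (count / total) * math.log(n / df[w])
                        for w, count in tf.items()})
    return vectors
```

A data pair present in every piece of traffic data receives weight 0, matching the idea that it does not distinguish one piece from another. The sparse dictionaries would still need to be aligned to a shared vocabulary before clustering.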
However, considering only whether the same data pair appears in different pieces of traffic data ignores, for example, the position of a data pair within the traffic data and its number of occurrences in the set, so it reflects neither the relationships between data pairs nor, consequently, the relationships between pieces of traffic data. Therefore, in another embodiment, each piece of traffic data may be converted into a feature vector according to the positional relationships among the data pairs in the traffic data and the number of occurrences of each data pair in the traffic data set. The positional relationships of different data pairs can be determined from their positions in the traffic data, and the number of occurrences of each data pair in the set serves as an evaluation factor, so that the resulting feature vectors can represent the correlation between different pieces of traffic data. For example, each data pair may be treated as a word, each piece of traffic data as a word set formed by ordered words, and each piece of traffic data converted into a feature vector with the Word2Vector algorithm.
Word2Vector, also called Word2Vec, is a word vectorization algorithm that converts the words in a corpus into word vectors. The Word2Vector tool contains two models, the skip-gram model and the continuous bag-of-words (CBOW) model, and two efficient training methods, negative sampling and hierarchical softmax. Word vectors obtained with Word2Vector capture similarity and analogy relationships between words well. Once the word vectors are obtained, the feature vector of traffic data formed by ordered words can be determined from them. For example, for each piece of traffic data, a specified operation is performed on the values of the same dimension in the word vectors of all data pairs in that piece of traffic data, and the resulting vector is used as its feature vector. For example, determining the feature vector includes:
converting each piece of traffic data into a word set formed by ordered words, wherein each data pair in the traffic data is treated as a word;
inputting the word set corresponding to each piece of traffic data, together with the number of occurrences in the traffic data set of each data pair treated as a word, into the Word2Vector algorithm, and constructing training samples and training a model based on the Word2Vector algorithm to obtain the word vector of each data pair output by the trained model;
for each piece of traffic data, summing and averaging, dimension by dimension, the word vectors of all data pairs contained in that piece of traffic data, and taking the resulting vector as the feature vector of the traffic data.
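The summing-and-averaging step can be sketched as follows, assuming the trained model's word vectors are available as a mapping from data pair (or data pair number) to a list of floats; the function name is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def traffic_feature_vector(words: Sequence[Hashable],
                           word_vectors: Dict[Hashable, List[float]]) -> List[float]:
    """Sum the word vectors of all data pairs dimension by dimension, then divide by their count."""
    if not words:
        raise ValueError("a piece of traffic data must contain at least one data pair")
    vectors = [word_vectors[w] for w in words]
    # zip(*vectors) yields, for each dimension, the values of all word vectors in that dimension
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

A data pair that occurs several times in one piece of traffic data contributes once per occurrence in this sketch; whether repeats should be counted that way is a design choice the text leaves open.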
To improve processing efficiency, each piece of traffic data may be converted into a word set formed by ordered words, with each data pair in the traffic data treated as a word.
After the word set of each piece of traffic data and the number of occurrences of each data pair in the traffic data set are input into the Word2Vector algorithm, the algorithm treats each data pair in a piece of traffic data as a word and the word set as a text. It constructs training samples from the positional relationships among the data pairs in the traffic data and the number of occurrences of each data pair in the set, and then trains a model with these samples to obtain a word vector model that converts data pairs into word vectors; this model outputs the word vector of each data pair.
In this embodiment, each data pair may be treated as a word and each piece of traffic data as a set of ordered words. The Word2vec algorithm is used to train a neural network model, yielding a word vector model that converts data pairs into word vectors.
In one example, the word sets of all traffic data in the traffic data set may form a matrix, which is input into the Word2Vector algorithm together with the number of occurrences of each word; training samples are constructed and a model is trained based on the algorithm, and the word vector of each data pair is obtained from the trained model.
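This passage does not say exactly how the occurrence counts are used. In the original word2vec method, word counts feed, among other things, the noise distribution for negative sampling, in which a word is drawn with probability proportional to its count raised to the power 0.75 (counts are also used for subsampling frequent words and for building the hierarchical-softmax tree). The sketch below computes that noise distribution over numbered data pairs as one plausible illustration; it is an assumption, not a statement of how this specification uses the counts, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def noise_distribution(word_sets: List[List[int]], power: float = 0.75) -> Dict[int, float]:
    """Negative-sampling distribution over data pairs: P(w) proportional to count(w) ** power."""
    # Count each data pair over the whole traffic data set.
    counts = Counter(w for words in word_sets for w in words)
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}
```

Raising counts to 0.75 draws very common data pairs less often than their raw frequency would suggest, and rare ones more often.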
Regarding construction of the training samples, in one example, for the word set of one piece of traffic data, the t-th word is denoted w(t), and the two words before and the two words after it are denoted w(t-2), w(t-1), w(t+1), and w(t+2). After the neural network model is trained, inputting the four words w(t-2), w(t-1), w(t+1), and w(t+2) allows the associated word w(t) to be predicted. Therefore, w(t-2), w(t-1), w(t+1), w(t+2) can be denoted context(w(t)), and [context(w(t)), w(t)] constitutes one training sample. Within the same word set, the window can be moved by a preset step to construct further training samples, so that one or more training samples can be constructed from one word set.
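The sliding-window sample construction can be sketched as follows for a word set of numbered data pairs. With `window=2` each sample is [context(w(t)), w(t)] as described; truncating the context at the beginning and end of a word set, and applying the moving step to the target position, are assumptions, and the function names are hypothetical. The skip-gram helper shows the reverse direction used by the other Word2Vector model.

```python
from typing import List, Sequence, Tuple


def cbow_samples(words: Sequence[int], window: int = 2,
                 step: int = 1) -> List[Tuple[List[int], int]]:
    """Build [context(w(t)), w(t)] samples: up to `window` words on each side of w(t)."""
    samples: List[Tuple[List[int], int]] = []
    for t in range(0, len(words), step):  # slide the target position by `step`
        context = [words[i] for i in range(t - window, t + window + 1)
                   if i != t and 0 <= i < len(words)]
        if context:  # a one-word set yields no sample
            samples.append((context, words[t]))
    return samples


def skipgram_pairs(words: Sequence[int], window: int = 2) -> List[Tuple[int, int]]:
    """Skip-gram reverses the direction: one (w(t), context word) pair per context word."""
    return [(target, c) for context, target in cbow_samples(words, window) for c in context]
```

For the word set [1, 2, 3, 4, 5], the sample centred on 3 is ([1, 2, 4, 5], 3), matching the w(t-2), w(t-1), w(t+1), w(t+2) pattern above.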
To simplify how data pairs are expressed, in one embodiment the data pairs may be represented by numbers, with a numbering rule such that: the relationship between numbers reflects the positional relationship among the data pairs in the traffic data, different data pairs have different numbers, and identical data pairs have the same number. Each data pair in the word set input into the Word2Vector algorithm is then represented by its number; that is, only the numbers of the data pairs are input, not their specific content. Representing data pairs by numbers simplifies the computation, because the actual content of the data pairs can be ignored. Illustratively, the method further comprises numbering all data pairs in the order in which they appear, with different data pairs numbered differently and identical data pairs numbered identically. The order of appearance may be determined by the order of the data pairs within the traffic data; the order between different pieces of traffic data does not affect the training result, and in one example different pieces of traffic data may be arranged in order of occurrence time. The numbers may be ordered; for example, all data pairs are numbered consecutively in the order in which they appear, so that adjacent new data pairs receive consecutive numbers. Substituting numbers for data pairs can greatly reduce the amount of data.
For a given service module, a dictionary of that service module can be constructed from the traffic data set. The dictionary consists of all data pairs in the traffic data of the service module, each data pair being one word; since each piece of traffic data is a set of ordered words, the dictionary covers the ordered word sets of all pieces of traffic data. For example, when all data pairs are numbered in the order in which they appear, the first data pair may be numbered 1. If there are no duplicate data pairs among the first N, the N-th data pair is numbered N; if there are duplicates, the same number is reused for the same data pair. For instance, if a number has already been used n times among the preceding data pairs (that is, n - 1 occurrences were repeats), a new data pair appearing at position N is numbered N - n + 1. Two data pairs are the same when both their attribute names and their attribute values are the same.
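The numbering rule can be sketched in Python as follows, assuming each piece of traffic data is given as an ordered list of (attribute name, attribute value) tuples and that pieces are processed in order of occurrence time. The helper `number_data_pairs` is hypothetical and only illustrates the rule.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (attribute name, attribute value)


def number_data_pairs(
    flows: Iterable[List[Pair]],
) -> Tuple[Dict[Pair, int], List[List[int]]]:
    """Assign numbers to data pairs in order of first appearance.

    The first new pair gets 1; a pair seen before reuses its number.
    Returns the dictionary and each flow rewritten as a number sequence.
    """
    dictionary: Dict[Pair, int] = {}
    sequences: List[List[int]] = []
    for flow in flows:
        seq: List[int] = []
        for pair in flow:
            if pair not in dictionary:
                # Next unused number: one more than the count of distinct pairs so far.
                dictionary[pair] = len(dictionary) + 1
            seq.append(dictionary[pair])
        sequences.append(seq)
    return dictionary, sequences
```

For the two flows [("a", "1"), ("b", "2")] and [("a", "1"), ("c", "3")], the pair ("c", "3") appears fourth while number 1 has been used twice, so it is numbered 4 - 2 + 1 = 3, consistent with the rule above; the flows become [1, 2] and [1, 3].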
In this embodiment, all data pairs in the traffic data set are numbered, so that different data pairs have different numbers whether they occur in the same piece of traffic data or in different pieces, and identical data pairs have the same number whether they occur in the same piece or in different pieces. Because the numbers are ordered, the positional relationship among data pairs can be expressed by the sequential relationship among their numbers. The occurrence count of a data pair in the set may be the number of times the corresponding word occurs in the dictionary.
The trained model outputs a word vector for each data pair. Since a piece of traffic data is a combination of ordered words, the values in the same dimension of the word vectors of all data pairs in that traffic data can be summed and averaged, and the resulting vector is used as the feature vector of the traffic data. Sum-and-average processing means first summing the values and then dividing the sum by the number of word vectors.
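Assuming the numbered word sequences produced by the numbering step, the occurrence count of each word in the dictionary could be computed with Python's standard `collections.Counter`, as in this sketch; `count_word_occurrences` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def count_word_occurrences(sequences: List[List[int]]) -> Dict[int, int]:
    """Count how often each numbered word occurs across all flows."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq)  # adds 1 for every occurrence in this flow
    return dict(counts)
```

In common Word2Vec implementations such counts typically drive frequency-based steps like subsampling frequent words and drawing negative samples; the specification does not state exactly how its Word2Vector component uses them.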
Fig. 3A is a flow chart illustrating a feature vector determination process according to an exemplary embodiment of the present description. The feature vector determination process may include:
In step 302, traffic data is exported from the traffic platform in JSON format.
The traffic data may include method input parameters, output parameters, sub-calls, error stacks, and the like.
In step 304, the JSON-formatted traffic data is imported into an offline table; if the import succeeds, the process proceeds to step 306, and if it fails, the process terminates.
In step 306, each piece of traffic data is traversed and converted into a set of words in Key-Value form.
In step 308, the Key-Value words are numbered in the order in which they appear in the traffic data set.
A dictionary is established for each business module, consisting of all key-value pairs in the traffic data of that business module. All Key-Value pairs are numbered in the order in which they appear in the traffic data set: identical key-value pairs share a number and different key-value pairs receive different numbers. Numbering of the words in the dictionary starts from 1.
In step 310, the number of occurrences of each word in the dictionary is counted.
In step 312, the word sequence of each piece of traffic data, together with the occurrence count of each word, is used as input to the Word2Vec component. The word sequence may consist of the numbers of the data pairs in the word set.
In step 314, each word is converted into a word vector using the Word2Vec algorithm.
In step 316, each piece of traffic data is associated with each of its words and the corresponding word vectors, and a new record is generated. For example, the record format may be: traffic ID, word ID (the number of the key-value pair), word vector.
In step 318, the word vectors of each piece of traffic data are combined: the values in the same dimension are summed and averaged, and the result is taken as the value of the whole piece of traffic data in that dimension, yielding the feature vector of each piece of traffic data.
Steps 302 to 310 may be performed as a flow modeling process, e.g., by a flow modeling module, and steps 312 to 318 may be performed as a feature engineering process, e.g., by a feature engineering module. For step 312, the flow modeling module may output the word sequence of each piece of traffic data and the occurrence count of each word to the feature engineering module as its input.
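Step 306 can be illustrated with a short Python sketch that turns one exported JSON record into ordered Key-Value words. The specification does not say how nested JSON is handled; this sketch assumes nested keys are joined into dotted paths (with [i] for list elements) and that leaf values are re-serialized with `json.dumps` so that, for example, 1 and "1" remain distinct words. The function name `flatten_to_pairs` is hypothetical.

```python
import json
from typing import Any, List, Tuple


def flatten_to_pairs(obj: Any, prefix: str = "") -> List[Tuple[str, str]]:
    """Flatten a parsed JSON value into ordered (key, value) words.

    Nested keys are joined with '.', list indices with '[i]'; leaf values
    are re-serialized with json.dumps so 1 and "1" stay distinct.
    """
    pairs: List[Tuple[str, str]] = []
    if isinstance(obj, dict):
        for key, value in obj.items():  # json.loads preserves key order
            path = f"{prefix}.{key}" if prefix else str(key)
            pairs.extend(flatten_to_pairs(value, path))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            pairs.extend(flatten_to_pairs(value, f"{prefix}[{i}]"))
    else:
        # Leaf: attribute name is the full path, attribute value its JSON text.
        pairs.append((prefix, json.dumps(obj)))
    return pairs
```

For example, `flatten_to_pairs(json.loads('{"input": {"amount": 10}, "output": "ok"}'))` returns `[("input.amount", "10"), ("output", '"ok"')]`.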
Fig. 3B is a schematic diagram of the feature vector determination process according to an exemplary embodiment of this specification, illustrated with one piece of traffic data. Suppose that the traffic data includes operation data such as input, output, exception, and mock data. The traffic data is converted into a word set of words in Key-Value form, where each Key-Value is a pair consisting of an attribute name and an attribute value. Suppose the word set contains N words (i.e., N key-value pairs): for example, the key-value pair for the input is K1: V1, the pair for the output is K2: V2, the pair for the exception is K3: V3, and the pair for the N-th operation data is KN: VN. A dictionary is established for each business module, whose words consist of all key-value pairs in the traffic data of that business module. All Key-Value pairs are numbered in the order in which they appear, and the number of times each word appears in the dictionary is counted. The word sequence of each piece of traffic data (a sequence of numbers) and the occurrence count of each word in the dictionary are used as input to the Word2Vec component, which converts each word into a word vector. Suppose the K-dimensional word vector for K1: V1 is a11, a12, …, a1K; for K2: V2 it is a21, a22, …, a2K; for K3: V3 it is a31, a32, …, a3K; and for KN: VN it is aN1, aN2, …, aNK.
The values in the same dimension are then summed and averaged, and the result is taken as the value of that dimension; the resulting K-dimensional vector is used as the feature vector of the traffic data. For example, the value of the i-th dimension of the feature vector can be calculated as:
$$x_i = \frac{1}{N}\sum_{j=1}^{N} a_{ji}, \quad i \in [1, K]$$
where $a_{ji}$ is the i-th component of the word vector of the j-th data pair and N is the number of data pairs in the traffic data. The value of K may be set as required; in one example K may range from 15 to 25, and optionally K may be 20.
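The formula above amounts to mean pooling of the word vectors. A minimal Python sketch, assuming the word vectors of one piece of traffic data are given as equal-length lists of floats; `flow_feature_vector` is a hypothetical name.

```python
from typing import List, Sequence


def flow_feature_vector(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average word vectors dimension by dimension: x_i = (1/N) * sum_j a_ji."""
    if not word_vectors:
        raise ValueError("a flow must contain at least one data pair")
    k = len(word_vectors[0])
    if any(len(v) != k for v in word_vectors):
        raise ValueError("all word vectors must have the same dimension K")
    n = len(word_vectors)
    # Sum the i-th component over all N word vectors, then divide by N.
    return [sum(v[i] for v in word_vectors) / n for i in range(k)]
```

For two word vectors [1.0, 2.0] and [3.0, 6.0], the feature vector is [2.0, 4.0].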
It can be understood that other operation methods may also be adopted to calculate and obtain the data of the ith dimension in the feature vector, which is not described herein again.
After the feature vector of each piece of traffic data is obtained, the feature vectors of the traffic data in the set can be clustered into a preset number of classes, for example M. The value of M may be determined according to the functions provided by the service module; in one example M may range from 150 to 250, and optionally M may be 200. This reduces millions of pieces of traffic data to a limited number, keeps the amount of playback traffic within a bounded range so that playback time is controllable, and still allows the playback traffic to cover complete service verification scenarios.
For clustering, a specified clustering algorithm may be used to cluster the feature vectors of the traffic data in the set into the preset number of classes. In one embodiment, the specified clustering algorithm may be the GMM (Gaussian Mixture Model) algorithm. A GMM is a linear superposition (mixture) of multiple Gaussian components. In this embodiment, cluster training with the GMM algorithm yields the parameters, such as means and variances, of Gaussian models that divide the data into the specified number of classes; a model with these parameters may be called a trained Gaussian model. The trained Gaussian model is then used to predict the probability that each piece of traffic data belongs to each class, and the class with the highest probability is taken as the class of that traffic data.
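The prediction step can be sketched in Python as follows, assuming GMM training has already produced mixture weights, means, and per-dimension variances. The diagonal covariance is an assumption; the specification mentions only means and variances. The posterior of each class is its weighted Gaussian density normalized over all classes, computed in log space for numerical stability. `gmm_posteriors` and `assign_class` are hypothetical helpers; in practice a library class such as scikit-learn's `GaussianMixture` (with `predict_proba`) could play this role.

```python
import math
from typing import List, Sequence, Tuple


def gmm_posteriors(
    x: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> List[float]:
    """Posterior probability of x under each diagonal-covariance Gaussian."""
    log_scores = []
    for w, mu, var in zip(weights, means, variances):
        # log(weight) + log N(x | mu, diag(var)), summed over dimensions.
        log_p = math.log(w)
        for xi, m, v in zip(x, mu, var):
            log_p -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        log_scores.append(log_p)
    top = max(log_scores)  # log-sum-exp trick to avoid underflow
    exp_scores = [math.exp(s - top) for s in log_scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


def assign_class(
    x: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> Tuple[int, float]:
    """Return (index of the most probable class, its probability)."""
    post = gmm_posteriors(x, weights, means, variances)
    best = max(range(len(post)), key=post.__getitem__)
    return best, post[best]
```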
After the preset number of classes is obtained, a specified number of pieces of traffic data can be selected from each class and used to construct test cases of the service module. For example, the probability values of the traffic data belonging to the same class are sorted, and the top R pieces of traffic data are selected as the representative traffic data of that class. The specified number R can be configured as needed. In one embodiment, R may be 1; for example, the traffic data at the center of the class is taken as its representative, so that the number of pieces of traffic data is reduced to the number of classes.
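Selecting the top R pieces of traffic data per class can be sketched in Python, assuming each piece has already been assigned a class and the probability of that class (for example, by the GMM prediction step). `select_representatives` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def select_representatives(
    assignments: Iterable[Tuple[str, int, float]], r: int = 1
) -> Dict[int, List[str]]:
    """Keep the r most probable flows per class.

    assignments: (flow_id, class_index, probability) triples.
    """
    by_class: Dict[int, List[Tuple[float, str]]] = defaultdict(list)
    for flow_id, cls, prob in assignments:
        by_class[cls].append((prob, flow_id))
    # Sort each class by probability, highest first, and keep the first r IDs.
    return {
        cls: [fid for _, fid in sorted(items, key=lambda t: t[0], reverse=True)[:r]]
        for cls, items in by_class.items()
    }
```

With R = 1, the result contains exactly one representative per class, matching the embodiment in which the number of pieces of traffic data is reduced to the number of classes.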
In one example, because the later dimensions of the K-dimensional feature vector may contribute little to distinguishing traffic data, the feature vector of each piece of traffic data may be split into feature columns to improve processing efficiency, keeping a specified number of them. For example, the first s feature columns are extracted from the K-dimensional feature vector to form an s-dimensional feature vector. The value of s may be set as required; in one example s may range from 3 to 5, and optionally s may be 3.
In other embodiments, clustering algorithms such as K-means may also be used to cluster the feature vectors of the traffic data in the set into the preset number of classes, which is not described again here. In one embodiment, using Word2Vec + GMM for word vectorization and clustering can improve testing efficiency, reduce the human effort devoted to testing, and, by improving test quality, reduce online problems in the service.
After the traffic data of the service module is selected, it can be used to construct test cases of the service module. The test cases can be divided according to the functions provided by the service module and according to success or failure. Still taking the transfer module as an example, assume it provides a transfer-to-balance sub-service and a transfer-to-bank-card sub-service. The test cases then include, but are not limited to: Case 1: transfer to balance succeeds; Case 2: transfer to bank card succeeds; Case 3: transfer to balance fails and the funds are returned; Case 4: transfer to bank card fails and the funds are returned. Multiple test cases may form a test case set. In one example, the test case to which each selected piece of traffic data belongs may be marked manually. In other embodiments, test cases may be constructed in other ways, which are not described here.
The test case is used to test the unreleased version of the service module. The released and unreleased versions refer to the same service module. Constructing test cases from the traffic data of the released version can improve the quality of testing the unreleased version.
In this embodiment, the problem of screening online traffic is modeled as a word vectorization and text clustering problem. Attributes of online traffic such as method input, method output, error stack, and sub-calls are abstracted into a feature vector of the traffic, which improves the accuracy of distinguishing similar traffic. Summing and averaging, dimension by dimension, all word vectors belonging to the same piece of online traffic solves the problem of representing traffic as a feature vector. Each Gaussian model in the GMM clustering result predicts over all traffic, the results are ranked, and the specified number of top-ranked traffic items with the highest probability values are taken as the most representative data of the class corresponding to that model, which solves the traffic screening problem.
In one example, after the test cases of the business module are obtained, the results may be written back (result backflow) and the test cases stored, for example, in a recommended case set management module. In a subsequent application stage, the test cases can be used to perform regression testing on the unreleased version of the business module. For example, in an online or grayscale environment, the selected test case set is played back, and information such as successful cases, failed cases, and code coverage can be sent to a management platform and displayed to the user.
Correspondingly, a testing method is also provided. Fig. 4 is a flowchart of a testing method according to an exemplary embodiment of this specification; the method includes:
in step 402, a flow data set is obtained, wherein each flow data in the set comprises running data in actual running of a published version of a service module, each flow data comprises a plurality of data pairs, and each data pair comprises an attribute name and an attribute value;
in step 404, converting each piece of traffic data into a feature vector by using a word vectorization algorithm, wherein each data pair is regarded as a word in the conversion process, and each piece of traffic data comprises a word set formed by ordered words;
in step 406, clustering the feature vectors of the traffic data in the set to obtain a preset number of classes;
in step 408, screening a specified number of pieces of traffic data from each class, and constructing a test case of a service module by using the screened traffic data, wherein the test case is used for testing an unpublished version of the service module;
in step 410, regression testing is performed on the unreleased version of the business module using the test case.
Steps 402 to 408 are test case construction processes, and step 410 is a regression test process. The test case construction process and the regression test process can be executed in the same equipment or different equipment. The regression test may be triggered to execute when the test condition is satisfied.
It can be understood that the steps in Fig. 4 that are the same as those in Fig. 2 are not described again here.
This embodiment can improve playback efficiency, avoid missing scenarios that are easily overlooked when scenarios are designed manually, avoid the incomplete scenario coverage that may arise with record-and-playback, and improve test quality.
Management services may also be configured for ease of management. For example, based on the test system described in Fig. 1A, the data analysis module 140 may include an intelligent analysis submodule and a playback management submodule. The intelligent analysis submodule can implement any of the above test case construction methods and may be, for example, the PAI (Platform of Artificial Intelligence) machine learning platform. The playback management submodule manages test cases and playback and may be responsible for playback interface configuration, traffic management, playback task management, recommended case management, and the like. Fig. 5A is a schematic diagram of the data analysis module according to an exemplary embodiment. The intelligent analysis submodule can provide services such as flow modeling, feature engineering, clustering, and result backflow; flow modeling may include steps 302 to 310 in Fig. 3A, and feature engineering may include steps 312 to 318. Through interface configuration, for example, the playback management submodule determines the traffic data of which service module needs to be analyzed and processed for playback. The traffic data for the configured interface is imported, the intelligent analysis submodule is triggered to construct test cases, the playback management submodule stores the constructed test cases, and the user can query the test cases from the playback management submodule. Playback task management may configure the playback schedule of each service module, for example by specifying a playback time, and trigger the regression test of the service module when that time arrives.
It can be understood that the playback management sub-module may also perform other management tasks, which are specifically configured according to requirements and are not described herein again.
Fig. 5B is a business flow diagram of a testing method according to an exemplary embodiment of this disclosure. The main business process has three parts: online traffic collection, data analysis, and continuous precise playback in an offline/grayscale environment. Traffic data in the online application is collected by a traffic collection client installed on the online server. Because this traffic data is the running data of the released version of the service module in actual operation, it may also be called online traffic; collection may be implemented, for example, with tools such as dual engine or echo wall. The data analysis process may include playback management and test case construction: through data analysis, the most representative traffic data can be selected, for each service module, from more than 100,000 records to serve as a case set for precise playback, keeping the online traffic in the set within two hundred records. The continuous precise playback module in the offline or grayscale environment is scheduled by the management platform, plays back the selected case set periodically in that environment, and sends information such as successful cases, failed cases, and code coverage to the management platform for display to the user. In this embodiment, the most representative historical traffic is selected for playback, so the playback traffic can cover complete service verification scenarios while its volume stays within a bounded range and playback time remains controllable.
The technical features in the above embodiments can be combined arbitrarily as long as the combinations do not conflict or contradict each other. For brevity, not every combination is described, but any combination of the technical features in the above embodiments also falls within the scope disclosed in this specification.
Corresponding to the embodiments of the test case generation and test method, the present specification also provides embodiments of a test case generation and test apparatus and an electronic device applied thereto.
The embodiments of the test case generation/testing apparatus in this specification can be applied to a computer device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of both. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the computer device reading the corresponding computer program instructions from non-volatile memory into memory and running them. In terms of hardware, Fig. 6 shows the hardware structure of the computer device in which the test case generation/testing apparatus of this specification resides. In addition to the processor 610, network interface 620, memory 630, and non-volatile memory 640 shown in Fig. 6, the computer device in which the test case generation/testing apparatus 631 resides may include other hardware according to its actual functions, which is not described again.
Fig. 7 is a schematic structural diagram of a test case generation apparatus according to an exemplary embodiment, where the apparatus includes:
a data acquisition module 72 for: acquiring a flow data set, wherein each flow data in the set comprises running data in actual running of a published version of a service module, each flow data comprises a plurality of groups of data pairs, and each group of data pairs consists of an attribute name and an attribute value;
a vector conversion module 74 for: converting each piece of flow data into a feature vector by using a word vectorization algorithm, wherein each data pair is regarded as a word in the conversion process, and each piece of flow data comprises a word set formed by ordered words;
a data clustering module 76 for: clustering the feature vectors of the flow data in the set to obtain a preset number of classes;
a use case determination module 78 for: screening a specified number of pieces of flow data from each class and constructing a test case of the service module from the screened flow data, wherein the test case is used for testing the unreleased version of the service module.
In one embodiment, the vector conversion module 74 is configured to:
converting each piece of flow data into a word set formed by ordered words, wherein each data pair in the flow data is regarded as a word;
inputting a Word set corresponding to each piece of flow data and the occurrence frequency of data pairs regarded as words in the flow data set into a Word2Vector algorithm, and performing training sample construction and model training based on the Word2Vector algorithm to obtain Word vectors of each group of data pairs output by the trained model;
for the same flow data, summing and averaging the values in the same dimension of the word vectors of all data pairs contained in that flow data, and taking the resulting vector as the feature vector of the flow data.
In one embodiment, each data pair in the word set input into the Word2Vector algorithm is represented by its number, and the numbering rule satisfies: the relationship between numbers reflects the positional relationship among the data pairs in the flow data, different data pairs have different numbers, and identical data pairs have the same number.
In one embodiment, the apparatus further comprises a data collection module for:
collecting flow data related to a released version of a service module when the service is executed;
storing the flow data on a file server in a JSON format;
the vector conversion module 74 is further configured to:
a data pair consisting of an attribute name and an attribute value is extracted from the traffic data in the JSON format.
In one embodiment, the traffic data includes a service request and data associated with the service request processed by a published version of the service module, and the apparatus further includes a testing module for:
taking a service request in a test case as input, and acquiring the related data obtained when the unreleased version of the service module processes the service request;
and comparing the related data obtained when the unreleased version and the released version of the same service module each process the service request, and using the comparison result as a basis for evaluating the unreleased version of the service module.
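A minimal Python sketch of this comparison, assuming each test case stores the recorded service request together with the related data produced by the released version, and that the unreleased version can be invoked as a function. The dictionary keys `id`, `request`, and `expected` and the function `compare_versions` are hypothetical. A real comparison would likely also need to ignore fields such as timestamps or trace IDs that legitimately differ between runs.

```python
from typing import Any, Callable, Dict, List


def compare_versions(
    test_cases: List[Dict[str, Any]],
    run_unreleased: Callable[[Any], Any],
) -> Dict[str, List[str]]:
    """Replay recorded requests against the unreleased version and diff.

    Each test case holds the recorded 'request' and the 'expected' data
    that the released version produced for it.
    """
    result: Dict[str, List[str]] = {"passed": [], "failed": []}
    for case in test_cases:
        actual = run_unreleased(case["request"])
        # Exact equality with the released version's output counts as a pass.
        bucket = "passed" if actual == case["expected"] else "failed"
        result[bucket].append(case["id"])
    return result
```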
Fig. 8 is a schematic structural diagram of a testing apparatus according to an exemplary embodiment, the apparatus includes:
a regression test module 82 to: performing regression testing on the unreleased version of the service module by using the test case;
a use case construction module 84 for: acquiring a flow data set, wherein each flow data in the set comprises running data in actual running of a published version of a service module, each flow data comprises a plurality of groups of data pairs, and each group of data pairs consists of an attribute name and an attribute value; converting each piece of flow data into a feature vector by using a word vectorization algorithm, wherein each data pair is regarded as a word in the conversion process, and each piece of flow data comprises a word set formed by ordered words; clustering the feature vectors of the flow data in the set to obtain a preset number of classes; and screening a specified number of pieces of flow data from each class and constructing a test case of the service module from the screened flow data, wherein the test case is used for testing the unreleased version of the service module.
For the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement without inventive effort.
Accordingly, the present specification further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements any one of the above test case generation/test methods when executing the program.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Correspondingly, the embodiment of the present specification further provides a computer storage medium, in which program instructions are stored, and the program instructions implement any one of the test case generation/test methods described above.
Embodiments of the present description may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following the general principles of the specification and including such departures from the present disclosure as come within known or customary practice in the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.