CN114238070A - Test script generation method and system based on semantic recognition - Google Patents

Test script generation method and system based on semantic recognition

Info

Publication number
CN114238070A
CN114238070A
Authority
CN
China
Prior art keywords
test
test case
case
structured data
test script
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111322833.6A
Other languages
Chinese (zh)
Other versions
CN114238070B (en)
Inventor
李哲
张鑫
申连腾
李凌
翟天一
黄天航
底晓梦
贾强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
China Electric Power Research Institute Co Ltd CEPRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electric Power Research Institute Co Ltd CEPRI filed Critical China Electric Power Research Institute Co Ltd CEPRI
Priority to CN202111322833.6A priority Critical patent/CN114238070B/en
Publication of CN114238070A publication Critical patent/CN114238070A/en
Application granted granted Critical
Publication of CN114238070B publication Critical patent/CN114238070B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3684Test management for test design, e.g. generating new test cases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a test script generation method and system based on semantic recognition. The method comprises the following steps: converting a pre-acquired original test case into a database to obtain structured data; performing principal component analysis on the test case based on the structured data to obtain the subject-predicate-object structure and the topic keywords of the test case; and judging whether the subject-predicate-object structure of the test case coincides with the topic keywords. If they do not coincide, test script generation ends; if they coincide, the method proceeds to translate the topic keywords of the test case and generate a test script based on the translated topic keywords. The method satisfies decoupling and logical self-consistent completeness, and each step can be optimized independently; the script generation accuracy is high, the generation efficiency of automated scripts can be greatly improved, and a large amount of repetitive labor time is saved.

Description

Test script generation method and system based on semantic recognition
Technical Field
The invention belongs to the technical field of intelligent test script generation, and particularly relates to a test script generation method and system based on semantic recognition.
Background
At present, there are various methods for automatically generating code; a comparison of these methods is shown in Table 1.
TABLE 1 Comparison of automatic test script code generation methods
[Table 1 is provided as images in the original publication.]
As can be seen from Table 1, existing automatic code generation technologies target system development scenarios and generally suffer from complex models, narrow applicable scenarios, and strong coupling with the underlying code; they are therefore not suitable for the script generation task in automated test scenarios, and a new test script generation method and system based on semantic recognition is urgently needed.
Disclosure of Invention
The invention aims to provide a test script generation method and system based on semantic recognition, so as to solve one or more of the above technical problems. The method converts traditional test cases into automated scripts; it satisfies decoupling and logical self-consistent completeness, and each step can be optimized independently. The script generation accuracy is high, the generation efficiency of automated scripts can be greatly improved, and a large amount of repetitive labor time is saved.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a test script generation method based on semantic recognition, which comprises the following steps:
the method comprises the steps of databasing a pre-acquired original test case to obtain structured data after databasing;
performing main component analysis on the test case based on the structured data after the database is formed, and acquiring a main and predicate object structure and a theme keyword of the test case;
judging whether the main predicate object structure of the test case is coincident with the subject keyword or not; if the judgment result is that the test scripts are not coincident, the test script generation is finished; and if the judgment result is coincidence, skipping execution: and translating the subject key words of the test cases, and generating a test script based on the translated subject key words.
In a further improvement of the invention, the step of converting the pre-acquired original test case into a database to obtain structured data comprises:
extracting the content of the pre-acquired original test case in Excel format into MySQL to realize the database conversion, and obtaining the structured data.
In a further improvement of the invention, the step of extracting the content of the pre-acquired original test case in Excel format into MySQL and obtaining the structured data specifically comprises:
creating table fields according to the test case;
traversing each cell by row and column based on the created table fields;
and updating the database with the traversal results to obtain the structured data.
In a further improvement of the invention, the step of performing principal component analysis on the test case based on the structured data to obtain the subject-predicate-object structure and the topic keywords of the test case specifically comprises:
based on the structured data, obtaining an initial case word segmentation with a segmentation method that uses a bigram language model together with data-sparseness and smoothing strategies; performing disambiguation and part-of-speech tagging on the initial case segmentation to obtain the final case segmentation;
and performing dependency analysis between morphemes based on the final case segmentation, and obtaining the subject-predicate-object structure and the topic keywords of the test case from the analysis results.
In a further improvement of the invention, in the process of disambiguating and part-of-speech tagging the initial case segmentation to obtain the final case segmentation, the disambiguation is performed with an n-gram-based disambiguation method, and the part-of-speech tagging is performed with a method based on a structured perceptron, a conditional random field, or custom parts of speech.
In a further improvement of the invention, in the process of analyzing the dependency relationships between morphemes based on the final case segmentation, the dependency analysis is performed with transition-based dependency parsing.
In a further improvement of the invention, the algorithm adopted for the transition-based dependency parsing is the Arc-Eager transition system.
In a further improvement of the invention, the topic keywords are obtained with a latent semantic analysis (LSA), probabilistic latent semantic analysis (PLSA), or latent Dirichlet allocation (LDA) algorithm.
In a further improvement of the invention, the step of translating the topic keywords of the test case and generating a test script based on the translated topic keywords specifically comprises:
performing Chinese-to-English translation of the topic keywords of the test case with a pretrained PyTorch-based Transformer model to obtain the translated topic keywords;
and generating the test script by using the translated topic keywords as the skeleton of the automation script.
The test script generation system based on semantic recognition of the invention comprises:
a structured data acquisition module, configured to convert a pre-acquired original test case into a database and obtain structured data;
a principal component analysis module, configured to perform principal component analysis on the test case based on the structured data and obtain the subject-predicate-object structure and the topic keywords of the test case;
and a judging and generating module, configured to judge whether the subject-predicate-object structure of the test case coincides with the topic keywords; if they do not coincide, test script generation ends; if they coincide, the module proceeds to translate the topic keywords of the test case and generate a test script based on the translated topic keywords.
Compared with the prior art, the invention has the following beneficial effects:
The method provided by the invention converts traditional test cases into automated scripts; its steps satisfy decoupling and logical self-consistent completeness and can be optimized independently. The method has three typical characteristics. Input: a pre-acquired test case (illustratively, a traditional test case document in Excel or Word format). Output: a test script. Conversion: meaning-preserving transformation is realized to the greatest extent through key technical links such as semantic analysis and translation. When a test case can be decomposed into a number of single-step operations and the semantics of each operation and its operand are clear, a test script can be generated automatically with a method based on sentence component analysis of the test case, sentence topic extraction, and translation of the case principal components. The script generation accuracy of the invention is high (illustratively, 60-90%); for traditional test cases written with good normativity, the invention can greatly improve the generation efficiency of automated scripts and save a large amount of repetitive labor time.
In the method provided by the invention, automatically importing test cases into a MySQL database makes subsequent operation and management of the test cases convenient.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a flow chart of a test script generation method based on semantic recognition according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a test case extraction flow according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an exemplary extracted test case according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a sentence component analysis;
FIG. 5 is a diagram illustrating the analysis results of sentence components according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating keyword extraction for test cases according to an embodiment of the present invention;
FIG. 7 is a diagram of the PyTorch-based Transformer architecture in an embodiment of the present invention;
FIG. 8 is a diagram illustrating the LSTMCell stacking mechanism in the decoder according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only partial embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
referring to fig. 1, a test script generation method based on semantic recognition according to an embodiment of the present invention is a method for converting a traditional test case into an automation script, and the steps of the method meet decoupling, self-consistent completeness in logic, and can be optimized independently, and specifically includes the following steps:
traditional test case database function module: for a traditional test case based on an Excel format, the database and the structured management can be realized by extracting case contents into MySQL, and the database greatly simplifies the subsequent operations of retrieval, modification, comparison and the like of the test case; the functional module provides an extraction function from an Excel format test case to a MySQL database;
the test case component analysis functional module: the traditional test cases are originally written to facilitate reading of testers, so even if part of the test cases are written under the normalized condition, most of the descriptions of the test cases have more or less language components. Therefore, the standardization of the existing test case is a necessary step before the existing test case is translated into the automatic script, sentence component analysis is one of main methods for language standardization analysis, and the functional module provides a main and predicate guest main component analysis and extraction function for the traditional test case;
the test case subject word extraction function module: the semantic principal component sometimes deviates from the structure of the principal and predicate object in the sentence component, so it is necessary to research the following two aspects: semantic extraction of test cases: extracting in the form of subject words; testing whether the semantic principal component of the case coincides with the principal component of the sentence structure: in the case of coincidence, principal component extraction is accurate. If the semantic principal component and the sentence structure principal component are not coincident, whether the subject word is used as the basis for generating the automatic script or not can not be judged, other technologies are needed to solve the problem, and the situation is not considered in the invention. In combination with the above, the functional module will provide input for the next translation work.
Case principal component translation functional module for topic words: because test scripts are mostly written in English, after the topic words of a test case have been extracted they undergo case principal component translation and serve as the skeleton of the automation script. This functional module must complete the translation work in the local execution environment, and iterative optimization of the translation model can be realized when sufficient training data are available. An illustrative sketch of turning translated keywords into a script skeleton follows.
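The patent does not define a concrete keyword-to-script mapping, so the following sketch is only an illustration of the idea: the action vocabulary, the (action, target, value) step format, and the Selenium-style commands emitted as text are all assumptions.

```python
# Hypothetical mapping from translated topic keywords to a script skeleton.
TEMPLATES = {
    "open":  'driver.get("{target}")',
    "input": 'driver.find_element_by_name("{target}").send_keys("{value}")',
    "click": 'driver.find_element_by_name("{target}").click()',
    "check": 'assert "{value}" in driver.page_source',
}

def build_script(steps):
    """steps: list of (action, target, value) triples obtained from
    principal-component extraction and Chinese-to-English translation."""
    lines = ["def test_case():"]
    for action, target, value in steps:
        template = TEMPLATES.get(action)
        if template is None:
            lines.append(f"    # TODO: unrecognized action '{action}'")
            continue
        lines.append("    " + template.format(target=target, value=value))
    return "\n".join(lines)

print(build_script([
    ("open", "http://example.com/login", ""),
    ("input", "username", "admin"),
    ("click", "login_button", ""),
    ("check", "", "Welcome"),
]))
```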
In an exemplary embodiment of the present invention, the generation of a typical test script applicable to an actual automated-testing scenario is shown in FIG. 1. A complete automation script generation process has the following three typical characteristics. Input: a traditional test case document in Excel or Word format. Output: a test script. Conversion: meaning-preserving transformation is realized to the greatest extent through key technical links such as semantic analysis and translation.
Referring to FIG. 2 and FIG. 3, in the embodiment of the present invention the test case extraction procedure shown in FIG. 2 includes the following three steps:
(1) creating table fields according to the test case;
(2) traversing each cell by row and column;
(3) updating the traversal results into the database.
An exemplary extracted test case of an embodiment of the present invention is shown in FIG. 3. A minimal implementation sketch of these three steps is given below.
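The sketch below illustrates steps (1) to (3) under stated assumptions: the Excel sheet's header row holds the table fields, and the table name, column types and MySQL connection parameters are placeholders, not values defined by the patent.

```python
# Minimal Excel-to-MySQL extraction sketch (illustrative table/connection names).
import openpyxl
import pymysql

def import_cases(xlsx_path):
    wb = openpyxl.load_workbook(xlsx_path, read_only=True)
    ws = wb.active
    header = [c.value for c in next(ws.iter_rows(min_row=1, max_row=1))]

    conn = pymysql.connect(host="localhost", user="tester",
                           password="***", database="testcases")
    with conn.cursor() as cur:
        # (1) create the table fields according to the test case header
        cols = ", ".join(f"`{h}` TEXT" for h in header)
        cur.execute(f"CREATE TABLE IF NOT EXISTS raw_case "
                    f"(id INT AUTO_INCREMENT PRIMARY KEY, {cols})")
        names = ", ".join(f"`{h}`" for h in header)
        placeholders = ", ".join(["%s"] * len(header))
        # (2) traverse every cell by row and column
        for row in ws.iter_rows(min_row=2, values_only=True):
            # (3) update the traversal result into the database
            cur.execute(f"INSERT INTO raw_case ({names}) VALUES ({placeholders})", row)
    conn.commit()
    conn.close()
```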
In the embodiment of the invention, a word segmentation technique suitable for the automated test scenario is adopted in the sentence component analysis of test cases. The language model is the skeleton of this function, and its parameters can only be obtained through statistics over a corpus. To meet practical engineering requirements, a sufficiently large, high-quality corpus is indispensable. Commonly used corpora in the embodiment of the present invention include: the People's Daily corpus PKU; the Microsoft Research Asia corpus MSR; and CITYU of the City University of Hong Kong. Information on these corpora is shown in Table 2.
TABLE 2 Information on the main segmentation corpora
[Table 2 is provided as images in the original publication.]
In the preferred embodiment of the present invention, MSR is the first choice for the segmentation corpus, for the following four reasons: the MSR corpus contains material from fifteen subject classifications such as astronomy, biology, chemistry and computer science; MSR is superior to PKU in labeling consistency; MSR is superior to PKU in segmentation granularity, since organization names are not split in MSR but are split in PKU, and person names in MSR are kept as a whole, which better matches common usage; and MSR is twice the size of PKU.
Therefore, a segmentation technique that adopts a bigram language model with data-sparseness and smoothing strategies, trained on the MSR segmentation corpus, is a segmentation technique suitable for automated test scenarios.
A part-of-speech tagging technique suitable for the automated test scenario: the recognition accuracy of part-of-speech tagging based on structured perceptrons and conditional random fields can meet the requirements of practical applications. In addition, custom labels can be attached to certain words through custom parts of speech, which meets the customization requirements of part-of-speech tagging in the automated test scenario. A minimal illustration is given below.
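The patent's own segmenter is a bigram model trained on the MSR corpus with structured-perceptron or CRF tagging; the sketch below substitutes the jieba library purely to illustrate the three ingredients (segmentation, POS tagging, custom part-of-speech tags for domain words). The domain words and the custom tag "ui" are assumptions.

```python
# Illustration with jieba: segmentation + POS tagging + custom POS for UI terms.
import jieba
import jieba.posseg as pseg

# custom parts of speech for UI elements that appear in test cases
jieba.add_word("登录按钮", tag="ui")      # "login button"
jieba.add_word("用户名输入框", tag="ui")   # "username input box"

sentence = "点击登录按钮并检查提示信息"     # "click the login button and check the prompt"
for token in pseg.cut(sentence):
    print(token.word, token.flag)          # word and its part-of-speech tag
```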
The dependency parsing technique suitable for the automated test scenario in the embodiment of the present invention: a typical transition-based dependency parsing algorithm is the Arc-Eager transition system. A transition system S consists of four components, S = (C, T, c_s, C_t), where: C is the set of system states; T is the set of all transition actions that can be performed; c_s is the initialization function; and C_t is the set of terminal states, after entering one of which the system can stop and output the final action sequence.
A system state is a 3-tuple c = (σ, β, A), where: σ is a stack storing processed words; β is a queue (buffer) storing unprocessed words; and A is the set of dependency arcs determined so far.
The transition action set of a typical Arc-Eager transition system, as exemplified by an embodiment of the present invention, is detailed in Table 3.
TABLE 3 Transition action set of a typical Arc-Eager transition system
[Table 3 is provided as an image in the original publication.]
At termination, the dependency arcs in the set A form the dependency syntax tree. The sentence component analysis technique used in the present invention is shown in FIG. 4, and the sentence component analysis results are shown in FIG. 5.
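A minimal sketch of the Arc-Eager transition system described above: a state c = (σ, β, A) plus the four transition actions. The example sentence and the hand-chosen action sequence are illustrative only; a real parser would predict the actions with a trained model.

```python
# Arc-Eager transition system sketch: stack sigma, buffer beta, arc set A.
class ArcEager:
    def __init__(self, words):
        self.sigma = [0]                             # stack, initialised with ROOT (index 0)
        self.beta = list(range(1, len(words) + 1))   # buffer of word indices
        self.arcs = set()                            # set A of (head, label, dependent)
        self.words = ["ROOT"] + list(words)

    def shift(self):                                 # move buffer front onto the stack
        self.sigma.append(self.beta.pop(0))

    def left_arc(self, label):                       # buffer front becomes head of stack top
        dep = self.sigma.pop()
        self.arcs.add((self.beta[0], label, dep))

    def right_arc(self, label):                      # stack top becomes head of buffer front
        head, dep = self.sigma[-1], self.beta.pop(0)
        self.arcs.add((head, label, dep))
        self.sigma.append(dep)

    def reduce(self):                                # pop a stack word that already has a head
        self.sigma.pop()

parser = ArcEager(["click", "the", "button"])
parser.right_arc("root")   # ROOT -> click
parser.shift()             # push "the"
parser.left_arc("det")     # button -> the
parser.right_arc("obj")    # click -> button
for head, label, dep in sorted(parser.arcs):
    print(parser.words[head], f"--{label}-->", parser.words[dep])
```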
In the keyword extraction of the embodiment of the invention, the extraction accuracy of the topic-model-based methods LSA, PLSA and LDA is higher than that of the TF-IDF and TextRank algorithms, so keyword extraction is realized with the LDA algorithm.
In the embodiment of the present invention, a typical LDA algorithm can be divided into two steps:
Preprocessing of the training corpus: preprocessing refers to converting the original character text of a document into the sparse vectors understood by the LDA model. In general, the raw corpus to be processed is a collection of documents, and each document is a collection of raw characters. Before being handed to LDA model training, these raw characters must be parsed into a sparse-vector format that the LDA algorithm can handle. Owing to the diversity of languages and applications, the LDA algorithm does not impose any mandatory restrictions on the preprocessing interface. The original text is usually segmented and stripped of stop words to obtain the feature list of each document. By calling the API provided by the LDA implementation, an index dictionary of the corpus features is built, and the original representation of the text features is converted into the sparse-vector representation of the bag-of-words model. After preprocessing, each document in the corpus corresponds to a sparse vector, each element of which represents the frequency of a word in the document. For memory efficiency, the LDA algorithm supports document-stream processing, so the list can be wrapped in a Python iterator that returns one sparse vector per iteration.
Transformation of the topic vector: the transformation of the text vector is the core of the LDA algorithm; by mining the hidden semantic structure in the corpus, a simple and efficient text vector can be produced. The topic-vector transformation can be divided into two steps. Model object initialization: the LDA model generally receives the training corpus (the corpus corresponding to the sparse-vector iterator) as an initialization parameter; more complex models require more parameters to be configured. Vector conversion: the model can then be called to convert any corpus into a TF-IDF, TextRank, PLSA, LDA or word2vec vector iterator.
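The two steps above (bag-of-words preprocessing into sparse vectors, then topic-vector transformation) are sketched below with the gensim library; the patent does not name a specific library, so gensim is used only as one common implementation matching the described interface, and the toy documents are made up.

```python
# Corpus preprocessing and topic-vector transformation, illustrated with gensim.
from gensim import corpora
from gensim.models import LdaModel

docs = [["click", "login", "button"],
        ["input", "username", "password"],
        ["check", "login", "result"]]           # already segmented, stop words removed

dictionary = corpora.Dictionary(docs)            # feature index dictionary
corpus = [dictionary.doc2bow(d) for d in docs]   # sparse (word_id, count) vectors

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
print(lda[corpus[0]])                            # topic distribution of the first case
```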
The parameter tuning method for keyword extraction in the embodiment of the present invention: for the LDA algorithm, the typical adjustable parameters and their meanings are as follows:
n_topics: the number of hidden topics K, which needs to be tuned. The size of K depends on the required granularity of the topic division: a coarse-grained division (for example, only distinguishing animals, plants and non-living things) allows a small, single-digit K, whereas a fine-grained division (for example, distinguishing different animals, different plants and different non-living things) requires a very large K, for example in the thousands; in that case a very large number of training documents is also required.
doc_topic_prior: the parameter α of the document-topic prior Dirichlet distribution θ_d. If there is no prior knowledge of the topic distribution, the default value 1/K can be used.
topic_word_prior: the parameter η of the topic-word prior Dirichlet distribution β_k. If there is no prior knowledge of the topic distribution, the default value 1/K can be used.
learning_method: the LDA solving algorithm, with the two options 'batch' and 'online'. 'batch' is the variational-inference EM algorithm; 'online' builds on 'batch' by introducing stepwise mini-batch training, updating the topic-word distribution with batches of samples. The default is 'online', and when 'online' is selected the partial_fit function can be used for stepwise training; however, in scikit-learn version 0.20 the default changed back to 'batch'. If the sample size is not too large, 'batch' is preferable, since many parameters can then be left out; if there are very many samples, 'online' is preferable.
learning_decay: only meaningful when learning_method is 'online'; the value should lie in (0.5, 1.0] to ensure gradual convergence of the 'online' algorithm.
learning_offset: only meaningful when learning_method is 'online'; the value must be greater than 1 and is used to reduce the influence of earlier training batches on the final model.
max_iter: the maximum number of EM iterations.
total_samples: only meaningful when learning_method is 'online'; the total number of document samples, needed by the partial_fit function for stepwise training.
batch_size: only meaningful when learning_method is 'online'; the number of document samples used in each EM iteration.
mean_change_tol: the threshold for updating the variational parameters in the E-step; when all variational-parameter updates are smaller than this threshold, the E-step ends and the algorithm turns to the M-step. The default value is generally not modified.
max_doc_update_iter: the maximum number of iterations for updating the variational parameters in the E-step; when the E-step reaches this number of iterations, the algorithm turns to the M-step.
In the embodiment of the invention, the following five parameters are tuned: n_topics (the number of topics); n_features (the number of features, i.e. the number of common words); doc_topic_prior (the parameter α of the document-topic prior Dirichlet distribution θ_d); topic_word_prior (the parameter η of the topic-word prior Dirichlet distribution β_k); and learning_method (the LDA solving algorithm, with the two options 'batch' and 'online'). A sketch expressing these parameters with scikit-learn follows.
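The sketch below expresses the five tuned parameters with scikit-learn's LatentDirichletAllocation (the text above references scikit-learn and partial_fit). Note that the parameter called n_topics above corresponds to n_components in current scikit-learn, and n_features is controlled by the vectorizer; all concrete values are placeholders.

```python
# LDA keyword-extraction parameters expressed with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

texts = ["click login button", "input username and password", "check login result"]

n_topics, n_features = 2, 1000
vectorizer = CountVectorizer(max_features=n_features)     # n_features
X = vectorizer.fit_transform(texts)

lda = LatentDirichletAllocation(
    n_components=n_topics,            # n_topics: number of hidden topics K
    doc_topic_prior=1.0 / n_topics,   # alpha of the document-topic Dirichlet prior
    topic_word_prior=1.0 / n_topics,  # eta of the topic-word Dirichlet prior
    learning_method="online",         # 'batch' (variational EM) or 'online'
    random_state=0,
)
doc_topic = lda.fit_transform(X)      # per-document topic distribution
print(doc_topic.round(3))
```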
An exemplary keyword extraction for test cases according to the embodiment of the present invention is shown in FIG. 6.
In the keyword translation of the embodiment of the invention, a Transformer-based translation algorithm achieves the best results in both accuracy and translation time, so the embodiment uses a neural network of 8 layers of 512-unit Transformer blocks to translate the case principal components. PyTorch is a Python-first deep learning framework that implements tensor computation and dynamic neural networks on top of GPU acceleration. Compared with the static-graph TensorFlow, PyTorch's dynamic neural network structure is more flexible, and network behavior can be changed arbitrarily with zero lag or cost through a technique called reverse-mode automatic differentiation.
In the embodiment of the invention, a schematic diagram of the PyTorch-based Transformer architecture is shown in FIG. 7; a minimal sketch follows.
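The sketch below instantiates the 8-layer, 512-dimensional Transformer mentioned above with torch.nn.Transformer; token embeddings, positional encoding and the output projection are omitted, and the tensor sizes are placeholders rather than values from the patent.

```python
# Minimal PyTorch Transformer sketch (8 encoder/decoder layers, d_model = 512).
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=8, num_decoder_layers=8)

src = torch.rand(10, 32, 512)   # (src_seq_len, batch, d_model): Chinese keyword embeddings
tgt = torch.rand(12, 32, 512)   # (tgt_seq_len, batch, d_model): English token embeddings
out = model(src, tgt)           # (tgt_seq_len, batch, d_model)
print(out.shape)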
In the embodiment of the invention, the PyTorch-based Transformer translation process can be divided into three steps (preprocessing, training and translation), as follows:
The preprocessing stage mainly consists of shuffling and sorting by sentence length. Within each sentence-length bucket the order of sentences is random, and the buckets are ordered by sentence length, so that in each resulting batch the sentence lengths are roughly equal; this speeds up training. The preprocessed data are saved as .pt files, including: (1) dict: dictionary format, storing the two dictionaries 'src' and 'tgt'; (2) train: dictionary format, storing the two Dict classes 'src' and 'tgt'; (3) valid: dictionary format, storing the two Dict classes 'src' and 'tgt'. The dictionary file itself is also saved.
The training stage mainly involves the following LSTM parameters: input_size: the embedding size of the input; hidden_size: the number of hidden units; num_layers: the number of layers; bias: defaults to True, and if set to False the network does not use b_ih and b_hh (see the LSTM computation formulas for details); batch_first: if set to True, the shape of the input and output becomes (batch x seq_length x embedding_size); dropout: if non-zero, a dropout layer with probability dropout is applied to the outputs of every layer except the last; bidirectional: defaults to False, and if True the network becomes a bidirectional RNN.
Where the input to the LTSM is input, (h _0, c _ 0):
input:seq_len x batch x enbedding_size;
h_0:num_layers*num_directions x batch x hidden_size;
c_0:num_layers*num_directions x batch x hidden_size。
where the output of the LTSM is:
output:seq_len x batch x hidden_size*num_directions;
h_n:num_layers*num_directions x batch x hidden_size;
c_n:num_layers*num_directions x batch x hidden_size。
wherein, the input dimension of LSTMCell () is:
input:batch x embedding_size;
h_0:batch x hidden_size;
c_0:batch x hidden_size。
the output dimension of LSTMCell () is:
h_1:batch x hidden_size;
c_1:batch x hidden_size。
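The shapes listed above can be verified with a small nn.LSTM; the concrete sizes below are placeholders.

```python
# Verifying the LSTM input/output shapes listed above.
import torch
import torch.nn as nn

seq_len, batch, embedding_size, hidden_size, num_layers = 7, 4, 16, 32, 2
rnn = nn.LSTM(input_size=embedding_size, hidden_size=hidden_size,
              num_layers=num_layers, bidirectional=False)   # num_directions = 1

x = torch.rand(seq_len, batch, embedding_size)
h0 = torch.zeros(num_layers, batch, hidden_size)   # num_layers*num_directions x batch x hidden_size
c0 = torch.zeros(num_layers, batch, hidden_size)

output, (hn, cn) = rnn(x, (h0, c0))
print(output.shape)        # seq_len x batch x hidden_size*num_directions
print(hn.shape, cn.shape)  # num_layers*num_directions x batch x hidden_size
```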
The RNN in the decoder is implemented by stacking LSTMCell(), as shown in FIG. 8, because an attention mechanism is introduced in the decoder. As can be seen in the figure, after attn_applied is computed with bmm, the OpenNMT-py code does not simply combine attn_applied with the embedding; instead, after a softmax it is transformed into batch x 1 x src_seq_length (attn3), multiplied by the context matrix (weighted_context), concatenated with the input (context_combined), and finally passed through a linear transformation and tanh before being returned.
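A sketch of the decoder attention step described above, in the style of OpenNMT-py's global attention: scores via bmm, softmax to batch x 1 x src_len, weighted context, concatenation with the decoder state, then linear + tanh. The layer sizes are placeholders and the module here is a simplification, not the patent's exact implementation.

```python
# Global-attention step for one decoder time step.
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, src_len, hidden = 4, 9, 32
context = torch.rand(batch, src_len, hidden)      # encoder outputs
dec_hidden = torch.rand(batch, 1, hidden)         # current LSTMCell output
linear_out = nn.Linear(2 * hidden, hidden)

scores = torch.bmm(dec_hidden, context.transpose(1, 2))    # batch x 1 x src_len
attn3 = F.softmax(scores, dim=-1)                           # attention weights
weighted_context = torch.bmm(attn3, context)                # batch x 1 x hidden
context_combined = torch.cat([weighted_context, dec_hidden], dim=-1)
attn_h = torch.tanh(linear_out(context_combined))           # batch x 1 x hidden
print(attn_h.shape)
```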
To sum up, a script generation method for automated test scenarios must currently meet the following requirements. Simple logic: the generation logic of the script is simple and clear, and logically complete and self-consistent. Rich applicable scenarios: the method is applicable to various test scenarios such as functional testing and security testing. Local operation: resources are obtained without cloud access or other online access, and the method can run on a single local machine. Strong extensibility: functions can be extended seamlessly by means of plug-ins or functional modules. Decoupling from the underlying code: the method is decoupled from the system under test. Existing code generation methods all target system development scenarios and generally suffer from complex models, narrow applicable scenarios and strong coupling with the underlying code, and are not suitable for the script generation task in automated test scenarios. In the method provided by the embodiment of the invention, automatically importing traditional test cases into a MySQL database makes subsequent operation and management convenient. When a test case can be decomposed into a number of single-step operations and the semantics of each operation and its operand are clear, a test script can be generated automatically with a method based on sentence component analysis of the test case, sentence topic extraction and translation of the case principal components, and the correctness of the operations can be verified. Depending on the writing normativity of the textual test case, the script generation accuracy of the invention is between 60% and 90%; for traditional test cases with good writing normativity, the invention can greatly improve the generation efficiency of automated scripts and save a large amount of repetitive labor time.
The key points of the semantic-analysis-based functional test case conversion technology in the method of the embodiment of the invention are how to overcome the imprecision of natural language and how to convert test cases written in a non-standard language structure into a strict computer language; the inventive points are briefly described in Table 4.
TABLE 4 technical difficulties and innovations
[Table 4 is provided as images in the original publication.]
The following are embodiments of the apparatus of the present invention, which may be used to perform embodiments of the method of the present invention. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present invention.
In another embodiment of the present invention, a test script generation system based on semantic recognition is provided, which comprises:
a structured data acquisition module, configured to convert a pre-acquired original test case into a database and obtain structured data;
a principal component analysis module, configured to perform principal component analysis on the test case based on the structured data and obtain the subject-predicate-object structure and the topic keywords of the test case;
and a judging and generating module, configured to judge whether the subject-predicate-object structure of the test case coincides with the topic keywords; if they do not coincide, test script generation ends; if they coincide, the module proceeds to translate the topic keywords of the test case and generate a test script based on the translated topic keywords.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A test script generation method based on semantic recognition, characterized by comprising the following steps:
converting a pre-acquired original test case into a database to obtain structured data;
performing principal component analysis on the test case based on the structured data to obtain the subject-predicate-object structure and the topic keywords of the test case;
judging whether the subject-predicate-object structure of the test case coincides with the topic keywords; if they do not coincide, test script generation ends; if they coincide, proceeding to: translate the topic keywords of the test case and generate a test script based on the translated topic keywords.
2. The test script generation method based on semantic recognition according to claim 1, characterized in that the step of converting the pre-acquired original test case into a database to obtain structured data comprises:
extracting the content of the pre-acquired original test case in Excel format into MySQL to realize the database conversion, and obtaining the structured data.
3. The test script generation method based on semantic recognition according to claim 2, characterized in that the step of extracting the content of the pre-acquired original test case in Excel format into MySQL and obtaining the structured data specifically comprises:
creating table fields according to the test case;
traversing each cell by row and column based on the created table fields;
and updating the database with the traversal results to obtain the structured data.
4. The test script generation method based on semantic recognition according to claim 1, characterized in that the step of performing principal component analysis on the test case based on the structured data to obtain the subject-predicate-object structure and the topic keywords of the test case specifically comprises:
based on the structured data, obtaining an initial case word segmentation with a segmentation method that uses a bigram language model together with data-sparseness and smoothing strategies; performing disambiguation and part-of-speech tagging on the initial case segmentation to obtain the final case segmentation;
and performing dependency analysis between morphemes based on the final case segmentation, and obtaining the subject-predicate-object structure and the topic keywords of the test case from the analysis results.
5. The test script generation method based on semantic recognition according to claim 4, characterized in that, in the process of disambiguating and part-of-speech tagging the initial case segmentation to obtain the final case segmentation, the disambiguation is performed with an n-gram-based disambiguation method, and the part-of-speech tagging is performed with a method based on a structured perceptron, a conditional random field, or custom parts of speech.
6. The test script generation method based on semantic recognition according to claim 4, characterized in that, in the process of analyzing the dependency relationships between morphemes based on the final case segmentation, the dependency analysis is performed with transition-based dependency parsing.
7. The test script generation method based on semantic recognition according to claim 6, characterized in that the algorithm adopted for the transition-based dependency parsing is the Arc-Eager transition system.
8. The test script generation method based on semantic recognition according to claim 4, characterized in that a latent semantic analysis, probabilistic latent semantic analysis, or latent Dirichlet allocation algorithm is used to obtain the topic keywords.
9. The test script generation method based on semantic recognition according to claim 1, characterized in that the step of translating the topic keywords of the test case and generating a test script based on the translated topic keywords specifically comprises:
performing Chinese-to-English translation of the topic keywords of the test case with a pretrained PyTorch-based Transformer model to obtain the translated topic keywords;
and generating the test script by using the translated topic keywords as the skeleton of the automation script.
10. A test script generation system based on semantic recognition, characterized by comprising:
a structured data acquisition module, configured to convert a pre-acquired original test case into a database and obtain structured data;
a principal component analysis module, configured to perform principal component analysis on the test case based on the structured data and obtain the subject-predicate-object structure and the topic keywords of the test case;
and a judging and generating module, configured to judge whether the subject-predicate-object structure of the test case coincides with the topic keywords; if they do not coincide, test script generation ends; if they coincide, the module proceeds to translate the topic keywords of the test case and generate a test script based on the translated topic keywords.
CN202111322833.6A 2021-11-09 2021-11-09 Test script generation method and system based on semantic recognition Active CN114238070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111322833.6A CN114238070B (en) 2021-11-09 2021-11-09 Test script generation method and system based on semantic recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111322833.6A CN114238070B (en) 2021-11-09 2021-11-09 Test script generation method and system based on semantic recognition

Publications (2)

Publication Number Publication Date
CN114238070A true CN114238070A (en) 2022-03-25
CN114238070B CN114238070B (en) 2023-08-18

Family

ID=80748954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111322833.6A Active CN114238070B (en) 2021-11-09 2021-11-09 Test script generation method and system based on semantic recognition

Country Status (1)

Country Link
CN (1) CN114238070B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130183649A1 (en) * 2011-06-15 2013-07-18 Ceresis, Llc Method for generating visual mapping of knowledge information from parsing of text inputs for subjects and predicates
CN107832229A (en) * 2017-12-03 2018-03-23 中国直升机设计研究所 A kind of system testing case automatic generating method based on NLP
CN107844417A (en) * 2017-10-20 2018-03-27 东软集团股份有限公司 Method for generating test case and device
CN110162468A (en) * 2019-04-26 2019-08-23 腾讯科技(深圳)有限公司 A kind of test method, device and computer readable storage medium
CN111581090A (en) * 2020-04-30 2020-08-25 重庆富民银行股份有限公司 Automatic test case generation method and system based on NLP and RF framework
CN112286814A (en) * 2020-10-30 2021-01-29 上海纳恩汽车技术有限公司 Automatic generation system and method of test case script
CN113282498A (en) * 2021-05-31 2021-08-20 平安国际智慧城市科技股份有限公司 Test case generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114238070B (en) 2023-08-18

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant