CN113032253B - Test data feature extraction method, test method and related device

Test data feature extraction method, test method and related device

Info

Publication number
CN113032253B
CN113032253B (Application CN202110292100.6A)
Authority
CN
China
Prior art keywords
test data
test
original corpus
short
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110292100.6A
Other languages
Chinese (zh)
Other versions
CN113032253A (en)
Inventor
陈振坤
张伟杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd
Priority to CN202110292100.6A
Publication of CN113032253A
Application granted
Publication of CN113032253B
Active legal status
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3668: Software testing
    • G06F11/3672: Test management
    • G06F11/368: Test management for test version control, e.g. updating test cases to a new software version
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3668: Software testing
    • G06F11/3672: Test management
    • G06F11/3684: Test management for test design, e.g. generating new test cases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a test data feature extraction method, a test method, and a related device. The test data feature extraction method comprises: acquiring original corpus short sentences from a test case; extracting keywords from the original corpus short sentences to form a keyword set; adjusting the keyword set with a pre-trained word embedding model to obtain a vectorized feature sentence corresponding to each original corpus short sentence; and encrypting the vectorized feature sentence with a preset encryption algorithm to obtain the test data feature corresponding to the original corpus short sentence. This scheme avoids redundant construction of test data.

Description

Test data feature extraction method, test method and related device
Technical Field
The present application relates to the field of software testing technologies, and in particular, to a method for extracting features of test data, a test method, and a related device.
Background
Because Chinese natural language can describe the same thing in many different ways, and text written by different people varies considerably, structured test cases written in Chinese natural language by different people will differ. This leads to redundant construction of test data, excessive repeated labor, and low reuse.
An existing scheme uses sentence vectors: a large Chinese text corpus is used as training material, a data model is obtained through unsupervised learning, the model is then used to cluster the phrases in the structured test case, and similar phrases are mapped to a unique test data ID. This solution has two significant problems. First, it depends on a large Chinese text corpus as training material, but a large body of structured test case text is usually hard to obtain for a new product. Second, the accuracy of the clustering-based association is low, generally below 50%. Providing a test data feature extraction method that achieves high accuracy without requiring large amounts of training material has therefore become a valuable topic.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a test data feature extraction method, a test method, and a related device that can avoid redundant construction of test data.
In order to solve the above problems, a first aspect of the present application provides a test data feature extraction method, the extraction method comprising: acquiring original corpus short sentences in a test case; extracting keywords of the original corpus short sentences to form a keyword set; adjusting the keyword set by adopting a pre-trained word embedding model to obtain a vectorization characteristic sentence corresponding to the original corpus short sentence; and encrypting the vectorization feature sentences by adopting a preset encryption algorithm to obtain the test data features corresponding to the original corpus short sentences.
In order to solve the above problems, a second aspect of the present application provides a test method comprising: extracting test data features from all original corpus short sentences in the test case by using a test data feature extraction method to obtain test data features corresponding to each original corpus short sentence; establishing an index value according to the test data characteristics corresponding to each original corpus short sentence so as to classify all the original corpus short sentences in the test case according to different test data characteristics; testing all original corpus short sentences in the test case according to the classification result; the test data feature extraction method is the test data feature extraction method of the first aspect.
In order to solve the above problem, a third aspect of the present application provides an extraction device for testing data features, including: the corpus acquisition module is used for acquiring original corpus short sentences in the test cases; the keyword extraction module is used for extracting keywords of the original corpus short sentences to form a keyword set; the vectorization module is used for adjusting the keyword set by adopting a pre-trained word embedding model to obtain vectorization feature sentences corresponding to the original corpus short sentences; and the encryption module is used for encrypting the vectorized feature sentences by adopting a preset encryption algorithm to obtain the test data features corresponding to the original corpus short sentences.
In order to solve the above problems, a fourth aspect of the present application provides a test apparatus, comprising: the feature extraction module is used for extracting test data features of all original corpus short sentences in the test case by using a test data feature extraction method to obtain test data features corresponding to each original corpus short sentence; the classification module is used for establishing an index value according to the test data characteristics corresponding to each original corpus short sentence so as to classify all the original corpus short sentences in the test case according to different test data characteristics; the testing module is used for testing all original corpus short sentences in the test case according to the classification result; the test data feature extraction method is the test data feature extraction method of the first aspect.
In order to solve the above-mentioned problems, a fifth aspect of the present application provides an electronic device, including a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory, to implement the test data feature extraction method of the first aspect, or the test method of the second aspect.
In order to solve the above-described problems, a sixth aspect of the present application provides a computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the test data feature extraction method of the above-described first aspect, or the test method of the above-described second aspect.
The beneficial effects of the application are as follows: in the test data feature extraction method, keywords of the original corpus short sentences are extracted to form a keyword set, a pre-trained word embedding model is used to adjust the keyword set to obtain vectorized feature sentences corresponding to the original corpus short sentences, and a preset encryption algorithm is then used to encrypt the vectorized feature sentences to obtain the test data features corresponding to the original corpus short sentences. Because original corpus short sentences that are worded differently but share the same actual semantics are mapped to the same test data feature, redundant construction of test data is avoided. In addition, the test method of the application performs software testing with the test data obtained by the test data feature extraction method, thereby achieving low repeated labor and high reuse.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for extracting test data features according to the present application;
FIG. 2 is a flowchart of an embodiment of step S12 in FIG. 1;
FIG. 3 is a flowchart illustrating an embodiment of step S13 in FIG. 1;
FIG. 4 is a flow chart of an embodiment of the testing method of the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for extracting test data features according to the present application;
FIG. 6 is a schematic diagram of a testing apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a frame of an embodiment of an electronic device of the present application;
FIG. 8 is a schematic diagram of a frame of one embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details such as particular system architectures, interfaces, and techniques are set forth in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a test data feature extraction method according to the present application. Specifically, the test data feature extraction method in this embodiment may include the following steps:
Step S11: obtaining the original corpus short sentences in the test case.
Software testing is usually performed by carefully selecting a batch of test data to form test cases according to the specifications of each stage of software development and the internal structure of the program, using the test cases to drive the program under test, observing the execution results, verifying whether the obtained results match the expected results, and then making corresponding adjustments. In one embodiment, the test cases may be structured test cases, i.e. semi-formal test case documents written in Chinese natural language under UML (Unified Modeling Language) and BNF (Backus-Naur Form) constraints. To ensure test coverage, test cases generally contain large-scale, scientifically sampled corpus data, which can be collected while testing the program under test. Because Chinese natural language can describe the same thing in many ways, and text written by different people varies considerably, this corpus data contains repeated descriptions. The original corpus short sentences in the initial test case therefore need to be obtained so that features of the initial test case can be extracted and redundant construction of test data avoided.
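For illustration only, a minimal Python sketch of obtaining original corpus short sentences from a test case document follows; the embodiment does not fix a file format, so the assumption of one test step per line, the delimiter set, and the function name extract_phrases are hypothetical.

```python
# Minimal sketch of step S11: pull candidate original corpus short sentences out of
# a structured test case document (format assumed: one step per line, phrases
# separated by common Chinese or ASCII delimiters).
import re

def extract_phrases(test_case_text: str) -> list[str]:
    phrases = []
    for line in test_case_text.splitlines():
        for phrase in re.split(r"[，,；;。]", line):   # split each step on phrase delimiters
            phrase = phrase.strip()
            if phrase:
                phrases.append(phrase)
    return phrases
```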
Step S12: extracting keywords from the original corpus short sentences to form a keyword set.
In natural language processing, the same thing can be described in many ways, and the corpus contains many modifier words. In most cases these modifiers contribute little to the description; the words that carry the main meaning are the keywords of the original corpus short sentence. By extracting the keywords of the original corpus short sentence, a keyword set can be formed that reflects the thing the original corpus short sentence describes.
Referring to fig. 2, fig. 2 is a flowchart illustrating an embodiment of step S12 in fig. 1. In an embodiment, the step S12 may specifically include:
Step S121: performing word segmentation on the original corpus short sentences to obtain an original word set.
Specifically, the original corpus short sentence contains the requirement for testing the target program. Word segmentation is performed on this test target information to obtain the individual words it contains, yielding an original word set composed of the words obtained after segmentation. When segmenting the original corpus short sentence, algorithms such as jieba (Chinese word segmentation), SnowNLP (Chinese text processing), or THULAC (Chinese lexical analysis) may be used, or any other algorithm with a word segmentation function, so that the original corpus short sentence is split into individual words.
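As an illustration of this step, the following sketch segments a corpus phrase with jieba, one of the algorithms named above; the example phrase is invented here and is not taken from the embodiment.

```python
# Minimal sketch of step S121: word segmentation with jieba (pip install jieba).
import jieba

phrase = "判断用户是否注册成为高级用户"   # hypothetical original corpus short sentence
original_words = jieba.lcut(phrase)       # split the phrase into individual words
print(original_words)
```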
Step S122: filtering the original word set with a preset processing rule to obtain the keyword set.
It can be understood that after word segmentation, the original word set may contain words that express tone or modification and carry no specific meaning. A preset processing rule is used to filter out these meaningless words; the remaining words reflect the thing the original corpus short sentence describes, yielding the corresponding keyword set.
Specifically, the preset processing rule includes at least one of a stop-word removal rule, a punctuation removal rule, and a digit removal rule. After word segmentation, the resulting original word set is filtered. The stop-word removal rule means that a stop-word bank is built and any segment of the original word set that appears in the stop-word bank is filtered out; the punctuation removal rule filters out punctuation marks, suffix marks, and the like; the digit removal rule filters out digits. Filtering out useless words that carry no actual meaning improves the accuracy of test data feature extraction.
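A minimal sketch of this filtering step follows; the stop-word bank shown is a small illustrative placeholder, not the stop-word bank of the embodiment.

```python
# Minimal sketch of step S122: filter the original word set with the three preset
# processing rules described above (stop words, punctuation, digits).
import string

STOP_WORDS = {"的", "了", "是否", "成为"}              # placeholder stop-word bank
CN_PUNCT = set("，。！？；：、“”‘’（）《》")              # common Chinese punctuation

def filter_words(original_words: list[str]) -> list[str]:
    keywords = []
    for w in original_words:
        if w in STOP_WORDS:                            # stop-word removal rule
            continue
        if all(c in string.punctuation or c in CN_PUNCT for c in w):  # punctuation removal rule
            continue
        if w.isdigit():                                # digit removal rule
            continue
        keywords.append(w)
    return keywords
```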
Step S13: adjusting the keyword set with a pre-trained word embedding model to obtain the vectorized feature sentences corresponding to the original corpus short sentences.
Through training, a word embedding model can vectorize all words, so that the relationships between words can be quantified and measured and the connections between words can be mined. Word vectors output by a word embedding model can therefore be used for many natural language processing tasks, such as clustering, synonym finding, and part-of-speech analysis. In the present application, a pre-trained word embedding model is used to adjust the keyword set, yielding a vectorized feature sentence corresponding to the original corpus short sentence. Specifically, the pre-trained word embedding model vectorizes each keyword, producing multidimensional vector data, i.e. the keyword's word vector; according to the word characteristics of the keywords in the original corpus short sentence, each keyword in the keyword set is vectorized to obtain a word vector set, and the original corpus short sentence is converted into a vectorized feature sentence according to this word vector set.
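As an illustration, word vectors for the keyword set can be looked up in a pre-trained embedding as in the following sketch; the use of gensim and the file name embeddings.kv are assumptions, since the embodiment only requires some pre-trained word embedding model.

```python
# Minimal sketch of keyword vectorization with a pre-trained embedding.
from gensim.models import KeyedVectors

wv = KeyedVectors.load("embeddings.kv")    # hypothetical pre-trained word embedding model

def to_word_vectors(keywords: list[str]) -> dict:
    # Return the word vector of every keyword present in the embedding vocabulary.
    return {w: wv[w] for w in keywords if w in wv}
```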
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of step S13 in fig. 1. In an embodiment, the step S13 may specifically include:
Step S131: inputting the keyword set into the pre-trained word embedding model to obtain word vectors.
Step S132: replacing each ontology word with its nearest adjacent word in the embedding space to obtain the vectorized feature sentence corresponding to the original corpus short sentence.
Each keyword is vectorized to obtain its word vector, yielding the word vector set corresponding to each original corpus short sentence. Word vectors are output by the word embedding model, and the words closest to each other in the embedding space have the highest similarity. By replacing each ontology word with its nearest adjacent word in the embedding space, the resulting vectorized feature sentences corresponding to the original corpus short sentences flatten the differences between structured text test cases written in Chinese natural language by different people.
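A rough sketch of steps S131 and S132 follows, reusing the wv object from the previous sketch. The embodiment does not spell out how mutually nearest words are collapsed to a single representative, so this is only one plausible reading; the "|" separator mirrors the feature sentences shown in the application scenario below and is likewise an assumption.

```python
# Sketch of step S132: replace each ontology word with its nearest neighbour in the
# embedding space and join the result into a vectorized feature sentence.
def to_feature_sentence(keywords: list[str]) -> str:
    canonical = []
    for w in keywords:
        if w in wv:
            nearest, _score = wv.most_similar(w, topn=1)[0]   # closest word in the embedding space
            canonical.append(nearest)
        else:
            canonical.append(w)                               # keep out-of-vocabulary words as-is
    return "|".join(canonical)
```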
Step S14: encrypting the vectorized feature sentences with a preset encryption algorithm to obtain the test data features corresponding to the original corpus short sentences.
It can be understood that encrypting the vectorized feature sentence with a preset encryption algorithm yields a unique feature value corresponding to the vectorized feature sentence, namely the test data feature corresponding to the original corpus short sentence. Because each vectorized feature sentence corresponds to a unique feature value, the vectorized feature sentences corresponding to the original corpus short sentences can be compared and classified by this feature value, so that several originally different original corpus short sentences are associated with the same test data feature. This flattens the differences between test cases and establishes a unique mapping between test data features and original corpus short sentences that are worded differently but share the same actual semantics, thereby avoiding redundant construction of test data.
In one embodiment, the preset encryption algorithm is a message digest algorithm. A message digest algorithm takes input of arbitrary length and produces a fixed-length, pseudo-random output. Generally, different input messages produce different digests, while the same input always produces the same output. This is exactly the property a good message digest algorithm has: when the input changes, the output changes. The vectorized feature sentences are therefore encrypted with a message digest algorithm to obtain the test data features corresponding to the original corpus short sentences; the digests of two merely similar vectorized feature sentences are not similar and may differ greatly, so whether originally different original corpus short sentences describe the same thing can be judged accurately from the obtained test data features.
Specifically, the message digest algorithm may be Message-Digest Algorithm 5 (MD5), in which case the test data feature corresponding to the original corpus short sentence is an MD5 value. For data of any length, the computed MD5 value has a fixed length; it is easy to compute the MD5 value from the original data, and any modification of the original data, even of a single byte, yields a completely different MD5 value. The original corpus short sentences in the test case can therefore be classified by their MD5 values, so that several originally different original corpus short sentences are associated with the same MD5 value. This flattens the differences between test cases and establishes a unique mapping between MD5 values and original corpus short sentences that are worded differently but share the same actual semantics, avoiding redundant construction of test data.
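A minimal sketch of this hashing step with Python's standard hashlib module follows; the helper name feature_id is hypothetical.

```python
# Sketch of step S14 with MD5 as the message digest algorithm.
import hashlib

def feature_id(feature_sentence: str) -> str:
    # Identical feature sentences always yield the same MD5 value; changing even a
    # single character yields a completely different digest.
    return hashlib.md5(feature_sentence.encode("utf-8")).hexdigest()
```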
Referring to fig. 4, fig. 4 is a flow chart of an embodiment of the testing method of the present application. The test method in this embodiment may include the steps of:
Step S41: extracting test data features from all the original corpus short sentences in the test case using a test data feature extraction method to obtain the test data feature corresponding to each original corpus short sentence. The test data feature extraction method is any one of the test data feature extraction methods described above.
The testing method is applied to a testing device, which may be a terminal or a server. The terminal may be a smart phone, a tablet computer, a computer, or the like. The server may be a single server or a server cluster consisting of several servers. The user may install the software program to be tested on the test terminal or upload it to the server, and the test terminal or server tests the software program to be tested using the testing method provided by the application.
Step S42: establishing an index value according to the test data feature corresponding to each original corpus short sentence, so as to classify all the original corpus short sentences in the test case by test data feature.
Step S43: testing all the original corpus short sentences in the test case according to the classification result.
Specifically, before the software program to be tested is tested, a corresponding test case can be created. However, because Chinese natural language can describe the same thing in many ways, and text written by different people varies considerably, the test case often contains several different original corpus short sentences that describe the same thing. Testing the software program with such a test case results in high repeated labor and low reuse. Therefore, the test data feature extraction method described above is used to extract test data features from all the original corpus short sentences in the test case to obtain the test data feature corresponding to each original corpus short sentence; an index value is then established from the test data feature corresponding to each original corpus short sentence, so that all the original corpus short sentences in the test case are classified by test data feature, and all the original corpus short sentences in the test case are tested according to the classification result, achieving low repeated labor and high reuse.
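A minimal sketch of steps S41 to S43, chaining the helpers from the earlier sketches, is given below; the run_test callback stands in for the concrete test automation script bound to each test data feature and is hypothetical.

```python
# Sketch of the test method: extract a test data feature for every original corpus
# short sentence, index the sentences by that feature, and test each group once.
from collections import defaultdict
import jieba

def classify_and_test(phrases: list[str], run_test) -> None:
    groups: dict[str, list[str]] = defaultdict(list)
    for phrase in phrases:
        keywords = filter_words(jieba.lcut(phrase))   # steps S121-S122
        sentence = to_feature_sentence(keywords)      # step S13
        groups[feature_id(sentence)].append(phrase)   # step S14 gives the index value
    for md5_value, same_meaning_phrases in groups.items():
        # Sentences sharing one MD5 value describe the same thing; test them once.
        run_test(md5_value, same_meaning_phrases)
```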
In the test data feature extraction method described above, a keyword set is formed by extracting the keywords of the original corpus short sentences, the keyword set is adjusted with a pre-trained word embedding model to obtain the vectorized feature sentences corresponding to the original corpus short sentences, and the vectorized feature sentences are then encrypted with a preset encryption algorithm to obtain the test data features corresponding to the original corpus short sentences. Original corpus short sentences that are worded differently but share the same actual semantics are thus mapped to the same test data feature, avoiding redundant construction of test data. In addition, the test method of the application performs software testing with the test data obtained by the test data feature extraction method, thereby achieving low repeated labor and high reuse.
In one application scenario, several original corpus short sentences are obtained: original corpus short sentence 1 asks whether a user has registered as a mind user, original corpus short sentence 2 asks whether a deer companion-play user has registered as a mind user, and original corpus short sentence 3 asks whether the user has registered as the mind user. Word segmentation is first performed on each original corpus short sentence to obtain the corresponding original word set: segmenting original corpus short sentence 1 yields words such as "user", "whether", "registered", and "mind user"; segmenting original corpus short sentence 2 yields "deer companion-play user", "whether", "registered", and "mind user"; and segmenting original corpus short sentence 3 yields the same words as sentence 1. Each original word set is then filtered to remove useless words, giving the keyword sets: sentence 1 yields "user | registered | mind user", sentence 2 yields "deer companion-play user | registered | mind user", and sentence 3 yields "user | registered | mind user". Because "deer companion-play user" is close to "user" in the embedding space, performing ontology word replacement on the keyword set of each original corpus short sentence yields the same vectorized feature sentence for all three: "deer companion-play user | registration | mind user". Since the vectorized feature sentences corresponding to the three original corpus short sentences are identical, encrypting them with Message-Digest Algorithm 5 necessarily yields the same test data feature. The three originally different original corpus short sentences are thus processed into the same test data feature, namely the MD5 value computed from the feature sentence "deer companion-play user | registration | mind user", and this MD5 value is used to establish an index value, so that the three original corpus short sentences are associated with the same test data feature; the classified test data features are then implemented by a concrete test automation script. In this way the differences between structured text test cases written in Chinese natural language by different people are flattened, a unique mapping is established between the test data features and Chinese phrases that are worded differently but share the same actual semantics, redundant construction of test data is avoided, and low repeated labor and high reuse are achieved.
Referring to fig. 5, fig. 5 is a schematic diagram of a frame of an apparatus for extracting test data features according to an embodiment of the application. The test data feature extraction device 50 includes: the corpus acquisition module 500 is used for acquiring original corpus short sentences in the test cases; the keyword extraction module 502 is configured to extract keywords of the original corpus short sentence, so as to form a keyword set; the vectorization module 504 is configured to adjust the keyword set by using a pre-trained word embedding model, so as to obtain a vectorized feature sentence corresponding to the original corpus short sentence; and the encryption module 506 is configured to encrypt the vectorized feature sentence by using a preset encryption algorithm, so as to obtain a test data feature corresponding to the original corpus short sentence.
In some embodiments, the keyword extraction module 502 extracts keywords of the original corpus short sentence to form a keyword set by: performing word segmentation on the original corpus short sentence to obtain an original word set; and filtering the original word set with a preset processing rule to obtain the keyword set.
In some embodiments, the vectorization module 504 adjusts the keyword set with the pre-trained word embedding model to obtain the vectorized feature sentence corresponding to the original corpus short sentence by: inputting the keyword set into the pre-trained word embedding model to obtain word vectors; and replacing each ontology word with its nearest adjacent word in the embedding space to obtain the vectorized feature sentence corresponding to the original corpus short sentence.
Referring to fig. 6, fig. 6 is a schematic diagram of a frame of an embodiment of a testing apparatus according to the present application. The test device 60 includes: the feature extraction module 600, configured to extract test data features from all the original corpus short sentences in the test case using a test data feature extraction method, so as to obtain the test data feature corresponding to each original corpus short sentence, the test data feature extraction method being any one of the test data feature extraction methods described above; the classification module 602, configured to establish an index value according to the test data feature corresponding to each original corpus short sentence, so as to classify all the original corpus short sentences in the test case by test data feature; and the testing module 604, configured to test all the original corpus short sentences in the test case according to the classification result.
Referring to fig. 7, fig. 7 is a schematic diagram of a frame of an electronic device according to an embodiment of the application. The electronic device 70 comprises a memory 71 and a processor 72 coupled to each other, the processor 72 being adapted to execute program instructions stored in the memory 71 to implement the steps of any one of the test data feature extraction method embodiments described above, or the steps of any one of the test method embodiments described above. In one particular implementation scenario, the electronic device 70 may include, but is not limited to, a microcomputer or a server.
Specifically, the processor 72 is configured to control itself and the memory 71 to implement the steps of any of the test data feature extraction method embodiments described above, or the steps of any of the test method embodiments described above. The processor 72 may also be referred to as a CPU (Central Processing Unit). The processor 72 may be an integrated circuit chip with signal processing capabilities. The processor 72 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 72 may be implemented jointly by integrated circuit chips.
Referring to fig. 8, fig. 8 is a schematic diagram illustrating a frame of an embodiment of a computer readable storage medium according to the present application. The computer readable storage medium 80 stores program instructions 800 that can be executed by a processor, the program instructions 800 being configured to implement the steps of any of the test data feature extraction method embodiments described above, or the steps of any of the test method embodiments described above.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over several network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (8)

1. A method of testing, the method comprising:
extracting test data features from all original corpus short sentences in the test case by using a test data feature extraction method to obtain test data features corresponding to each original corpus short sentence;
establishing an index value according to the test data characteristics corresponding to each original corpus short sentence so as to classify all the original corpus short sentences in the test case according to different test data characteristics;
testing all original corpus short sentences in the test case according to the classification result;
the test data feature extraction method comprises the following steps:
acquiring original corpus short sentences in a test case;
extracting keywords of the original corpus short sentences to form a keyword set;
adjusting the keyword set by adopting a pre-trained word embedding model to obtain a vectorization characteristic sentence corresponding to the original corpus short sentence;
and encrypting the vectorization feature sentences by adopting a preset encryption algorithm to obtain the test data features corresponding to the original corpus short sentences.
2. The method according to claim 1, wherein the extracting keywords of the original corpus short sentence to form a keyword set includes:
word segmentation processing is carried out on the original corpus short sentences to obtain an original word set;
and filtering the original word set by adopting a preset processing rule to obtain the keyword set.
3. The test method of claim 2, wherein the preset processing rule includes at least one of a stop-word removal rule, a punctuation removal rule, and a digit removal rule.
4. The method according to claim 1, wherein the adjusting the keyword set by using the pre-trained word embedding model to obtain the vectorized feature sentence corresponding to the original corpus short sentence includes:
inputting the keyword set into the pre-trained word embedding model to obtain a word vector;
replacing each ontology word with its nearest adjacent word in the embedding space to obtain the vectorized feature sentence corresponding to the original corpus short sentence.
5. The test method according to claim 1, wherein the test case is a structured test case; and/or the preset encryption algorithm is a message digest algorithm.
6. A test device, comprising:
The feature extraction module is used for extracting test data features of all original corpus short sentences in the test case by using a test data feature extraction method to obtain test data features corresponding to each original corpus short sentence; wherein the test data feature extraction method is the test data feature extraction method of any one of claims 1 to 5;
the classification module is used for establishing an index value according to the test data characteristics corresponding to each original corpus short sentence so as to classify all the original corpus short sentences in the test case according to different test data characteristics;
and the testing module is used for testing all the original corpus short sentences in the test case according to the classification result.
7. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the test method of any one of claims 1 to 5.
8. A computer readable storage medium having stored thereon program instructions, which when executed by a processor implement the test method of any of claims 1 to 5.
CN202110292100.6A 2021-03-18 2021-03-18 Test data feature extraction method, test method and related device Active CN113032253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110292100.6A CN113032253B (en) 2021-03-18 2021-03-18 Test data feature extraction method, test method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110292100.6A CN113032253B (en) 2021-03-18 2021-03-18 Test data feature extraction method, test method and related device

Publications (2)

Publication Number Publication Date
CN113032253A CN113032253A (en) 2021-06-25
CN113032253B true CN113032253B (en) 2024-04-19

Family

ID=76471531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110292100.6A Active CN113032253B (en) 2021-03-18 2021-03-18 Test data feature extraction method, test method and related device

Country Status (1)

Country Link
CN (1) CN113032253B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672522B (en) * 2021-10-25 2022-02-08 腾讯科技(深圳)有限公司 Test resource compression method and related equipment
CN114020643B (en) * 2021-11-29 2023-01-20 中国银行股份有限公司 Knowledge base testing method and device
CN116610592B (en) * 2023-07-20 2023-09-19 青岛大学 Customizable software test evaluation method and system based on natural language processing technology

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100005083A1 (en) * 2008-07-01 2010-01-07 Xerox Corporation Frequency based keyword extraction method and system using a statistical measure
CN101859273A (en) * 2009-04-07 2010-10-13 西门子(中国)有限公司 Method and device for generating test cases
CN107085581A (en) * 2016-02-16 2017-08-22 腾讯科技(深圳)有限公司 Short text classification method and device
CN109933779A (en) * 2017-12-18 2019-06-25 苏宁云商集团股份有限公司 User's intension recognizing method and system
CN111026671A (en) * 2019-12-16 2020-04-17 腾讯科技(深圳)有限公司 Test case set construction method and test method based on test case set
CN112163419A (en) * 2020-09-23 2021-01-01 南方电网数字电网研究院有限公司 Text emotion recognition method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113032253A (en) 2021-06-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant