CN113032253A - Test data feature extraction method, test method and related device - Google Patents

Test data feature extraction method, test method and related device

Info

Publication number
CN113032253A
Authority
CN
China
Prior art keywords
test data
original corpus
test
original
sentences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110292100.6A
Other languages
Chinese (zh)
Other versions
CN113032253B (en)
Inventor
陈振坤
张伟杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd
Priority to CN202110292100.6A
Priority claimed from CN202110292100.6A
Publication of CN113032253A
Application granted
Publication of CN113032253B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/368 Test management for test version control, e.g. updating test cases to a new software version
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases

Abstract

The present application discloses a test data feature extraction method, a test method and a related device, wherein the test data feature extraction method includes the following steps: acquiring an original corpus short sentence in a test case; extracting keywords of the original corpus short sentence to form a keyword set; adjusting the keyword set by adopting a pre-trained word embedding model to obtain a vectorization feature sentence corresponding to the original corpus short sentence; and encrypting the vectorization feature sentence by adopting a preset encryption algorithm to obtain the test data feature corresponding to the original corpus short sentence. By this scheme, redundant construction of test data can be avoided.

Description

Test data feature extraction method, test method and related device
Technical Field
The present application relates to the field of software testing technologies, and in particular, to a test data feature extraction method, a test method, and a related apparatus.
Background
Because Chinese natural language has many ways of describing the same thing, and in particular texts written by different people differ considerably, structured test cases written by different people in Chinese natural language also differ. This leads to redundant construction of test data and results in excessive repeated labor and low reuse.
A conventional implementation uses sentence vectors: a large set of Chinese texts is used as training material, a data model is obtained through unsupervised learning, the short sentences in structured test cases are clustered with this model, and similar short sentences are associated and mapped to a unique test data ID. However, this solution has two significant problems. First, a large set of Chinese texts is required as training material, but a large amount of structured test case text is generally difficult to provide for a new product. Second, the association is performed by clustering, so the accuracy is low, generally below 50%. In view of this, how to provide a test data feature extraction method that has high accuracy and does not require a large amount of training material has become a topic of great research value.
Disclosure of Invention
The technical problem mainly solved by the application is to provide a test data feature extraction method, a test method and a related device, which can avoid redundant construction of test data.
In order to solve the above problem, a first aspect of the present application provides a test data feature extraction method, where the extraction method includes: acquiring an original corpus short sentence in a test case; extracting keywords of the original corpus short sentence to form a keyword set; adjusting the keyword set by adopting a pre-trained word embedding model to obtain a vectorization feature sentence corresponding to the original corpus short sentence; and encrypting the vectorization feature sentence by adopting a preset encryption algorithm to obtain the test data feature corresponding to the original corpus short sentence.
In order to solve the above problem, a second aspect of the present application provides a test method, including: extracting the test data characteristics of all original corpus phrases in the test case by using a test data characteristic extraction method to obtain the test data characteristics corresponding to each original corpus phrase; establishing an index value according to the test data characteristics corresponding to each original corpus short sentence so as to classify all the original corpus short sentences in the test case according to different test data characteristics; testing all original corpus phrases in the test case according to the classification result; wherein the test data feature extraction method is the test data feature extraction method of the first aspect.
In order to solve the above problem, a third aspect of the present application provides an apparatus for extracting test data features, including: the corpus acquiring module is used for acquiring original corpus short sentences in the test cases; the keyword extraction module is used for extracting keywords of the original corpus short sentences to form a keyword set; the vectorization module is used for adjusting the keyword set by adopting a pre-trained word embedding model to obtain vectorization feature sentences corresponding to the original corpus short sentences; and the encryption module is used for encrypting the vectorization feature sentences by adopting a preset encryption algorithm to obtain the test data features corresponding to the original corpus phrases.
In order to solve the above problem, a fourth aspect of the present application provides a test apparatus comprising: the feature extraction module is used for extracting the test data features of all the original corpus phrases in the test case by using a test data feature extraction method to obtain the test data features corresponding to each original corpus phrase; the classification module is used for establishing an index value according to the test data characteristics corresponding to each original corpus short sentence so as to classify all the original corpus short sentences in the test case according to different test data characteristics; the test module is used for testing all original corpus phrases in the test case according to the classification result; wherein the test data feature extraction method is the test data feature extraction method of the first aspect.
In order to solve the above problem, a fifth aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the test data feature extraction method of the first aspect or the test method of the second aspect.
In order to solve the above-mentioned problems, a sixth aspect of the present application provides a computer-readable storage medium on which program instructions are stored, which program instructions, when executed by a processor, implement the test data feature extraction method of the above-mentioned first aspect, or the test method of the above-mentioned second aspect.
The beneficial effects of the invention are as follows: unlike the prior art, the test data feature extraction method of the present application extracts keywords from an original corpus short sentence to form a keyword set, adjusts the keyword set with a pre-trained word embedding model to obtain the vectorization feature sentence corresponding to the original corpus short sentence, and then encrypts the vectorization feature sentence with a preset encryption algorithm to obtain the test data feature corresponding to the original corpus short sentence. In this way, the original corpus short sentences in test cases can be classified according to the test data features corresponding to all original corpus short sentences, multiple original corpus short sentences in different test cases can be associated with the same test data feature, the differences between different test cases are flattened, and original corpus short sentences with different wording but the same actual semantics establish a unique mapping relationship with the test data feature, so that redundant construction of test data is avoided. In addition, in the test method, the test data obtained by the test data feature extraction method is used for software testing, so that low repeated labor and high reuse can be achieved.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a test data feature extraction method according to the present application;
FIG. 2 is a flowchart illustrating an embodiment of step S12 in FIG. 1;
FIG. 3 is a flowchart illustrating an embodiment of step S13 in FIG. 1;
FIG. 4 is a schematic flow chart diagram of an embodiment of a test method of the present application;
FIG. 5 is a block diagram of an embodiment of an apparatus for extracting test data features according to the present application;
FIG. 6 is a block diagram of an embodiment of the test apparatus of the present application;
FIG. 7 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 8 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating an embodiment of a test data feature extraction method according to the present application. Specifically, the method for extracting test data features in this embodiment may include the following steps:
step S11: and acquiring an original corpus short sentence in the test case.
In software testing, a batch of test data is usually carefully selected to form test cases according to the specifications of each stage of software development and the internal structure of the program; the test cases are used to drive the program under test, the execution results are observed, the obtained results are checked against the expected results, and corresponding adjustments are then made. In one embodiment, the test cases may be structured test cases, that is, semi-formal test case documents written in Chinese natural language under UML (Unified Modeling Language) and BNF (Backus-Naur Form) constraints. To ensure the coverage of software testing, test cases generally contain large-scale corpus data that has been scientifically sampled and processed, and this corpus data can be collected during program testing. Because Chinese natural language has many ways of describing the same thing, and texts written by different people differ considerably, much of the corpus data consists of repetitive descriptions. Therefore, the original corpus short sentences in the initial test cases need to be obtained so that their features can be extracted and redundant construction of test data avoided.
Step S12: and extracting the key words of the original corpus short sentences to form a key word set.
In natural language processing, the same thing can be described in many ways, largely because the corpus contains many modifier words that in most cases contribute little to describing the thing itself; the words that play the main role in describing the thing are the keywords of the original corpus short sentence. By extracting the keywords of the original corpus short sentence, a keyword set can be formed, and this keyword set reflects the thing that the original corpus short sentence is meant to describe.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an embodiment of step S12 in fig. 1. In an embodiment, the step S12 may specifically include:
step S121: and performing word segmentation processing on the original corpus short sentences to obtain an original word set.
Specifically, the original corpus short sentence contains a requirement for testing the target program. Word segmentation is performed on this test target information to obtain the words it contains, that is, the original word set formed by the words obtained after segmentation. When segmenting the original corpus short sentence, the jieba, SnowNLP, or THULAC algorithm may be used, or any other algorithm with a word segmentation function may be used to split the original corpus short sentence into individual words.
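For illustration only (this sketch is not part of the disclosed method), step S121 could be implemented with the jieba library mentioned above roughly as follows; the function name and example sentence are hypothetical.

```python
# Minimal sketch of step S121: segmenting an original corpus short sentence
# into an original word set. Assumes the jieba package is installed
# (pip install jieba); the example sentence is purely illustrative.
import jieba


def segment_sentence(sentence: str) -> list[str]:
    """Split an original corpus short sentence into individual words."""
    # jieba.lcut returns the segmentation result as a Python list.
    return jieba.lcut(sentence)


if __name__ == "__main__":
    original_word_set = segment_sentence("用户是否注册为大神用户")
    print(original_word_set)  # e.g. a list of individual Chinese words
```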
Step S122: and filtering the original word set by adopting a preset processing rule to obtain the keyword set.
It can be understood that after the original corpus short sentence is segmented to obtain the original word set, the set may contain words without specific meaning that merely express mood or act as modifiers. These words are filtered out with a preset processing rule, and the remaining words reflect the object that the original corpus short sentence describes, so that the corresponding keyword set is obtained.
Specifically, the preset processing rule includes at least one of a stop word removal rule, a punctuation removal rule, and a number removal rule. After word segmentation, the obtained original word set is filtered: the stop word removal rule builds a stop word lexicon and filters out the words in the original word set that appear in the lexicon; the punctuation removal rule filters out punctuation marks, suffix symbols, and the like; and the number removal rule filters out digits. Filtering out useless words without actual meaning from the original word set improves the accuracy of test data feature extraction.
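As a non-authoritative sketch of step S122, the three removal rules could be combined as below; the stop-word set shown is a tiny stand-in for a real stop-word lexicon, not one prescribed by this application.

```python
# Sketch of step S122: filtering the original word set with a preset
# processing rule (stop-word removal, punctuation removal, number removal).
# STOP_WORDS is a tiny illustrative stand-in for a real stop-word lexicon.
import string

STOP_WORDS = {"是否", "的", "了", "为"}          # hypothetical stop-word lexicon
PUNCTUATION = set(string.punctuation) | set("，。！？、；：（）【】")


def filter_words(original_word_set: list[str]) -> list[str]:
    """Apply the stop-word, punctuation and number removal rules."""
    keyword_set = []
    for word in original_word_set:
        if word in STOP_WORDS:                      # stop-word removal rule
            continue
        if all(ch in PUNCTUATION for ch in word):   # punctuation removal rule
            continue
        if word.isdigit():                          # number removal rule
            continue
        keyword_set.append(word)
    return keyword_set
```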
Step S13: and adjusting the keyword set by adopting a pre-trained word embedding model to obtain a vectorization characteristic sentence corresponding to the original corpus short sentence.
A word embedding model can be trained to vectorize words, so that the relationships between words can be measured quantitatively and latent relationships between words can be mined. Word vectors output by a word embedding model can therefore be used for many natural language processing tasks, such as clustering, synonym finding, and part-of-speech analysis. In the present application, a pre-trained word embedding model is used to adjust the keyword set to obtain the vectorization feature sentence corresponding to the original corpus short sentence. Specifically, the pre-trained word embedding model vectorizes each keyword to obtain multi-dimensional vector data, that is, the word vector of the keyword; according to the word characteristics of the keywords in the original corpus short sentence, each keyword in the keyword set is vectorized to obtain a word vector set, and the original corpus short sentence is converted into a vectorization feature sentence according to this word vector set.
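For illustration, and assuming a word2vec-format Chinese embedding is available in gensim's KeyedVectors format (the model file name is hypothetical), the keyword vectorization described above could look roughly as follows.

```python
# Sketch of keyword vectorization with a pre-trained word embedding model.
# The model path is a hypothetical placeholder; any word2vec-format Chinese
# embedding could be loaded the same way.
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("zh_word_vectors.bin", binary=True)


def vectorize_keywords(keyword_set):
    """Return the word vectors of the keywords known to the embedding model."""
    return {word: model[word] for word in keyword_set if word in model}
```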
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating an embodiment of step S13 in fig. 1. In an embodiment, the step S13 may specifically include:
step S131: and inputting the keyword set into the pre-trained word embedding model to obtain a word vector.
Step S132: and replacing the ontology word by using the adjacent word which is closest to the ontology word in the embedding space to obtain the vectorization feature sentence corresponding to the original corpus short sentence.
Vectorization processing is performed on each keyword to obtain its word vector, yielding the word vector set corresponding to each original corpus short sentence. Since the word vectors are output by the word embedding model, the words closest to each other in the embedding space have the highest similarity. Therefore, the ontology word is replaced with its nearest adjacent word in the embedding space to obtain the vectorization feature sentence corresponding to the original corpus short sentence; this flattens the differences between structured test cases written by different people in Chinese natural language.
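One possible reading of the ontology-word replacement in step S132 is sketched below: each keyword is mapped to whichever canonical term lies closest to it in the embedding space, so that synonymous wordings collapse onto one term. The canonical vocabulary and similarity threshold are illustrative assumptions, not values prescribed by this application.

```python
# Sketch of step S132: replacing each keyword with its closest neighbour in
# the embedding space so that synonymous wordings collapse onto one term.
# CANONICAL_TERMS and the similarity threshold are illustrative assumptions.
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("zh_word_vectors.bin", binary=True)
CANONICAL_TERMS = ["用户", "注册", "大神用户"]      # hypothetical canonical vocabulary


def canonicalize(keyword_set, threshold=0.6):
    """Map each keyword to the closest canonical term in the embedding space."""
    feature_words = []
    for word in keyword_set:
        best_term, best_sim = word, 0.0
        for term in CANONICAL_TERMS:
            if word in model and term in model:
                sim = model.similarity(word, term)   # cosine similarity
                if sim > best_sim:
                    best_term, best_sim = term, sim
        feature_words.append(best_term if best_sim >= threshold else word)
    return feature_words


def to_feature_sentence(feature_words):
    """Join the replaced keywords into a vectorization feature sentence."""
    return " | ".join(feature_words)
```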
Step S14: and encrypting the vectorization feature sentences by adopting a preset encryption algorithm to obtain the test data features corresponding to the original corpus short sentences.
It can be understood that encrypting the vectorization feature sentence with a preset encryption algorithm yields a unique feature value corresponding to that vectorization feature sentence, namely the test data feature corresponding to the original corpus short sentence. Because each vectorization feature sentence has a unique feature value, the vectorization feature sentences corresponding to the original corpus short sentences can be compared and classified according to these feature values, so that several originally different original corpus short sentences are associated with the same test data feature. This flattens the differences between different test cases and lets original corpus short sentences with different wording but the same actual semantics establish a unique mapping relationship with a test data feature, avoiding redundant construction of test data.
In an embodiment, the preset encryption algorithm is a message digest algorithm. A message digest algorithm produces a fixed-length, pseudorandom output from an input of arbitrary length. In general, different inputs produce different digests, while the same input always produces the same output. This is exactly the property a good message digest algorithm has: when the input changes, the output changes. Therefore, by encrypting the vectorization feature sentences with a message digest algorithm to obtain the test data features, even two similar (but not identical) vectorization feature sentences yield digests that are dissimilar or very different, so whether several originally different original corpus short sentences describe the same thing can be accurately judged from the obtained test data features.
Specifically, the message digest algorithm may be Message Digest Algorithm 5 (MD5), in which case the test data feature corresponding to the original corpus short sentence is an MD5 value. For data of any length, the computed MD5 value has a fixed length, it is easy to compute from the original data, and modifying even a single byte of the original data produces a very different MD5 value. Therefore, the original corpus short sentences in test cases can be classified according to their MD5 values, so that several originally different original corpus short sentences are associated with the same MD5 value, the differences between test cases are flattened, original corpus short sentences with different wording but the same actual semantics establish a unique mapping relationship with an MD5 value, and redundant construction of test data is avoided.
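A minimal sketch of step S14, assuming Python's standard hashlib module is used to compute the MD5 digest of the vectorization feature sentence; the example sentence is an illustrative English rendering of the feature sentence from the application scenario described later.

```python
# Sketch of step S14: computing the test data feature as the MD5 digest of
# the vectorization feature sentence, using Python's standard hashlib module.
import hashlib


def test_data_feature(feature_sentence: str) -> str:
    """Return the MD5 hex digest used as the test data feature."""
    return hashlib.md5(feature_sentence.encode("utf-8")).hexdigest()


# Identical feature sentences always map to the same feature value,
# while any change to the sentence produces a very different digest.
print(test_data_feature("fawn companion-play user | registered | god user"))
```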
Referring to fig. 4, fig. 4 is a schematic flow chart of an embodiment of a testing method of the present application. The test method in this embodiment may include the following steps:
step S41: and extracting the test data characteristics of all the original corpus phrases in the test case by using a test data characteristic extraction method to obtain the test data characteristics corresponding to each original corpus phrase. The test data feature extraction method is any one of the above test data feature extraction methods.
The test method is applied to a test device, and the test device can be a terminal or a server. The terminal can be a smart phone, a tablet computer, a computer and the like. The server may be a single server or a server cluster consisting of several servers. The user can install the software program to be tested in the testing terminal or upload the software program to be tested to the server, and the testing terminal or the server tests the software program to be tested by adopting the testing method provided by the application.
Step S42: and establishing an index value according to the test data characteristics corresponding to each original corpus short sentence so as to classify all the original corpus short sentences in the test case according to different test data characteristics.
Step S43: and testing all the original corpus phrases in the test case according to the classification result.
Specifically, before a software program to be tested is tested, corresponding test cases can be established. However, because Chinese natural language has many ways of describing the same thing, and texts written by different people differ considerably, a test case often contains several different original corpus short sentences describing the same thing; testing the software program with such a test case leads to excessive repeated labor and low reuse. Therefore, the test data features of all original corpus short sentences in the test case are extracted with the test data feature extraction method to obtain the test data feature corresponding to each original corpus short sentence, an index value is then established from the test data feature corresponding to each original corpus short sentence so that all original corpus short sentences in the test case are classified according to different test data features, and all original corpus short sentences in the test case are tested according to the classification result, thereby achieving low repeated labor and high reuse.
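To illustrate steps S42 and S43, the sketch below groups original corpus short sentences by their test data feature and runs one automated script per group; extract_test_data_feature and run_test_script are hypothetical placeholders for the extraction pipeline and the project-specific test script.

```python
# Sketch of steps S42/S43: building an index keyed by the test data feature
# (the MD5 value) and testing each group of equivalent short sentences once.
# extract_test_data_feature and run_test_script are hypothetical placeholders.
from collections import defaultdict


def classify_by_feature(short_sentences, extract_test_data_feature):
    """Group original corpus short sentences by their test data feature."""
    index = defaultdict(list)
    for sentence in short_sentences:
        feature = extract_test_data_feature(sentence)  # MD5 value as index key
        index[feature].append(sentence)
    return index


def run_tests(index, run_test_script):
    """Execute one automated test per test data feature, not per wording."""
    for feature, sentences in index.items():
        run_test_script(feature, sentences)
```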
In the test data feature extraction method described above, keywords are extracted from an original corpus short sentence to form a keyword set, the keyword set is adjusted with a pre-trained word embedding model to obtain the vectorization feature sentence corresponding to the original corpus short sentence, and the vectorization feature sentence is then encrypted with a preset encryption algorithm to obtain the test data feature corresponding to the original corpus short sentence. The original corpus short sentences in the test cases can thus be classified according to the test data features corresponding to all original corpus short sentences, so that several originally different original corpus short sentences are associated with the same test data feature, the differences between test cases are flattened, original corpus short sentences with different wording but the same actual semantics establish a unique mapping relationship with the test data feature, and redundant construction of test data is avoided. In addition, in the test method, the test data obtained by the test data feature extraction method is used for software testing, so that low repeated labor and high reuse can be achieved.
In one application scenario, several original corpus short sentences are obtained: original corpus short sentence 1 is "whether a user is registered as a god user", original corpus short sentence 2 is "whether a fawn companion-play user is registered as a god user", and original corpus short sentence 3 is "whether the user is registered as a god user". First, word segmentation is performed on each original corpus short sentence to obtain the corresponding original word set: the segmentation result of short sentence 1 is "whether a user is | registered | as | god user", that of short sentence 2 is "whether a fawn companion-play user is | registered | as | god user", and that of short sentence 3 is "whether the user | is | registered | as | god user". The original word sets are then filtered to remove useless words and obtain the keyword sets: for short sentence 1 the result is "user | registered | god user", for short sentence 2 it is "fawn companion-play user | registered | god user", and for short sentence 3 it is "user | registered | god user". Because "fawn companion-play user" is close to "user" in the embedding space, ontology word replacement is performed on the keyword set of each original corpus short sentence, and the resulting vectorization feature sentence is "fawn companion-play user | registered | god user" in every case. Since the vectorization feature sentences of the three original corpus short sentences are identical, encrypting them with Message Digest Algorithm 5 necessarily yields the same test data feature. The three original corpus short sentences thus yield the same test data feature, namely the MD5 value of "fawn companion-play user | registered | god user", and an index value is established from this MD5 value, so that the three differently worded original corpus short sentences are associated with the same test data feature; a specific automated test script is then implemented for the classified test data feature. In this way, the differences between structured test cases written by different people in Chinese natural language are flattened, Chinese short sentences with different wording but the same actual semantics establish a unique mapping relationship with the test data feature, redundant construction of test data is avoided, and low repeated labor and high reuse are achieved.
Referring to fig. 5, fig. 5 is a block diagram of an embodiment of the test data feature extraction apparatus of the present application. The test data feature extraction apparatus 50 includes: a corpus acquiring module 500 for acquiring an original corpus short sentence in a test case; a keyword extraction module 502 for extracting keywords of the original corpus short sentence to form a keyword set; a vectorization module 504 for adjusting the keyword set with a pre-trained word embedding model to obtain the vectorization feature sentence corresponding to the original corpus short sentence; and an encryption module 506 for encrypting the vectorization feature sentence with a preset encryption algorithm to obtain the test data feature corresponding to the original corpus short sentence.
In some embodiments, the keyword extraction module 502 performs the step of extracting the keywords of the original corpus clause to form a keyword set, including: performing word segmentation processing on the original corpus short sentences to obtain an original word set; and filtering the original word set by adopting a preset processing rule to obtain the keyword set.
In some embodiments, the vectorization module 504 performs adjustment on the keyword set by using a pre-trained word embedding model, to obtain a vectorized feature sentence corresponding to the original corpus phrase, including: inputting the keyword set into the pre-trained word embedding model to obtain a word vector; and replacing the ontology word by using the adjacent word which is closest to the ontology word in the embedding space to obtain the vectorization feature sentence corresponding to the original corpus short sentence.
Referring to fig. 6, fig. 6 is a block diagram of an embodiment of the test apparatus of the present application. The test apparatus 60 includes: a feature extraction module 600 for extracting the test data features of all original corpus short sentences in a test case with a test data feature extraction method to obtain the test data feature corresponding to each original corpus short sentence, where the test data feature extraction method is any one of the test data feature extraction methods described above; a classification module 602 for establishing an index value according to the test data feature corresponding to each original corpus short sentence, so as to classify all original corpus short sentences in the test case according to different test data features; and a testing module 604 for testing all original corpus short sentences in the test case according to the classification result.
Referring to fig. 7, fig. 7 is a block diagram of an embodiment of the electronic device of the present application. The electronic device 70 includes a memory 71 and a processor 72 coupled to each other, and the processor 72 is configured to execute program instructions stored in the memory 71 to implement the steps of any of the above test data feature extraction method embodiments or the steps of any of the above test method embodiments. In one specific implementation scenario, the electronic device 70 may include, but is not limited to, a microcomputer or a server.
In particular, the processor 72 is configured to control itself and the memory 71 to implement the steps of any of the above test data feature extraction method embodiments or the steps of any of the above test method embodiments. The processor 72 may also be referred to as a CPU (Central Processing Unit). The processor 72 may be an integrated circuit chip with signal processing capabilities. The processor 72 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 72 may be implemented jointly by a plurality of integrated circuit chips.
Referring to fig. 8, fig. 8 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 80 stores program instructions 800 executable by the processor, the program instructions 800 for implementing the steps of any of the above-described test data feature extraction method embodiments, or the steps of any of the above-described test method embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Claims (10)

1. A test data feature extraction method is characterized by comprising the following steps:
acquiring an original corpus short sentence in a test case;
extracting keywords of the original corpus short sentence to form a keyword set;
adjusting the keyword set by adopting a pre-trained word embedding model to obtain a vectorization feature sentence corresponding to the original corpus short sentence;
and encrypting the vectorization feature sentence by adopting a preset encryption algorithm to obtain the test data feature corresponding to the original corpus short sentence.
2. The method according to claim 1, wherein the extracting keywords of the original corpus short sentence to form a keyword set comprises:
performing word segmentation processing on the original corpus short sentence to obtain an original word set;
and filtering the original word set by adopting a preset processing rule to obtain the keyword set.
3. The method of claim 2, wherein the preset processing rule includes at least one of a stop word removal rule, a punctuation removal rule, and a number removal rule.
4. The method of claim 1, wherein the adjusting the keyword set by using a pre-trained word embedding model to obtain a vectorization feature sentence corresponding to the original corpus short sentence comprises:
inputting the keyword set into the pre-trained word embedding model to obtain a word vector;
and replacing the ontology word by using the adjacent word which is closest to the ontology word in the embedding space to obtain the vectorization feature sentence corresponding to the original corpus short sentence.
5. The test data feature extraction method of claim 1,
the test case is a structured test case; and/or the preset encryption algorithm is a message digest algorithm.
6. A method of testing, the method comprising:
extracting the test data features of all original corpus short sentences in a test case by using a test data feature extraction method to obtain the test data feature corresponding to each original corpus short sentence;
establishing an index value according to the test data feature corresponding to each original corpus short sentence so as to classify all the original corpus short sentences in the test case according to different test data features;
testing all the original corpus short sentences in the test case according to the classification result;
wherein the test data feature extraction method is the test data feature extraction method according to any one of claims 1 to 5.
7. An apparatus for extracting test data features, comprising:
the corpus acquiring module is used for acquiring original corpus short sentences in the test cases;
the keyword extraction module is used for extracting keywords of the original corpus short sentences to form a keyword set;
the vectorization module is used for adjusting the keyword set by adopting a pre-trained word embedding model to obtain vectorization feature sentences corresponding to the original corpus short sentences;
and the encryption module is used for encrypting the vectorization feature sentences by adopting a preset encryption algorithm to obtain the test data features corresponding to the original corpus short sentences.
8. A test apparatus, comprising:
the feature extraction module is used for extracting the test data features of all the original corpus short sentences in the test case by using a test data feature extraction method to obtain the test data feature corresponding to each original corpus short sentence;
the classification module is used for establishing an index value according to the test data feature corresponding to each original corpus short sentence so as to classify all the original corpus short sentences in the test case according to different test data features;
the test module is used for testing all original corpus short sentences in the test case according to the classification result;
wherein the test data feature extraction method is the test data feature extraction method according to any one of claims 1 to 5.
9. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the test data feature extraction method of any one of claims 1 to 5, or the test method of claim 6.
10. A computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the test data feature extraction method of any one of claims 1 to 5, or the test method of claim 6.
CN202110292100.6A 2021-03-18 Test data feature extraction method, test method and related device Active CN113032253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110292100.6A CN113032253B (en) 2021-03-18 Test data feature extraction method, test method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110292100.6A CN113032253B (en) 2021-03-18 Test data feature extraction method, test method and related device

Publications (2)

Publication Number Publication Date
CN113032253A (en) 2021-06-25
CN113032253B CN113032253B (en) 2024-04-19

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100005083A1 (en) * 2008-07-01 2010-01-07 Xerox Corporation Frequency based keyword extraction method and system using a statistical measure
CN101859273A (en) * 2009-04-07 2010-10-13 西门子(中国)有限公司 Method and device for generating test cases
CN107085581A (en) * 2016-02-16 2017-08-22 腾讯科技(深圳)有限公司 Short text classification method and device
CN109933779A (en) * 2017-12-18 2019-06-25 苏宁云商集团股份有限公司 User's intension recognizing method and system
CN111026671A (en) * 2019-12-16 2020-04-17 腾讯科技(深圳)有限公司 Test case set construction method and test method based on test case set
CN112163419A (en) * 2020-09-23 2021-01-01 南方电网数字电网研究院有限公司 Text emotion recognition method and device, computer equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672522A (en) * 2021-10-25 2021-11-19 腾讯科技(深圳)有限公司 Test resource compression method and related equipment
CN114020643A (en) * 2021-11-29 2022-02-08 中国银行股份有限公司 Knowledge base testing method and device
CN114020643B (en) * 2021-11-29 2023-01-20 中国银行股份有限公司 Knowledge base testing method and device
CN116610592A (en) * 2023-07-20 2023-08-18 青岛大学 Customizable software test evaluation method and system based on natural language processing technology
CN116610592B (en) * 2023-07-20 2023-09-19 青岛大学 Customizable software test evaluation method and system based on natural language processing technology

Similar Documents

Publication Publication Date Title
CN110347835B (en) Text clustering method, electronic device and storage medium
US20200081899A1 (en) Automated database schema matching
EP2092419B1 (en) Method and system for high performance data metatagging and data indexing using coprocessors
US9323794B2 (en) Method and system for high performance pattern indexing
CN109299280B (en) Short text clustering analysis method and device and terminal equipment
CN107357895B (en) Text representation processing method based on bag-of-words model
CN111177367B (en) Case classification method, classification model training method and related products
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN111177375A (en) Electronic document classification method and device
CN111126067B (en) Entity relationship extraction method and device
CN113722512A (en) Text retrieval method, device and equipment based on language model and storage medium
CN111382243A (en) Text category matching method, text category matching device and terminal
CN114896141A (en) Test case duplication removing method, device, equipment and computer readable storage medium
CN113032253B (en) Test data feature extraction method, test method and related device
CN113032253A (en) Test data feature extraction method, test method and related device
CN111199170B (en) Formula file identification method and device, electronic equipment and storage medium
JP7099254B2 (en) Learning methods, learning programs and learning devices
CN107622129B (en) Method and device for organizing knowledge base and computer storage medium
CN110472031A (en) A kind of regular expression preparation method, device, electronic equipment and storage medium
Cai et al. Extracting phrases as software features from overlapping sentence clusters in product descriptions
CN117235137B (en) Professional information query method and device based on vector database
CN113868438B (en) Information reliability calibration method and device, computer equipment and storage medium
CN117272123B (en) Sensitive data processing method and device based on large model and storage medium
JP7352249B1 (en) Information processing device, information processing system, and information processing method
CN114091456B (en) Intelligent positioning method and system for quotation contents

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant