CN110659208A - Test data set updating method and device - Google Patents

Test data set updating method and device

Info

Publication number
CN110659208A
Authority
CN
China
Prior art keywords: natural language, result, screening, language text, data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910873744.7A
Other languages
Chinese (zh)
Inventor
司文雷
苏少炜
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sound Intelligence Technology Co Ltd
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing Sound Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sound Intelligence Technology Co Ltd
Priority to CN201910873744.7A
Publication of CN110659208A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/36 - Preventing errors by testing or debugging software
    • G06F 11/3668 - Software testing
    • G06F 11/3672 - Test management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/40 - Transformation of program code
    • G06F 8/41 - Compilation
    • G06F 8/42 - Syntactic analysis
    • G06F 8/427 - Parsing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/60 - Software deployment
    • G06F 8/65 - Updates

Abstract

The embodiments of the present disclosure disclose a method and an apparatus for updating a test data set, an electronic device, and a computer-readable storage medium. The method for updating the test data set comprises the following steps: acquiring natural language parsing data; screening out a first natural language text whose occurrence frequency is greater than a first threshold, together with the parsing result of the first natural language text, as a first screening result; comparing the first screening result with the data in the test data set to obtain duplicate data in the first screening result; deleting the duplicate data from the first screening result to obtain a second screening result; in response to receiving a first selection signal, selecting at least part of the data from the second screening result to obtain a third screening result; and adding the data in the third screening result to the test data set to obtain an updated test data set. This method solves the prior-art technical problem that the data in a test data set is inaccurate.

Description

Test data set updating method and device
Technical Field
The present disclosure relates to the field of automated testing, and in particular, to a method and an apparatus for updating a test data set, an electronic device, and a computer-readable storage medium.
Background
Human daily life is inseparable from language, and natural language is the most direct and simple means of expression. Natural Language Processing (NLP) processes and converts the language used for human communication into a form that machines can understand; it provides models and algorithmic frameworks for studying language capability, lies at the intersection of linguistics and computer science, is an important branch of artificial intelligence, and occupies an increasingly important position in the field of data processing. Modern NLP algorithms are based on machine learning, and typically, when training supervised machine learning models, the data is divided into a training set, a validation set, and a test set.
In the prior art, test sets are mostly constructed by manual generalization: on the basis of requirement documents or common questions, testers expand generalized expressions into test cases to form the test set. This approach depends heavily on the tester's experience, consumes considerable manpower, and, because the tester's perspective differs from the user's, the resulting cases differ from the queries (natural language texts, i.e., the original texts input by users) actually used in practice. As a result, the test set does not distinguish primary from secondary cases and can hardly cover the phrasings that users actually ask most frequently.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, an embodiment of the present disclosure provides an update method of a test data set, including:
acquiring natural language analysis data, wherein the natural language analysis data comprises a natural language text and an analysis result of the natural language text;
screening out a first natural language text with the occurrence frequency larger than a first threshold value and an analysis result of the first natural language text as a first screening result;
comparing the first screening result with data in a test data set to obtain repeated data in the first screening result, wherein the repeated data is data already existing in the first screening result in the test data set;
deleting the repeated data from the first screening result to obtain a second screening result;
in response to receiving a first selection signal, selecting at least part of data from the second screening results to obtain a third screening result;
and adding the data in the third screening result into the test data set to obtain an updated test data set.
Further, the screening out the first natural language text with the frequency of occurrence greater than a first threshold and the parsing result of the first natural language text as a first screening result includes:
counting the occurrence frequency of the natural language text;
sequencing the natural language texts according to the occurrence frequency to obtain a sequencing result;
acquiring a first natural language text with the occurrence frequency larger than a first threshold value in the sequencing result;
acquiring an analysis result corresponding to the first natural language text from the natural language analysis data;
and taking the first natural language text and an analysis result corresponding to the first natural language text as a first screening result.
Further, after the screening out the first natural language text with the frequency of occurrence greater than the first threshold and the parsing result of the first natural language text as the first screening result, the method further includes:
judging the source types of the first natural language text and the analysis result of the first natural language text;
and storing the first natural language text and the analysis result of the first natural language text into an intermediate file of the test data set corresponding to the source type according to the source type.
Further, the storing the first natural language text and the parsing result of the first natural language text into an intermediate file of the test data set corresponding to the source type according to the source type includes:
in response to the source type being a specific source, storing the first natural language text and the analysis result of the first natural language text in an intermediate file of a specific test data set; otherwise,
storing the first natural language text and the analysis result of the first natural language text into an intermediate file of a universal test data set.
Further, the comparing the first screening result with the data in the test data set to obtain the repeated data in the first screening result includes:
and comparing the data in the intermediate file corresponding to the source type with the data in the test data set corresponding to the source type to obtain the repeated data in the intermediate file corresponding to the source type.
Further, after the deleting the duplicate data from the first screening result to obtain a second screening result, the method further includes:
and sending the second screening result to a first terminal, wherein the first terminal is used for screening the data in the second screening result.
Further, the selecting at least part of the data from the second screening results in response to receiving the first selection signal to obtain a third screening result includes:
receiving a first selection signal from the first terminal;
acquiring a data mark in the first selection signal;
and selecting the data corresponding to the data mark from the second screening result as a third screening result according to the data mark.
In a second aspect, an embodiment of the present disclosure provides an apparatus for updating a test data set, including:
the natural language analysis data acquisition module is used for acquiring natural language analysis data, wherein the natural language analysis data comprises a natural language text and an analysis result of the natural language text;
the first screening module is used for screening out a first natural language text with the occurrence frequency larger than a first threshold value and an analysis result of the first natural language text as a first screening result;
the comparison module is used for comparing the first screening result with data in the test data set to obtain repeated data in the first screening result, wherein the repeated data is data already existing in the first screening result in the test data set;
the second screening module is used for deleting the repeated data from the first screening result to obtain a second screening result;
the third screening module is used for responding to the received first selection signal and selecting at least partial data from the second screening result to obtain a third screening result;
and the updating module is used for adding the data in the third screening result into the test data set to obtain an updated test data set.
Further, the first screening module further includes:
the frequency counting module is used for counting the occurrence frequency of the natural language text;
the sequencing module is used for sequencing the natural language texts according to the occurrence frequency to obtain a sequencing result;
the first natural language text acquisition module is used for acquiring a first natural language text of which the occurrence frequency is greater than a first threshold value in the sequencing result;
the analysis result acquisition module is used for acquiring an analysis result corresponding to the first natural language text from the natural language analysis data;
and the first screening submodule is used for taking the first natural language text and an analysis result corresponding to the first natural language text as a first screening result.
Further, the device for updating the test data set further includes:
the source type judging module is used for judging the source types of the first natural language text and the analysis result of the first natural language text;
and the classification module is used for storing the first natural language text and the analysis result of the first natural language text into an intermediate file of the test data set corresponding to the source type according to the source type.
Further, the classification module further includes:
the specific type storage module is used for storing, in response to the source type being a specific source, the first natural language text and the analysis result of the first natural language text into an intermediate file of a specific test data set; and
the universal type storage module is used for otherwise storing the first natural language text and the analysis result of the first natural language text into an intermediate file of a universal test data set.
Further, the comparing module is further configured to:
and comparing the data in the intermediate file corresponding to the source type with the data in the test data set corresponding to the source type to obtain the repeated data in the intermediate file corresponding to the source type.
Further, the device for updating the test data set further includes:
and the second screening result sending module is used for sending the second screening result to the first terminal, wherein the first terminal is used for screening the data in the second screening result.
Further, the third screening module further includes:
a first selection signal receiving module, configured to receive a first selection signal from the first terminal;
a data flag acquisition module for acquiring a data flag in the first selection signal;
and the third screening submodule is used for selecting the data corresponding to the data mark from the second screening result as a third screening result according to the data mark.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of updating the test data set of any of the preceding first aspects.
In a fourth aspect, the present disclosure provides a non-transitory computer-readable storage medium, which stores computer instructions for causing a computer to execute the method for updating a test data set according to any one of the first aspect.
The embodiments of the present disclosure disclose a method and an apparatus for updating a test data set, an electronic device, and a computer-readable storage medium. The method for updating the test data set comprises the following steps: acquiring natural language parsing data; screening out a first natural language text whose occurrence frequency is greater than a first threshold, together with the parsing result of the first natural language text, as a first screening result; comparing the first screening result with the data in the test data set to obtain duplicate data in the first screening result; deleting the duplicate data from the first screening result to obtain a second screening result; in response to receiving a first selection signal, selecting at least part of the data from the second screening result to obtain a third screening result; and adding the data in the third screening result to the test data set to obtain an updated test data set. This method solves the prior-art technical problem that the data in a test data set is inaccurate.
The foregoing is only a summary of the present disclosure. To promote a clear understanding of its technical means, the disclosure is described in further detail below; it may nevertheless be embodied in other specific forms without departing from its spirit or essential attributes.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic view of an application scenario of an embodiment of the present disclosure;
FIG. 2 is a flow chart of an embodiment of a method for updating a test data set provided by the present disclosure;
fig. 3 is a flowchart illustrating a specific example of step S202 in an embodiment of a method for updating a test data set provided by the present disclosure;
FIG. 4 is a flow diagram of a further embodiment of an update method of a test data set provided by the present disclosure;
FIG. 5 is a flow diagram of a further embodiment of an update method of a test data set provided by the present disclosure;
fig. 6 is a flowchart illustrating a specific example of step S502 in an embodiment of a method for updating a test data set provided by the present disclosure;
FIG. 7 is a schematic structural diagram of an embodiment of an apparatus for updating a test data set according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" used in this disclosure are intended to be illustrative rather than limiting; those skilled in the art should understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a schematic view of an application scenario of an embodiment of the present disclosure. As shown in Fig. 1, a user 101 inputs a query (natural language text, that is, the original text input by the user) to a terminal device 102. The query may be voice, text, or any other input capable of representing natural language, and the terminal device 102 may be any terminal device capable of receiving natural language input, such as a smartphone, a smart speaker, or a smart home appliance. The terminal device 102 is connected through a network to a natural language parsing (NLP) device 103, which may be a computer, an intelligent terminal, or the like. The network over which the terminal device 102 communicates with the natural language parsing device 103 may be a wireless network, such as a 5G or Wi-Fi network, or a wired network, such as an optical fiber network.
It is understood that the natural language parsing device 103 and the terminal device 102 may be deployed together, that is, the terminal device 102 may integrate the natural language parsing function, so that the natural language text input by the user can be parsed within the terminal device 102 to obtain the parsing result.
Fig. 2 is a flowchart of an embodiment of a method for updating a test data set according to an embodiment of the present disclosure. The method of this embodiment may be executed by an apparatus for updating a test data set; the apparatus may be implemented as software, or as a combination of software and hardware, and may be integrated in a device of a test data set updating system, such as a test data set updating server or a test data set updating terminal device. As shown in Fig. 2, the method comprises the following steps:
step S201, acquiring natural language analysis data;
in this step, the natural language parsing data includes a natural language text and a parsing result of the natural language text.
Optionally, in this step, acquiring the natural language parsing data includes: capturing, through a script, semantic understanding recognition results and the corresponding natural language texts from the natural language parsing server. Typically, the natural language parsing data is stored in the natural language parsing server in the form of a table, in which one field stores the natural language text and another field stores the parsing result of that text. For example, if the natural language text is 'how do I put the license plate on the front of the car', the parsing result is 'how to install the front license plate', and this pair of data constitutes one item of natural language parsing data.
It can be understood that the parsing result of a natural language text differs depending on the natural language parsing model used; for example, if the model parses the syntactic structure of the text or classifies the text, the parsing result is syntactic structure information or a classification result, respectively, which is not described in detail herein.
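The disclosure does not prescribe how the exported table is read, so the following Python sketch is purely illustrative: it assumes the parsing log captured from the server has been exported to a CSV file, and the column names `query` and `parse_result` are hypothetical.

```python
# Illustrative sketch only; the CSV export format and column names are assumptions,
# since the disclosure only states that the parsing data is stored as a table.
import csv
from typing import List, Tuple


def load_parsing_data(path: str) -> List[Tuple[str, str]]:
    """Return (natural language text, parsing result) pairs from a CSV export."""
    pairs: List[Tuple[str, str]] = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pairs.append((row["query"], row["parse_result"]))
    return pairs


# Example usage (hypothetical file name):
# pairs = load_parsing_data("nlp_server_export.csv")
```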
Step S202, screening out a first natural language text with the occurrence frequency larger than a first threshold value and an analysis result of the first natural language text as a first screening result;
Optionally, screening out the first natural language text whose occurrence frequency is greater than the first threshold, together with the parsing result of the first natural language text, as the first screening result includes:
step S301, counting the occurrence frequency of the natural language text;
step S302, sequencing the natural language texts according to the occurrence frequency to obtain a sequencing result;
step S303, acquiring a first natural language text with the occurrence frequency larger than a first threshold value in the sequencing result;
step S304, obtaining an analysis result corresponding to the first natural language text from the natural language analysis data;
step S305, using the first natural language text and the parsing result corresponding to the first natural language text as a first screening result.
In step S301, the number of occurrences of each natural language text is counted, and the ratio of the number of occurrences to the total number of natural language texts is calculated as the frequency of occurrence of the natural language text. In step S302, the natural language texts are sorted according to the occurrence frequency obtained in step S301 to obtain a sorting result, which is shown in the following table as an example of a sorting result:
Serial number | Natural language text | Frequency of occurrence
1 | Query1 | 5%
2 | Query2 | 2%
3 | Query3 | 1%
4 | Query4 | 0.8%
5 | Query5 | 0.5%
6 | Query6 | 0.1%
In step S303, typically, if the first threshold is 0.1%, all natural language texts whose occurrence frequency in the sorting result is greater than 0.1% are acquired as the first natural language texts. In this step, the smallest occurrence frequency that is still greater than the first threshold may be searched for in the sorting result; if the serial number corresponding to that frequency is n, the natural language texts corresponding to serial numbers 1 to n are the first natural language texts. In step S304, the parsing result corresponding to the first natural language text is obtained; specifically, the corresponding parsing result may be looked up in the table of natural language parsing data described in step S201. Then, in step S305, the first natural language text and its corresponding parsing result are used as the first screening result, where the first screening result is a subset of the natural language parsing data obtained in step S201. It is understood that a first natural language text may have a plurality of parsing results, since the parsing results of the same natural language text may differ in different contexts or under different intents, which is not described in detail herein.
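As one possible reading of steps S301-S305 (not the patent's own implementation), the frequency-based screening can be sketched in Python as follows, assuming the parsing data is available as (text, parsing result) pairs as in the sketch above.

```python
# Minimal sketch of steps S301-S305 under the assumptions stated above.
from collections import Counter
from typing import List, Set, Tuple


def first_screening(pairs: List[Tuple[str, str]],
                    first_threshold: float) -> List[Tuple[str, str]]:
    """Keep every (text, parsing result) pair whose text occurs more often than the threshold."""
    counts = Counter(text for text, _ in pairs)              # S301: occurrence counts
    total = sum(counts.values())
    freq = {text: n / total for text, n in counts.items()}   # relative frequency
    # S302/S303: sort by descending frequency and keep texts above the threshold
    frequent = {text for text, f in sorted(freq.items(), key=lambda kv: -kv[1])
                if f > first_threshold}
    # S304/S305: collect the corresponding parsing results (deduplicating identical pairs)
    seen: Set[Tuple[str, str]] = set()
    first_result: List[Tuple[str, str]] = []
    for text, parse in pairs:
        if text in frequent and (text, parse) not in seen:
            seen.add((text, parse))
            first_result.append((text, parse))
    return first_result


# Example: first = first_screening(pairs, first_threshold=0.001)  # 0.1%
```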
Step S203, comparing the first screening result with the data in the test data set to obtain repeated data in the first screening result;
wherein the duplicate data is data already present in the first screening result in the test dataset.
Typically, the test data set includes at least natural language texts and the correct parsing results corresponding to those texts. This step compares the first screening result with the test data set to obtain the duplicate data between them, where duplicate data refers to entries whose natural language text and parsing result are both identical. The same natural language text may have multiple parsing results, so entries whose natural language texts are identical but whose parsing results differ do not constitute duplicate data.
Step S204, deleting the repeated data from the first screening result to obtain a second screening result;
In this step, the duplicate data is deleted from the first screening result, yielding a data set that has no intersection with the test data set, i.e., the second screening result. The purpose of this step is deduplication, which prevents data that already exists in the test data set from being added to it again.
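Under the same representation (each entry is a (text, parsing result) pair and the test data set is modeled as a set of such pairs), steps S203-S204 reduce to a membership check; this is a sketch, not the disclosed implementation.

```python
# Sketch of steps S203-S204: a pair is a duplicate only when both the text and
# its parsing result already appear in the test data set.
from typing import List, Set, Tuple


def remove_duplicates(first_result: List[Tuple[str, str]],
                      test_data_set: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the second screening result: first_result minus the pairs already in the test set."""
    return [pair for pair in first_result if pair not in test_data_set]
```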
Step S205, in response to receiving the first selection signal, selecting at least part of data from the second screening results to obtain a third screening result;
Optionally, the first selection signal is a selection signal, received through a human-computer interface, for data in the second screening result; typical human-computer interfaces are a mouse, a keyboard, a touch screen, and the like. In this step, part or all of the data in the second screening result is selected as the third screening result, and the selection criterion may be the correctness of the data in the second screening result. Typically, if the parsing result corresponding to a natural language text is incorrect, the first selection signal does not select that data, so that only correct data is selected to form the third screening result. Alternatively, if the parsing result corresponding to a natural language text is incorrect, the first selection signal selects that data, which is then deleted from the second screening result; the remaining data in the second screening result is all correct and is used as the third screening result.
Optionally, before step S205, the method further includes: sending the second screening result to a first terminal, where the first terminal is used for screening the data in the second screening result. In this optional embodiment, selecting at least part of the data from the second screening result to obtain the third screening result in response to receiving the first selection signal includes:
step S401, receiving a first selection signal from the first terminal;
step S402, acquiring a data mark in the first selection signal;
step S403, selecting data corresponding to the data flag from the second screening results according to the data flag as a third screening result.
In this optional embodiment, the first selection signal is sent by a remote first terminal: the second screening result is sent to the first terminal used by a user, the user selects data in the second screening result on the first terminal, and the first terminal then generates and sends the first selection signal. In step S401, the first selection signal is received from the first terminal; the first selection signal includes data marks of data in the second screening result, where a data mark is an identifier, such as a number, that uniquely identifies a piece of data in the second screening result. In step S402, the data marks are parsed out of the first selection signal, and in step S403, the data corresponding to the data marks is selected from the second screening result as the third screening result according to the data marks. It can be understood that the data marks may instead identify the data to be deleted from the second screening result, which is not described in detail herein.
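As an illustrative sketch of steps S401-S403, the first selection signal can be modeled as a list of data marks; here the marks are simply indices into the second screening result, which is only one possible choice of identifier.

```python
# Sketch of steps S401-S403: keep only the pairs whose data marks (here: indices)
# were selected at the first terminal. The index-based mark is an assumption.
from typing import List, Tuple


def apply_selection_signal(second_result: List[Tuple[str, str]],
                           selected_marks: List[int]) -> List[Tuple[str, str]]:
    """Return the third screening result: the manually confirmed pairs."""
    return [second_result[i] for i in selected_marks if 0 <= i < len(second_result)]
```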
And step S206, adding the data in the third screening result into the test data set to obtain an updated test data set.
The data in the third screening result does not yet exist in the test data set and has been confirmed to be parsed correctly; therefore, in this step, the data in the third screening result is added to the test data set to obtain the updated test data set.
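With the set-of-pairs representation assumed in the sketches above, step S206 is a simple set union; the commented chain afterwards shows how the hypothetical helpers could be combined end to end.

```python
# Sketch of step S206 plus an end-to-end usage chain under the assumptions above.
from typing import List, Set, Tuple


def update_test_data_set(test_data_set: Set[Tuple[str, str]],
                         third_result: List[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Merge the confirmed, non-duplicate pairs into the test data set (step S206)."""
    return test_data_set | set(third_result)


# pairs = load_parsing_data("nlp_server_export.csv")
# first = first_screening(pairs, first_threshold=0.001)          # 0.1%
# second = remove_duplicates(first, test_data_set)
# third = apply_selection_signal(second, selected_marks=[0, 2])  # from the first terminal
# test_data_set = update_test_data_set(test_data_set, third)
```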
Fig. 5 is a flowchart of another embodiment of the method for updating a test data set according to an embodiment of the present disclosure. In this embodiment, after step S202 of screening out the first natural language text whose occurrence frequency is greater than the first threshold and the parsing result of the first natural language text as the first screening result, the method further includes:
step S501, judging the source types of the first natural language text and the analysis result of the first natural language text;
step S502, storing the first natural language text and the analysis result of the first natural language text into an intermediate file of the test data set corresponding to the source type according to the source type.
In this embodiment, both the natural language text and its parsing result have a source type. For example, if the natural language text is input and parsed through a particular type of smart speaker, the source type is that smart speaker; if the natural language text comes from another source, it can be regarded as a general-type source. Alternatively, the source type may be categorized according to a specific application scenario or function, such as music, playback control, weather, or audio books. In this embodiment, the natural language text itself carries a flag marking its source; typical flags are a device ID, a user ID, a function ID, and the like. When the user 101 inputs the natural language text into the terminal device 102, the terminal device 102 may, as needed, attach the ID of the terminal device 102, the ID of the current user 101, or the ID of the function being operated in the terminal device 102 to the natural language text, so that the natural language text can be classified in a subsequent step.
In this embodiment, different test data sets are provided for different source types. Therefore, in step S502, the first natural language text and the parsing result of the first natural language text are stored, according to the source type, into an intermediate file of the test data set corresponding to that source type. An intermediate file is a cache file used for caching intermediate screening results, and different test data sets correspond to different intermediate files. Optionally, step S502 includes:
Step S601: in response to the source type being a specific source, storing the first natural language text and the parsing result of the first natural language text in an intermediate file of a specific test data set; otherwise,
Step S602: storing the first natural language text and the parsing result of the first natural language text in an intermediate file of a general test data set.
A specific test data set is typically the test data set of one of the specific sources described above, such as the particular smart speaker, music, playback control, weather, or audio books, whereas a general test data set is not tied to any specific domain.
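For the source-type embodiment (steps S501-S502 and S601-S602), the routing into intermediate files can be sketched as below, assuming each screened entry additionally carries a source tag and that the set of specific sources is configured in advance; the tag values and the "general" bucket name are illustrative, not taken from the disclosure.

```python
# Sketch of steps S501-S502 / S601-S602: split the first screening result into one
# intermediate buffer per source type. Source tag values are hypothetical.
from collections import defaultdict
from typing import Dict, List, Tuple

SPECIFIC_SOURCES = {"smart_speaker", "music", "play_control", "weather", "audio_book"}


def route_by_source_type(tagged_result: List[Tuple[str, str, str]]
                         ) -> Dict[str, List[Tuple[str, str]]]:
    """Map each (text, parsing result, source type) triple to the intermediate buffer
    of its specific test data set, or to the 'general' buffer otherwise."""
    buffers: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for text, parse, source_type in tagged_result:
        key = source_type if source_type in SPECIFIC_SOURCES else "general"
        buffers[key].append((text, parse))
    return dict(buffers)
```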
In this embodiment, in step S203, the comparing the first screening result with the data in the test data set to obtain the duplicate data in the first screening result includes:
and comparing the data in the intermediate file corresponding to the source type with the data in the test data set corresponding to the source type to obtain the repeated data in the intermediate file corresponding to the source type.
In this step, since the first screening result has been divided into multiple parts according to source type, each part is compared with the corresponding test data set to obtain the duplicate data of the first screening result in each test data set.
In this embodiment, after step S203, steps S204 to S206 are performed for each test data set to update that test data set, which is not described in detail herein. By subdividing the test data sets and updating each type of test data set according to its data source, the data in each test data set becomes more targeted and the test accuracy is higher.
The embodiments of the present disclosure disclose a method and an apparatus for updating a test data set, an electronic device, and a computer-readable storage medium. The method for updating the test data set comprises the following steps: acquiring natural language parsing data; screening out a first natural language text whose occurrence frequency is greater than a first threshold, together with the parsing result of the first natural language text, as a first screening result; comparing the first screening result with the data in the test data set to obtain duplicate data in the first screening result; deleting the duplicate data from the first screening result to obtain a second screening result; in response to receiving a first selection signal, selecting at least part of the data from the second screening result to obtain a third screening result; and adding the data in the third screening result to the test data set to obtain an updated test data set. This method solves the prior-art technical problem that the data in a test data set is inaccurate.
Although the steps in the above method embodiments are described in the above order, it should be clear to those skilled in the art that the steps in the embodiments of the present disclosure are not necessarily performed in that order and may also be performed in other orders, such as reversed, in parallel, or interleaved. Moreover, on the basis of the above steps, those skilled in the art may add other steps; these obvious variations or equivalent substitutions also fall within the protection scope of the present disclosure and are not described in detail herein.
Fig. 7 is a schematic structural diagram of an embodiment of an apparatus for updating a test data set according to an embodiment of the present disclosure. As shown in Fig. 7, the apparatus 700 includes: a natural language parsing data acquisition module 701, a first screening module 702, a comparison module 703, a second screening module 704, a third screening module 705, and an updating module 706. Specifically:
a natural language parsing data obtaining module 701, configured to obtain natural language parsing data, where the natural language parsing data includes a natural language text and a parsing result of the natural language text;
a first screening module 702, configured to screen out a first natural language text with an occurrence frequency greater than a first threshold and an analysis result of the first natural language text as a first screening result;
a comparing module 703, configured to compare the first screening result with data in the test data set to obtain duplicate data in the first screening result, where the duplicate data is data already existing in the first screening result in the test data set;
a second screening module 704, configured to delete the duplicate data from the first screening result to obtain a second screening result;
a third filtering module 705, configured to select at least part of the data from the second filtering results to obtain a third filtering result in response to receiving the first selection signal;
and an updating module 706, configured to add the data in the third screening result to the test data set to obtain an updated test data set.
Further, the first filtering module 702 further includes:
the frequency counting module is used for counting the occurrence frequency of the natural language text;
the sequencing module is used for sequencing the natural language texts according to the occurrence frequency to obtain a sequencing result;
the first natural language text acquisition module is used for acquiring a first natural language text of which the occurrence frequency is greater than a first threshold value in the sequencing result;
the analysis result acquisition module is used for acquiring an analysis result corresponding to the first natural language text from the natural language analysis data;
and the first screening submodule is used for taking the first natural language text and an analysis result corresponding to the first natural language text as a first screening result.
Further, the apparatus 700 for updating the test data set further includes:
the source type judging module is used for judging the source types of the first natural language text and the analysis result of the first natural language text;
and the classification module is used for storing the first natural language text and the analysis result of the first natural language text into an intermediate file of the test data set corresponding to the source type according to the source type.
Further, the classification module further includes:
the specific type storage module is used for storing, in response to the source type being a specific source, the first natural language text and the analysis result of the first natural language text into an intermediate file of a specific test data set; and
the universal type storage module is used for otherwise storing the first natural language text and the analysis result of the first natural language text into an intermediate file of a universal test data set.
Further, the comparing module 703 is further configured to:
and comparing the data in the intermediate file corresponding to the source type with the data in the test data set corresponding to the source type to obtain the repeated data in the intermediate file corresponding to the source type.
Further, the apparatus 700 for updating the test data set further includes:
and the second screening result sending module is used for sending the second screening result to the first terminal, wherein the first terminal is used for screening the data in the second screening result.
Further, the third filtering module 705 further includes:
a first selection signal receiving module, configured to receive a first selection signal from the first terminal;
a data flag acquisition module for acquiring a data flag in the first selection signal;
and the third screening submodule is used for selecting the data corresponding to the data mark from the second screening result as a third screening result according to the data mark.
The apparatus shown in Fig. 7 can perform the methods of the embodiments shown in Figs. 1-6. For the parts of this embodiment that are not described in detail, as well as for the implementation process and technical effects of the technical solution, reference may be made to the related descriptions of the embodiments shown in Figs. 1-6, which are not repeated here.
Referring now to FIG. 8, shown is a schematic diagram of an electronic device 800 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 800 may include a processing means (e.g., central processing unit, graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage means 806 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are also stored. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 806 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 806, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring natural language analysis data; screening out a first natural language text with the occurrence frequency larger than a first threshold value and an analysis result of the first natural language text as a first screening result; comparing the first screening result with the data in the test data set to obtain repeated data in the first screening result; deleting the repeated data from the first screening result to obtain a second screening result; in response to receiving a first selection signal, selecting at least part of data from the second screening results to obtain a third screening result; and adding the data in the third screening result into the test data set to obtain an updated test data set.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not in some cases constitute a limitation on the element itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is merely of the preferred embodiments of the disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of the features described above, and also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Claims (10)

1. A method of updating a test data set, comprising:
acquiring natural language analysis data, wherein the natural language analysis data comprises a natural language text and an analysis result of the natural language text;
screening out a first natural language text with the occurrence frequency larger than a first threshold value and an analysis result of the first natural language text as a first screening result;
comparing the first screening result with data in a test data set to obtain repeated data in the first screening result, wherein the repeated data is data already existing in the first screening result in the test data set;
deleting the repeated data from the first screening result to obtain a second screening result;
in response to receiving a first selection signal, selecting at least part of data from the second screening results to obtain a third screening result;
and adding the data in the third screening result into the test data set to obtain an updated test data set.
2. The method for updating a test data set according to claim 1, wherein the screening out, as a first screening result, a first natural language text having an occurrence frequency greater than a first threshold and a parsing result of the first natural language text includes:
counting the occurrence frequency of the natural language text;
sequencing the natural language texts according to the occurrence frequency to obtain a sequencing result;
acquiring a first natural language text with the occurrence frequency larger than a first threshold value in the sequencing result;
acquiring an analysis result corresponding to the first natural language text from the natural language analysis data;
and taking the first natural language text and an analysis result corresponding to the first natural language text as a first screening result.
3. The method for updating a test data set according to claim 1, wherein after the screening out the first natural language text having the frequency of occurrence greater than the first threshold value and the parsing result of the first natural language text as the first screening result, further comprising:
judging the source types of the first natural language text and the analysis result of the first natural language text;
and storing the first natural language text and the analysis result of the first natural language text into an intermediate file of the test data set corresponding to the source type according to the source type.
4. The method for updating the test data set according to claim 3, wherein the storing the first natural language text and the parsing result of the first natural language text into the intermediate file of the test data set corresponding to the source type according to the source type comprises:
in response to the source type being a specific source, storing the first natural language text and an analysis result of the first natural language text in an intermediate file of a specific test data set; otherwise,
storing the first natural language text and the analysis result of the first natural language text into an intermediate file of a universal test data set.
5. The method for updating a test data set according to claim 3, wherein the comparing the first screening result with the data in the test data set to obtain duplicate data in the first screening result comprises:
and comparing the data in the intermediate file corresponding to the source type with the data in the test data set corresponding to the source type to obtain the repeated data in the intermediate file corresponding to the source type.
6. The method for updating a test data set according to claim 1, wherein after said deleting the duplicate data from the first screening result to obtain a second screening result, further comprising:
and sending the second screening result to a first terminal, wherein the first terminal is used for screening the data in the second screening result.
7. The method of updating a test data set according to claim 6, wherein said selecting at least part of the data from the second screening results in response to receiving a first selection signal results in a third screening result comprising:
receiving a first selection signal from the first terminal;
acquiring a data mark in the first selection signal;
and selecting the data corresponding to the data mark from the second screening result as a third screening result according to the data mark.
8. An apparatus for updating a test data set, comprising:
a natural language analysis data acquisition module, configured to acquire natural language analysis data, wherein the natural language analysis data comprises a natural language text and a parsing result of the natural language text;
a first screening module, configured to screen out a first natural language text whose occurrence frequency is greater than a first threshold and a parsing result of the first natural language text as a first screening result;
a comparison module, configured to compare the first screening result with data in a test data set to obtain duplicate data in the first screening result, wherein the duplicate data is data in the first screening result that already exists in the test data set;
a second screening module, configured to delete the duplicate data from the first screening result to obtain a second screening result;
a third screening module, configured to select, in response to receiving a first selection signal, at least part of the data from the second screening result to obtain a third screening result;
and an updating module, configured to add the data in the third screening result to the test data set to obtain an updated test data set.
9. An electronic device, comprising:
a memory for storing computer readable instructions; and
a processor for executing the computer readable instructions, wherein the processor, when executing the instructions, implements the method for updating a test data set according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer-readable instructions which, when executed by a computer, cause the computer to perform the method for updating a test data set according to any one of claims 1 to 7.
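
The sketches below are informal illustrations of the screening and updating steps recited in claims 2 to 8; they are not part of the claimed method, and every function, file and field name in them is an assumption made for illustration. A possible Python rendering of the frequency screening of claim 2, which counts how often each natural language text occurs, sorts the texts by frequency, and pairs each text above the first threshold with its parsing result, might look like this (the record layout with 'text' and 'parse_result' keys is assumed):

from collections import Counter

def build_first_screening_result(parse_records, first_threshold):
    # Sketch of claim 2. `parse_records` is assumed to be a list of dicts
    # with 'text' and 'parse_result' keys taken from the natural language
    # analysis data.
    # Count how often each natural language text occurs.
    frequency = Counter(record["text"] for record in parse_records)
    # Sort the texts by occurrence frequency (descending) to obtain a sorting result.
    sorted_texts = sorted(frequency, key=frequency.get, reverse=True)
    # Remember one parsing result per text (the last one seen).
    parse_by_text = {record["text"]: record["parse_result"] for record in parse_records}
    # First screening result: texts above the first threshold plus their parsing results.
    return [
        {"text": text, "parse_result": parse_by_text[text]}
        for text in sorted_texts
        if frequency[text] > first_threshold
    ]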
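
Claims 3 to 5 route each screened entry into an intermediate file chosen by its source type and then drop entries that already exist in the test data set of the same source type. A minimal sketch, assuming only two source types ('specific' and anything else treated as universal), JSON-lines intermediate files, and dict entries as above; the file names and helper names are invented for this sketch:

import json
from pathlib import Path

def intermediate_file_for(source_type):
    # Assumption for claims 3 and 4: a specific source maps to the intermediate
    # file of the specific test data set, everything else to the universal one.
    name = "specific_intermediate.jsonl" if source_type == "specific" else "universal_intermediate.jsonl"
    return Path(name)

def store_by_source_type(entry, source_type):
    # Append the text and its parsing result to the intermediate file
    # corresponding to the source type.
    with intermediate_file_for(source_type).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

def _entry_key(entry):
    # Two entries count as duplicates when both the text and the parsing result
    # match; parsing results may be nested, so they are serialised for comparison.
    return (entry["text"], json.dumps(entry["parse_result"], sort_keys=True, ensure_ascii=False))

def remove_duplicates(intermediate_entries, test_data_set):
    # Claim 5: entries already present in the test data set of the same source
    # type are duplicates; removing them yields the second screening result.
    existing = {_entry_key(e) for e in test_data_set}
    return [e for e in intermediate_entries if _entry_key(e) not in existing]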
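
For claims 6 and 7, the second screening result is sent to a first terminal for review; the terminal returns a first selection signal carrying data marks, the marked entries become the third screening result, and those entries are added to the test data set. The claims do not specify how the signal is transported, so the sketch below simply models the selection signal as a list of indices into the second screening result:

def apply_selection_signal(second_screening_result, selection_signal):
    # Claim 7: the data marks in the selection signal identify which entries of
    # the second screening result to keep; here a mark is assumed to be an index.
    marked = set(selection_signal)
    return [entry for index, entry in enumerate(second_screening_result) if index in marked]

def update_test_data_set(test_data_set, third_screening_result):
    # Final step: add the selected entries to the test data set to obtain
    # the updated test data set.
    return test_data_set + third_screening_result

# Usage: keep the first and third entries of a toy second screening result.
second = [
    {"text": "play some jazz", "parse_result": {"intent": "music"}},
    {"text": "what is the weather", "parse_result": {"intent": "weather"}},
    {"text": "set an alarm", "parse_result": {"intent": "alarm"}},
]
third = apply_selection_signal(second, selection_signal=[0, 2])
updated = update_test_data_set([], third)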
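
Claim 8 packages the same steps as modules of an apparatus. Purely to illustrate that decomposition, the helpers from the previous sketches could be chained as follows; the class name and the single update call are inventions of this sketch, not of the claims:

class TestDataSetUpdater:
    # Illustrative composition of the modules recited in claim 8, reusing
    # build_first_screening_result, remove_duplicates, apply_selection_signal
    # and update_test_data_set from the sketches above.

    def __init__(self, first_threshold):
        self.first_threshold = first_threshold

    def update(self, parse_records, test_data_set, selection_signal):
        # First screening module.
        first = build_first_screening_result(parse_records, self.first_threshold)
        # Comparison module and second screening module.
        second = remove_duplicates(first, test_data_set)
        # Third screening module, driven by the received selection signal.
        third = apply_selection_signal(second, selection_signal)
        # Updating module.
        return update_test_data_set(test_data_set, third)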
CN201910873744.7A 2019-09-17 2019-09-17 Test data set updating method and device Pending CN110659208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910873744.7A CN110659208A (en) 2019-09-17 2019-09-17 Test data set updating method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910873744.7A CN110659208A (en) 2019-09-17 2019-09-17 Test data set updating method and device

Publications (1)

Publication Number Publication Date
CN110659208A true CN110659208A (en) 2020-01-07

Family

ID=69038063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910873744.7A Pending CN110659208A (en) 2019-09-17 2019-09-17 Test data set updating method and device

Country Status (1)

Country Link
CN (1) CN110659208A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012104907A1 (en) * 2011-02-02 2012-08-09 株式会社日立製作所 Test data production method for evaluating execution performance of program
CN108829757A (en) * 2018-05-28 2018-11-16 广州麦优网络科技有限公司 A kind of intelligent Service method, server and the storage medium of chat robots
CN109598307A (en) * 2018-12-06 2019-04-09 北京达佳互联信息技术有限公司 Data screening method, apparatus, server and storage medium
CN110210294A (en) * 2019-04-23 2019-09-06 平安科技(深圳)有限公司 Evaluation method, device, storage medium and the computer equipment of Optimized model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347052A (en) * 2020-11-04 2021-02-09 深圳集智数字科技有限公司 File matching method and related device
CN113656315A (en) * 2021-08-19 2021-11-16 北京百度网讯科技有限公司 Data testing method and device, electronic equipment and storage medium
CN113656315B (en) * 2021-08-19 2023-01-24 北京百度网讯科技有限公司 Data testing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US20240078386A1 (en) Methods and systems for language-agnostic machine learning in natural language processing using feature extraction
CN110598157B (en) Target information identification method, device, equipment and storage medium
CN110969012B (en) Text error correction method and device, storage medium and electronic equipment
CN110413742B (en) Resume information duplication checking method, device, equipment and storage medium
CN110633423B (en) Target account identification method, device, equipment and storage medium
CN111314388B (en) Method and apparatus for detecting SQL injection
CN115757400B (en) Data table processing method, device, electronic equipment and computer readable medium
CN111324700A (en) Resource recall method and device, electronic equipment and computer-readable storage medium
CN110659208A (en) Test data set updating method and device
CN115270717A (en) Method, device, equipment and medium for detecting vertical position
CN112182255A (en) Method and apparatus for storing media files and for retrieving media files
CN113033707B (en) Video classification method and device, readable medium and electronic equipment
CN111078849A (en) Method and apparatus for outputting information
CN117131281B (en) Public opinion event processing method, apparatus, electronic device and computer readable medium
CN111797822A (en) Character object evaluation method and device and electronic equipment
CN111026849A (en) Data processing method and device
CN112509581B (en) Error correction method and device for text after voice recognition, readable medium and electronic equipment
CN111460214B (en) Classification model training method, audio classification method, device, medium and equipment
CN111382365B (en) Method and device for outputting information
CN114428867A (en) Data mining method and device, storage medium and electronic equipment
CN114020896A (en) Intelligent question and answer method, system, electronic equipment and storage medium
CN114239501A (en) Contract generation method, apparatus, device and medium
CN113488050A (en) Voice awakening method and device, storage medium and electronic equipment
CN113420723A (en) Method and device for acquiring video hotspot, readable medium and electronic equipment
CN111737571A (en) Searching method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200107