CN108959412B - Method, device and equipment for generating labeled data and storage medium - Google Patents

Method, device and equipment for generating labeled data and storage medium

Info

Publication number
CN108959412B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810580489.2A
Other languages
Chinese (zh)
Other versions
CN108959412A (en
Inventor
王晓雪
吴世伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mobvoi Information Technology Co Ltd
Original Assignee
Mobvoi Information Technology Co Ltd
Application filed by Mobvoi Information Technology Co Ltd filed Critical Mobvoi Information Technology Co Ltd
Priority to CN201810580489.2A priority Critical patent/CN108959412B/en
Publication of CN108959412A publication Critical patent/CN108959412A/en
Application granted granted Critical
Publication of CN108959412B publication Critical patent/CN108959412B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for generating labeled data. The method comprises: acquiring sample condition information that is provided by a data demander and matches a demand sample, wherein the sample condition information includes a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample, a sample type of the demand sample, and a grammar rule of the demand sample; providing the sample condition information to at least one data annotation party, and acquiring an alternative annotation sample generated by the data annotation party for the sample condition information; performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample; and constructing structured annotation data according to the target annotation sample and the sample condition information. In this way, the data required by a multi-round interactive system can be acquired efficiently, the data acquisition process is simplified, and the labor cost is reduced.

Description

Method, device and equipment for generating labeled data and storage medium
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a method, a device, equipment and a storage medium for generating labeled data.
Background
Multi-round interactive systems are increasingly widely used in intelligent electronic products. For example, multi-round interaction based on contextual dialogue scenes plays a very important role in intelligent question answering, where it is both an important function and a major challenge. In practical applications, the problem that an intelligent question-answering system has to solve is often complex, process-type knowledge rather than a simple single question and answer.
At present, rule-based models are the more common choice in multi-round interactive systems. As the application scenes of multi-round interactive systems become more complex, purely rule-based models have difficulty meeting their requirements. Compared with a rule-based model, a statistical model is more flexible, has statistical significance, and is suitable for complex interaction scenes. However, training a statistical model requires large-scale interactive data.
In the prior art, interactive data for a multi-round interactive system is acquired mainly in two ways: extraction from log files and manual construction. Acquiring data from log files is convenient: tracking points only need to be embedded in advance for the data to be collected, and the tracked data is extracted after the log file is generated. Manual construction requires the needed data to be constructed according to clearly defined data-construction requirements.
In the process of implementing the invention, the inventor found that the prior art has the following defects: interactive training data acquired from log files depends excessively on the performance of the interactive system, and only interactive data for scenes that the system already supports can be acquired; manually constructing interactive training data is a complicated and inefficient process that consumes a large amount of labor time.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for generating labeled data, which can efficiently acquire the data required by a multi-round interactive system, simplify the data acquisition process and reduce labor cost.
In a first aspect, an embodiment of the present invention provides a method for generating annotation data, including:
acquiring sample condition information that is provided by a data demander and matches a demand sample;
wherein the sample condition information includes: a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample, a sample type of the demand sample, and a grammar rule of the demand sample;
providing the sample condition information to at least one data annotation party, and acquiring an alternative annotation sample generated by the data annotation party for the sample condition information;
performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample;
and constructing structured annotation data according to the target annotation sample and the sample condition information.
In a second aspect, an embodiment of the present invention further provides a device for generating annotation data, where the device includes:
an information acquisition module, used for acquiring sample condition information that is provided by a data demander and matches a demand sample;
wherein the sample condition information includes: a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample, a sample type of the demand sample, and a grammar rule of the demand sample;
a sample acquisition module, used for providing the sample condition information to at least one data annotation party and acquiring an alternative annotation sample generated by the data annotation party for the sample condition information;
a sample verification module, used for performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample;
and a data construction module, used for constructing structured annotation data according to the target annotation sample and the sample condition information.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement any of the above-mentioned methods for generating annotation data.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements any of the above-mentioned methods for generating annotation data.
In the embodiment of the invention, the sample condition information that is provided by the data demander and matches the demand sample is provided to at least one data annotation party, and rationality verification is performed, according to the sample condition information, on the alternative annotation sample generated by the data annotation party to obtain the target annotation sample; structured annotation data is then constructed according to the target annotation sample and the sample condition information. This solves problems of the prior art such as the complicated process and low efficiency of acquiring data for a multi-round interactive system, achieves the technical effect of efficiently acquiring the data required by a multi-round interactive system, simplifies the data acquisition process, and reduces labor cost.
Drawings
Fig. 1 is a flowchart of a method for generating annotation data according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a method for generating annotation data according to a second embodiment of the present invention;
Fig. 3 is a flowchart of a method for generating annotation data according to a third embodiment of the present invention;
Fig. 4 is a flowchart of a method for generating annotation data according to a fourth embodiment of the present invention;
Fig. 5 is a schematic diagram of a device for generating annotation data according to a fifth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.
It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of a method for generating annotation data according to a first embodiment of the present invention. This embodiment is applicable to efficiently acquiring annotation data for a multi-round interactive system. The method can be executed by a device for generating annotation data, which can be implemented in software and/or hardware and can generally be integrated in various computer devices. The method specifically includes the following steps:
S110, acquiring sample condition information that is provided by a data demander and matches a demand sample.
Wherein the sample condition information includes: a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of the historical sample associated with the demand sample, a sample type of the demand sample, and a grammar rule of the demand sample.
The data demander generally refers to a user who builds or optimizes a multi-round interactive system. This user specifies conditions (i.e., the sample condition information) for the data (i.e., the demand samples) needed to build or optimize the system, requests the data annotation party to construct the needed data according to those conditions, and uses the constructed data to train a statistical model applied in the multi-round interactive system.
In this embodiment, the demand sample may be a data sample that meets a certain interaction requirement in any or a specific field and in any or a specific scene; optionally, the demand sample may be text data. Typically, the demand sample may include the interactive query of the user side in the current conversation turn. The current conversation turn may specifically be the turn in which a question input by the user is obtained in real time in a question-and-answer voice interaction scenario (the user side asks a question, and the machine side returns the corresponding answer). The interactive query in the current conversation turn is then the question that the user inputs in real time in that scenario.
It can be understood that, when training the statistical model used by the multi-round interactive system, the goal is for the model to correctly understand the actual semantics of the interactive query (Query) input by the user side and then give the user corresponding feedback based on the understood semantics, so as to meet the user's actual information needs. In a specific example, the query input by the user side to the multi-round interactive system is "What coffee shops are nearby?". The user's actual semantics is to obtain information about "coffee shop" entities in the surrounding environment; if the multi-round question-answering system correctly identifies this, it can feed the required information back to the user, namely: "Cafes near you include: cafe A, cafe B, …". Accordingly, the current semantic understanding protocol of the demand sample may be the actual semantics or scene conditions, such as domain, intention or semantic information, that the data demander designs the demand sample to satisfy according to actual needs. Typically, the data demander may define the current semantic understanding protocol of the demand sample as data in JSON format. In a specific example, the current semantic understanding protocol may be defined as: "domain = catering; intention = takeaway". By defining the current semantic understanding protocol, the data demander can request demand samples that satisfy it from the data annotation party, and the data annotation party can accordingly construct queries that conform to it, such as: "I want to order takeaway from a nearby restaurant".
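As a rough illustration of the JSON form mentioned above, the takeaway example could be written and parsed as follows. This is only a sketch: the key names `domain` and `intent` are assumptions, since the text states only that the protocol is defined in JSON format.

```python
import json

# Hypothetical key names; the description only gives the example
# "domain = catering; intention = takeaway" and says the format is JSON.
current_protocol_json = '{"domain": "catering", "intent": "takeaway"}'

# Parse the protocol so that later steps can read individual fields.
current_protocol = json.loads(current_protocol_json)
print(current_protocol["domain"], current_protocol["intent"])
```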
Further, the historical sample associated with the demand sample includes the interactive queries of the user side and/or the system side in at least one historical conversation turn associated with the current conversation turn. A historical conversation turn associated with the current conversation turn specifically refers to a question-and-answer turn that was completed before the current conversation turn.
In this embodiment, the inventor considers that the historical semantic understanding protocol corresponding to a query in one or more historical conversation turns adjacent to the current conversation turn may contain semantic actions that are the same as or similar to those of the current semantic understanding protocol corresponding to the query in the current conversation turn. Therefore, in addition to the current semantic understanding protocol corresponding to the query in the current conversation turn, the historical semantic understanding protocol corresponding to the queries of the user side and/or the system side in at least one associated historical conversation turn is also acquired, which helps the data annotation party generate alternative annotation samples for the sample condition information accurately and quickly. In addition, adding the historical semantic understanding protocol of the associated historical sample to the sample condition information allows the obtained data to be applied to complex scenes such as context-based dialogue, meeting the demands of intelligent applications.
For example, the query input by the user side in a historical conversation turn is "Please recommend restaurants nearby", and the feedback from the machine side in that historical conversation turn is "Restaurants near you include: restaurant A, restaurant B, and restaurant C". If, in the current conversation turn, the user side then inputs a query asking for Sichuan cuisine and not Cantonese cuisine, the machine side can recognize from the historical conversation turn that the user's actual semantics is to find the Sichuan restaurants among the surrounding restaurants and to exclude the Cantonese restaurants. If the multi-round question-answering system correctly identifies the user's actual semantics, it can feed the required information back to the user, namely: "Restaurant C near you is a Sichuan restaurant".
Correspondingly, in order to enable the data annotating party to accurately construct the requirement sample, the sample condition information further includes: a historical semantic understanding protocol of a historical sample associated with a demand sample. The historical semantic understanding protocol can be the actual semantics or scene conditions of the historical sample associated with the demand sample, such as information of a domain, an intention or a semantic.
Generally, training a statistical model requires negative samples in addition to positive samples. Accordingly, the sample type of the demand sample in the sample condition information specifies whether the demand sample is a positive sample or a negative sample. The grammar rule of the demand sample in the sample condition information refers to the grammatical conventions that the demand sample must follow, for example: a predicate and an object must be included, certain specific fields must be included, or certain specific fields must not be included.
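The must-include and must-not-include part of a grammar rule can be pictured as a simple text check. The sketch below is an assumption-laden illustration, not the patented check: the function name, the substring matching, and the `require_all` switch are all hypothetical, and the description does not say whether every required field or only one of them has to appear.

```python
def satisfies_grammar_rule(text, must_contain, must_not_contain, require_all=False):
    """Return True if `text` obeys the must-contain / must-not-contain lists."""
    hits = [term in text for term in must_contain]
    # Assumption: by default a single required term is enough; pass
    # require_all=True to demand that every required term appears.
    if require_all:
        required_ok = all(hits)
    else:
        required_ok = any(hits) or not must_contain
    # Any forbidden term anywhere in the text rejects the sample.
    forbidden_ok = not any(term in text for term in must_not_contain)
    return required_ok and forbidden_ok


print(satisfies_grammar_rule(
    "order takeaway from a nearby restaurant",
    ["restaurant", "takeaway"],
    ["cold drink"],
))
```

Substring matching is used only to keep the sketch short; a rule such as "a predicate and an object must be included" would need real grammatical analysis.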
In the embodiment of the invention, optionally, the data demander only provides the sample condition information and is not responsible for operations such as labeling and verifying the required sample, generating the data and the like, so that the generation efficiency of the labeled data can be effectively improved.
In an optional embodiment of the present invention, in the current semantic understanding protocol, a first target field associated with the requirement sample and a field value corresponding to the first target field are defined in JSON format; defining a second target field associated with the history sample and a field value corresponding to the second target field in a JSON format in the history semantic understanding protocol; in the grammar rule, a third target field which must be contained in the requirement sample and a fourth target field which cannot be contained in the requirement sample are defined; the sample types of the demand sample include: a positive sample type that the requirement sample conforms to the current semantic understanding protocol context, or a negative sample type that the requirement sample does not conform to the current semantic understanding protocol context; wherein the first target field is the same as the second target field, the first target field or the second target field including at least one of: domain, intent, semantic action, and slot information.
The domain refers to the subject area that the data in the demand sample relates to; for example, the domain corresponding to dishes is catering, and the domain corresponding to mountain climbing is sports. The intention is the goal that the data in the demand sample is meant to achieve and the way of achieving it; for example, for "find Sichuan restaurants, exclude Cantonese restaurants", the intention of the demand sample is to find a restaurant. The semantic action is information that the machine side can understand and process; for example, if the content input by the user side is "good morning", the semantic action obtained by the machine side after semantic understanding is a greeting. The slot information refers to the semantic category of each word. Specifically, a number of information slots, such as dishes, clothes and electronic devices, may be preset, and several corresponding field values are set for each information slot; for example, when the information slot is "dish", the slot values may include Hunan cuisine, Cantonese cuisine, Sichuan cuisine, and so on.
JSON is a lightweight data exchange format, is easy for human reading and writing, and is also easy for machine analysis and generation. In the embodiment of the invention, the first target field, the second target field and the corresponding field value in the current semantic understanding protocol and the historical semantic understanding protocol are defined through the JSON format, so that a data annotation party can conveniently and quickly read the target field and the field value in the semantic understanding protocol. Meanwhile, fields which must be contained and cannot be contained in the requirement sample are regulated through the grammar rule, so that the accuracy of the alternative labeling sample generated by the data labeling party can be improved. In addition, the positive sample type and the negative sample type are set for the demand sample, so that the alternative annotation sample can be further ensured to meet the demand of the demand sample.
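To make the pieces of the sample condition information concrete, the sketch below models them as Python dataclasses and serializes them to JSON. All class names, attribute names and example values are assumptions chosen for illustration; the description fixes only that the protocols are in JSON format and that the target fields include domain, intention, semantic action and slot information.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SemanticProtocol:
    # The four target fields named in the text; the Python names are assumptions.
    domain: str
    intent: str
    semantic_action: str
    slots: dict = field(default_factory=dict)


@dataclass
class GrammarRule:
    must_contain: list = field(default_factory=list)      # "third target field"
    must_not_contain: list = field(default_factory=list)  # "fourth target field"


@dataclass
class SampleConditionInfo:
    current_protocol: SemanticProtocol
    history_protocol: SemanticProtocol
    sample_type: str  # "positive" or "negative"
    grammar_rule: GrammarRule


# Illustrative values only.
condition = SampleConditionInfo(
    current_protocol=SemanticProtocol(
        "catering", "find a restaurant", "filter by cuisine",
        {"cuisine": "Sichuan cuisine"},
    ),
    history_protocol=SemanticProtocol(
        "catering", "takeaway", "list nearby restaurants offering takeaway",
    ),
    sample_type="positive",
    grammar_rule=GrammarRule(["restaurant"], ["cold drink"]),
)

# asdict() converts nested dataclasses recursively, so the result is JSON-ready.
condition_json = json.dumps(asdict(condition), ensure_ascii=False, indent=2)
print(condition_json)
```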
For example, the current semantic understanding protocol of the demand sample in the sample condition information may be: domain = catering; intention = find a restaurant; semantic action = (information slot = dish, slot value = Sichuan cuisine, user intention = Yes) and (information slot = dish, slot value = Cantonese cuisine, user intention = No); information slot = dish, corresponding slot value = Sichuan cuisine; information slot = dish, corresponding slot value = Cantonese cuisine. The historical semantic understanding protocol of the historical sample associated with the demand sample may be: domain = catering; intention = takeaway; semantic action = obtain surrounding restaurants that provide takeaway service. The sample type of the demand sample may be a positive sample type or a negative sample type. The grammar rule may specify that the fields that must be contained are "restaurant", "takeaway" or "dish", and that the fields that must not be contained are "drink" or "cold drink".
Accordingly, demand samples of the positive sample type that the data annotation party constructs according to the sample condition information may include: "I want a restaurant with strong flavors, not one with light flavors" or "I want dishes in the Sichuan style, not dishes in the Cantonese style"; demand samples of the negative sample type may include: "I want to find the movie theater next to XX restaurant" or "How is XX dish made?".
In an optional embodiment of the invention, the sample condition information further comprises: interactive examples of the current conversation turn matching the demand sample, and interactive examples of the historical conversation turn matching the historical sample; wherein the interactive instance of the current conversation turn is in compliance with a current semantic understanding protocol of the requirement sample; the interactive examples of the historical dialog turn are in accordance with a historical semantic understanding protocol of the historical sample.
Wherein, the interactive example of the current conversation turn and the interactive example of the historical conversation turn are both examples provided by the data demand side according to the required demand sample, and the two examples need to be respectively in accordance with the current semantic understanding protocol of the demand sample and the historical semantic understanding protocol of the historical sample. In other words, the interactive example of the current conversation turn is an illustration of a sample of the needs required by the data demander.
For example, the current semantic understanding protocol of the demand sample is: domain = navigation; semantic action = search for and locate a bus station near Beijing University; information slot = university, slot value = Beijing University. The historical semantic understanding protocol of the historical sample is: domain = navigation; semantic action = provide a route to Beijing University; information slot = university, slot value = Beijing University.
Accordingly, an interactive example of the current conversation turn for a demand sample of the positive sample type may be "Where is a bus stop near Beijing University?", which conforms to the current semantic understanding protocol of the demand sample; an interactive example of the current conversation turn for a demand sample of the negative sample type may be "Which shopping mall near Beijing University has the best reputation?", which does not conform to the current semantic understanding protocol of the demand sample; and an interactive example of the historical conversation turn for the historical sample associated with the demand sample may be "How do I get to Beijing University?".
In the embodiment of the present invention, the data annotating party can quickly locate the requirements of the data demanding party on the current demand sample according to the two examples. Through the interactive example of the current conversation turn, the data annotation party can quickly know the context of the requirement sample and the specific requirement sample example, and construct an alternative annotation sample according to the content.
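One way to picture how the protocols and the two interactive examples reach the data annotation party is as a short plain-text brief. The sketch below uses the navigation example; the function name `render_task`, the dictionary keys and the layout of the brief are all hypothetical.

```python
def render_task(condition: dict) -> str:
    """Format sample condition information as a plain-text brief for an annotator."""
    lines = [
        f"Sample type: {condition['sample_type']}",
        f"Current protocol: {condition['current_protocol']}",
        f"History protocol: {condition['history_protocol']}",
        f"History example: {condition['history_example']}",
        f"Current example: {condition['current_example']}",
    ]
    return "\n".join(lines)


# Values follow the navigation example in the text; key names are assumptions.
condition = {
    "sample_type": "positive",
    "current_protocol": {"domain": "navigation", "slots": {"university": "Beijing University"}},
    "history_protocol": {"domain": "navigation", "slots": {"university": "Beijing University"}},
    "history_example": "How do I get to Beijing University?",
    "current_example": "Where is a bus stop near Beijing University?",
}
print(render_task(condition))
```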
S120, providing the sample condition information to at least one data annotation party, and acquiring an alternative annotation sample generated by the data annotation party for the sample condition information.
The data annotation party generally refers to a party that constructs the annotation data required by the data demander according to the conditions that the data demander provides (the sample condition information matching the demand sample). There may be one or more data annotation parties; optionally, multiple data annotation parties may be used to ensure that the target annotation samples and the structured annotation data meet the quantity of annotation samples that a preset model requires. The alternative annotation sample is at least one sample (usually a plurality of samples) that the data annotation party constructs according to the sample condition information and that in theory satisfies the sample condition information.
In the embodiment of the invention, the data demander provides the data annotation party with sample condition information matching the requirements of the current demand sample. After acquiring this information, the data annotation party learns the requirements on the current demand sample and constructs alternative annotation samples that match it. In short, the data demander states the requirements, and the data annotation party constructs samples in batches according to them. The data demander and the data annotation party thus work separately and cooperate to generate the alternative annotation samples, while the other steps are handed over to the device for processing. This effectively reduces labor cost, improves the efficiency of acquiring data for the multi-round interactive system, and simplifies the data acquisition process.
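The hand-off in S120 can be sketched as sending the same condition to several annotators and merging what they return. In the sketch below each annotator is modeled as a plain callable, which is an assumption made for illustration; the whitespace trimming and de-duplication are likewise additions that the description does not require.

```python
from typing import Callable, Dict, List

# Hypothetical interface: an annotator takes the condition and returns sample texts.
Annotator = Callable[[Dict], List[str]]


def collect_candidates(condition: Dict, annotators: List[Annotator]) -> List[str]:
    """Send the same condition to every annotator and merge their samples."""
    seen, merged = set(), []
    for annotate in annotators:
        for text in annotate(condition):
            normalized = text.strip()
            # Drop blank entries and exact duplicates, keeping first-seen order.
            if normalized and normalized not in seen:
                seen.add(normalized)
                merged.append(normalized)
    return merged


def annotator_a(condition):
    return ["Where is a bus stop near Beijing University?"]


def annotator_b(condition):
    return [
        "Where is a bus stop near Beijing University? ",
        "Find a bus station close to Beijing University",
    ]


candidates = collect_candidates({"domain": "navigation"}, [annotator_a, annotator_b])
print(candidates)
```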
S130, performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample.
The rationality check specifically refers to a check operation performed on whether the alternative annotation sample meets the requirement of the current requirement sample.
In this embodiment, the inventor considers that, although the data annotation party generates the alternative annotation samples according to the sample condition information, subjective factors mean that in practice some of the generated alternative annotation samples will inevitably fail to match the demand sample. Adding a rationality verification of the alternative annotation samples therefore helps ensure that the finally obtained target annotation samples are highly accurate, while saving the cost and time of manually checking the alternative annotation samples, which simplifies the data acquisition process and improves data acquisition efficiency.
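As a minimal sketch of such a rationality verification, the code below keeps only those candidates whose field values agree with the current semantic understanding protocol. How field values are obtained from a candidate is not specified at this point in the text, so the `extract_fields` callable and its keyword-based toy implementation are purely hypothetical stand-ins; the sketch also covers positive samples only.

```python
def rationality_check(candidates, current_protocol, extract_fields):
    """Keep candidates whose extracted field values all match the protocol."""
    targets = []
    for text in candidates:
        found = extract_fields(text)
        # Every field named in the protocol must be present with the same value.
        if all(found.get(key) == value for key, value in current_protocol.items()):
            targets.append(text)
    return targets


def toy_extract_fields(text):
    # Stand-in for a real semantic parser: a naive keyword lookup (assumption).
    lowered = text.lower()
    return {
        "domain": "catering" if "restaurant" in lowered else "other",
        "intent": "find a restaurant" if "find" in lowered else "unknown",
    }


protocol = {"domain": "catering", "intent": "find a restaurant"}
candidates = ["Find a Sichuan restaurant nearby", "How is XX dish made?"]
target_samples = rationality_check(candidates, protocol, toy_extract_fields)
print(target_samples)
```

For a negative sample type the comparison would presumably be inverted, since such samples are meant not to conform to the current protocol.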
S140, constructing structured annotation data according to the target annotation sample and the sample condition information.
The structured annotation data is data formed by taking the target annotation sample as the data source and extracting the valid data from that source according to the sample condition information.
In the embodiment of the present invention, the purpose of constructing the structured annotation data is to obtain the structural features of the target annotation sample; optionally, target annotation samples with these structural features may be used to train the model.
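A structured record could, for instance, pair the text of a target annotation sample with the fields taken from the sample condition information. The record layout below is an assumption made for illustration; the description does not prescribe a concrete schema.

```python
import json


def build_structured_record(target_text, condition):
    """Combine one verified sample with its condition information (assumed layout)."""
    current = condition["current_protocol"]
    return {
        "text": target_text,
        "label": condition["sample_type"],  # "positive" or "negative"
        "domain": current.get("domain"),
        "intent": current.get("intent"),
        "slots": current.get("slots", {}),
        "history": condition.get("history_protocol", {}),
    }


# Illustrative values based on the catering example; key names are assumptions.
condition = {
    "sample_type": "positive",
    "current_protocol": {
        "domain": "catering",
        "intent": "find a restaurant",
        "slots": {"dish": "Sichuan cuisine"},
    },
    "history_protocol": {"domain": "catering", "intent": "takeaway"},
}
record = build_structured_record("I want Sichuan cuisine, not Cantonese cuisine", condition)
print(json.dumps(record, ensure_ascii=False))
```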
In the embodiment of the invention, the sample condition information that is provided by the data demander and matches the demand sample is provided to at least one data annotation party, and rationality verification is performed, according to the sample condition information, on the alternative annotation sample generated by the data annotation party to obtain the target annotation sample; structured annotation data is then constructed according to the target annotation sample and the sample condition information. This solves problems of the prior art such as the complicated process and low efficiency of acquiring data for a multi-round interactive system, achieves the technical effect of efficiently acquiring the data required by a multi-round interactive system, simplifies the data acquisition process, and reduces labor cost.
Example two
Fig. 2 is a flowchart of a method for generating annotation data according to a second embodiment of the present invention. This embodiment is a refinement of the foregoing embodiment. In this embodiment, performing the rationality check on the alternative annotation sample according to the sample condition information to obtain the target annotation sample specifically includes: acquiring a current semantic understanding protocol of the demand sample in the sample condition information; in the alternative annotation sample, acquiring a field value to be verified corresponding to a first target field included in the current semantic understanding protocol; and if the field value to be verified is determined to match the field value corresponding to the first target field in the current semantic understanding protocol, determining the alternative annotation sample as the target annotation sample. Correspondingly, as shown in fig. 2, the method of the present embodiment may include:
S210, obtaining sample condition information matched with the demand sample and provided by the data demand party.
S220, providing the sample condition information to at least one data labeling party, and acquiring an alternative labeling sample generated by the data labeling party for the sample condition information.
S230, performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample.
Specifically, S230 may include:
S231, obtaining the current semantic understanding protocol of the demand sample in the sample condition information.
In the embodiment of the invention, when the rationality of the alternative annotation sample is checked according to the sample condition information, the rationality of the alternative annotation sample can be checked according to the relevant field information in the current semantic understanding protocol of the required sample in the sample condition information.
S232, obtaining a field value to be verified corresponding to a first target field included in the current semantic understanding protocol from the alternative labeling sample.
In one specific example, for the alternative annotation sample "where is the bus station near Beijing University", the current semantic understanding protocol obtained is: "domain = navigation; search for and locate a bus station near Beijing University; information slot = university, slot value = Beijing University";
Accordingly, for each first target field included in the current semantic understanding protocol, the field value of that first target field in the alternative annotation sample is acquired as the field value to be verified. For example, the field values to be verified that are acquired in the alternative annotation sample for the first target fields included in the current semantic understanding protocol are: "domain = navigation; search for and locate a bus station near Tianjin University; information slot = university, slot value = Tianjin University".
S233, determining whether the field value to be verified matches the field value corresponding to the first target field in the current semantic understanding protocol, if yes, executing S234, otherwise, executing S235.
In one specific example, for the alternative annotation sample "where is the bus station near Tianjin University", the field values to be verified that are acquired for the first target fields included in the current semantic understanding protocol are: "domain = navigation; search for and locate a bus station near Tianjin University; information slot = university, slot value = Tianjin University". If the first target fields and corresponding field values of the current semantic understanding protocol of the demand sample are: "domain = navigation; search for and locate a bus station near Tianjin University; information slot = university, slot value = Tianjin University", the field values to be verified match the field values corresponding to the first target fields in the current semantic understanding protocol. If the first target fields and corresponding field values of the current semantic understanding protocol of the demand sample are instead: "domain = navigation; search for and locate a bus station near Beijing University; information slot = university, slot value = Beijing University", the field values to be verified do not match the field values corresponding to the first target fields in the current semantic understanding protocol.
It should be noted that the field value to be verified and the field value corresponding to the first target field in the current semantic understanding protocol do not need to completely correspond one to one, and when the field value to be verified and the field value corresponding to the first target field in the current semantic understanding protocol are synonyms, it can also be considered that the field value to be verified is matched with the field value corresponding to the first target field in the current semantic understanding protocol.
For example, for the alternative annotation sample "nearby restaurants", the field values to be verified that are acquired for the first target fields included in the current semantic understanding protocol are: "domain = catering; semantic action = search for surrounding restaurants". If the first target fields and corresponding field values of the current semantic understanding protocol of the demand sample are: "domain = diet; semantic action = search for nearby restaurants", the field values to be verified are still considered to match the field values corresponding to the first target fields in the current semantic understanding protocol, because the differing values are synonyms.
S234, determining the alternative annotation sample as the target annotation sample.
In the embodiment of the invention, the alternative annotation sample which passes the rationality check is determined as the target annotation sample.
S235, deleting the alternative labeling sample.
Correspondingly, if the alternative annotation sample does not pass the rationality check, the alternative annotation sample is deleted. Or, all the alternative labeled samples which do not pass the rationality check can be summarized into a set, the set is fed back to the data labeling party, and the data labeling party checks and corrects the samples in the set. Since the data annotation party already complies with the sample condition information provided by the data demand party when generating the alternative annotation sample, the number of the alternative annotation samples which do not pass the rationality check is not excessive. Even if the set containing the alternative annotation samples which do not pass the rationality verification is fed back to the data annotation party for correction, too much workload is not added to the data annotation party.
S240, structured labeling data are constructed according to the target labeling sample and the sample condition information.
The embodiment of the invention performs the rationality check by verifying whether the field value to be verified, which is acquired from the alternative annotation sample for the first target field of the current semantic understanding protocol, matches the field value corresponding to that first target field in the current semantic understanding protocol of the demand sample, thereby ensuring the accuracy of the alternative annotation sample and reducing the labor cost.
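The check of S233 to S235 can be illustrated with a minimal sketch. The following Python code is only an illustration under stated assumptions: the dictionary keys, the synonym table, and the function names are hypothetical and are not defined by this embodiment, and the extraction of the field values to be verified from the alternative annotation sample (S232) is assumed to have already been performed.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical synonym groups; the embodiment does not specify how synonyms are determined.
SYNONYM_GROUPS: Tuple[FrozenSet[str], ...] = (
    frozenset({"catering", "diet"}),
    frozenset({"search for surrounding restaurants", "search for nearby restaurants"}),
)


def values_match(expected: str, actual: str) -> bool:
    """Return True when two field values are identical or listed as synonyms."""
    if expected == actual:
        return True
    return any(expected in group and actual in group for group in SYNONYM_GROUPS)


def passes_protocol_check(protocol: Dict[str, str], to_verify: Dict[str, str]) -> bool:
    """S233: every first target field of the protocol needs a matching value."""
    return all(
        field in to_verify and values_match(expected, to_verify[field])
        for field, expected in protocol.items()
    )


def split_candidates(
    protocol: Dict[str, str], candidates: List[Tuple[str, Dict[str, str]]]
) -> Tuple[List[str], List[str]]:
    """S234/S235: return (target samples, rejected samples kept for optional feedback)."""
    targets: List[str] = []
    rejected: List[str] = []
    for text, fields in candidates:
        (targets if passes_protocol_check(protocol, fields) else rejected).append(text)
    return targets, rejected


# Field names below ("domain", "semantic_action") are illustrative only.
demand_protocol = {"domain": "diet", "semantic_action": "search for nearby restaurants"}
candidates = [
    ("nearby restaurants",
     {"domain": "catering", "semantic_action": "search for surrounding restaurants"}),
    ("where is the bus station near Tianjin University",
     {"domain": "navigation", "semantic_action": "locate a bus station"}),
]
targets, rejected = split_candidates(demand_protocol, candidates)
```

The rejected list corresponds to the set that may either be deleted or fed back to the data annotating party for correction.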
Example three
Fig. 3 is a flowchart of a method for generating annotation data according to a third embodiment of the present invention. This embodiment is a refinement of the foregoing embodiments. In this embodiment, performing the rationality check on the alternative annotation sample according to the sample condition information to obtain the target annotation sample specifically includes: obtaining a grammar rule of the demand sample in the sample condition information; searching a third target field and a fourth target field corresponding to the grammar rule in the alternative labeling sample; and if the search result is determined to match the grammar rule, determining the alternative annotation sample as the target annotation sample. Accordingly, as shown in fig. 3, the method of the present embodiment may include:
S310, acquiring sample condition information matched with the demand sample and provided by the data demand party.
S320, providing the sample condition information to at least one data labeling party, and acquiring an alternative labeling sample generated by the data labeling party for the sample condition information.
S330, performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample.
S331, obtaining the grammar rule of the requirement sample in the sample condition information.
In the embodiment of the invention, when the rationality of the alternative labeled sample is checked according to the sample condition information, the rationality of the alternative labeled sample can be checked according to the grammar rule of the required sample in the sample condition information.
S332, searching a third target field and a fourth target field corresponding to the grammar rule in the alternative labeling sample.
Specifically, when the rationality of the alternative annotation sample is checked according to the grammar rule of the demand sample in the sample condition information, the third target field and the fourth target field corresponding to the grammar rule can be searched for in, and extracted from, the alternative annotation sample.
It should be noted that the third target field and the fourth target field are predefined fields, and each may be one field or multiple fields. The third target field may include one or more fields in the first target field, or may be defined separately from the fields included in the first target field. For example, the third target field may be restaurant, take-out, or dish, and the fourth target field may be beverage or cold drink.
S333, judging whether the search result is matched with the grammar rule, if so, executing S334, otherwise, executing S335.
In a specific example, the grammar rule corresponding to the demand sample is that the sample must include the field "restaurant", "take-out", or "dish", and must not include the field "beverage" or "cold drink". For the alternative annotation sample "which restaurants are nearby", the search finds the third target field "restaurant" corresponding to the grammar rule and finds neither "beverage" nor "cold drink", so the search result matches the grammar rule corresponding to the demand sample. For the alternative annotation sample "which cold drink take-outs are nearby", the search finds the third target field "take-out" corresponding to the grammar rule but also finds the field "cold drink", so the search result does not match the grammar rule corresponding to the demand sample.
It should be noted that the rationality check based on the grammar rule requires field values to correspond exactly, one to one: even if synonyms of the third target field or the fourth target field are found, the search result cannot be considered to match the grammar rule. The rationality check based on the grammar rule of the demand sample therefore imposes a stricter matching requirement on fields than the check based on the semantic understanding protocol.
S334, determining the alternative annotation sample as the target annotation sample.
S335, deleting the alternative labeling sample.
S340, constructing structured labeling data according to the target labeling sample and the sample condition information.
According to the embodiment of the invention, the third target field and the fourth target field corresponding to the grammatical rule of the required sample are searched in the alternative labeling sample, and whether the search result is matched with the grammatical rule of the required sample is checked to realize rationality check, so that the accuracy of the alternative labeling sample is ensured, and the labor cost is reduced.
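The grammar-rule check of S332 to S335 can be sketched as follows. This is only an illustration under stated assumptions: the `GrammarRule` structure, the function name, and the use of case-insensitive substring search are hypothetical choices, since the embodiment does not prescribe how the fields are searched for. The rule is read as "at least one third target field is present and no fourth target field is present", and no synonym matching is applied.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GrammarRule:
    must_contain: Tuple[str, ...]      # third target fields: at least one must appear
    must_not_contain: Tuple[str, ...]  # fourth target fields: none may appear


def passes_grammar_check(sample: str, rule: GrammarRule) -> bool:
    """S333: literal, synonym-free search for required and forbidden fields."""
    text = sample.lower()
    has_required = any(term.lower() in text for term in rule.must_contain)
    has_forbidden = any(term.lower() in text for term in rule.must_not_contain)
    return has_required and not has_forbidden


rule = GrammarRule(
    must_contain=("restaurant", "take-out", "dish"),
    must_not_contain=("beverage", "cold drink"),
)

kept = passes_grammar_check("which restaurants are nearby", rule)
dropped = passes_grammar_check("which cold drink take-outs are nearby", rule)
synonym_only = passes_grammar_check("which eateries are nearby", rule)
```

Because the search is literal, a sample that only uses a synonym of a required field (the third call above) is rejected, which reflects the stricter matching described for this embodiment.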
Example four
Fig. 4 is a flowchart of a method for generating annotation data according to a fourth embodiment of the present invention, which is embodied on the basis of the foregoing embodiment, and in this embodiment, structured annotation data is constructed according to the target annotation sample and the sample condition information, specifically: acquiring a current semantic understanding protocol of a demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample and a sample type of the demand sample in the sample condition information; and combining the target labeling sample, the current semantic understanding protocol of the demand sample, the historical semantic understanding protocol of the historical sample associated with the demand sample and the sample type of the demand sample to obtain the structured labeling data. Correspondingly, as shown in fig. 4, the method of this embodiment may include:
S410, obtaining sample condition information matched with the demand sample and provided by the data demand party.
S420, providing the sample condition information to at least one data labeling party, and acquiring an alternative labeling sample generated by the data labeling party for the sample condition information.
S430, performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample.
S440, acquiring a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample and a sample type of the demand sample in the sample condition information.
In the embodiment of the present invention, in order to train a statistical model using a target labeled sample that passes the rationality check, data in the target labeled sample needs to be extracted and combined to obtain structured data. Because the data finally generated by the data annotation party only comprises the target annotation sample, when the structured data is constructed according to the target annotation sample, the data extraction can be performed on the target annotation sample by taking the current semantic understanding protocol of the demand sample, the historical semantic understanding protocol of the historical sample associated with the demand sample and the sample type of the demand sample as standards.
S450, combining the target labeling sample, the current semantic understanding protocol of the demand sample, the historical semantic understanding protocol of the historical sample associated with the demand sample and the sample type of the demand sample to obtain the structured labeling data.
For example, for the target annotation sample "I love Sichuan cuisine and do not love Cantonese cuisine", the structured labeling data constructed from the target annotation sample, the current semantic understanding protocol of the corresponding demand sample, the historical semantic understanding protocol of the historical sample associated with the demand sample, and the sample type of the demand sample may be: "annotation sample: "I love Sichuan cuisine and do not love Cantonese cuisine";
current semantic understanding protocol: domain = diet; intent = find a restaurant; semantic action = the user intention for information slot = cuisine with corresponding slot value = Sichuan cuisine is Yes, and the user intention for information slot = cuisine with corresponding slot value = Cantonese cuisine is No; information slot = cuisine, corresponding slot value = Sichuan cuisine; information slot = cuisine, corresponding slot value = Cantonese cuisine;
historical semantic understanding protocol: domain = catering; intent = takeaway; semantic action = acquire surrounding restaurants providing takeaway services;
sample type: positive sample".
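The combination step of S450 can be illustrated with a minimal sketch. The key names and the `build_structured_record` function below are hypothetical; the embodiment only states which four pieces of information are combined, not how the result is laid out.

```python
import json
from typing import Any, Dict


def build_structured_record(target_sample: str, condition_info: Dict[str, Any]) -> Dict[str, Any]:
    """S450: combine the target sample with three items of the sample condition information.

    The grammar rule is deliberately left out, because S450 does not list it
    among the combined items.
    """
    return {
        "annotation_sample": target_sample,
        "current_protocol": condition_info["current_protocol"],
        "history_protocol": condition_info["history_protocol"],
        "sample_type": condition_info["sample_type"],
    }


# Illustrative condition information loosely following the example above.
condition_info = {
    "current_protocol": {
        "domain": "diet",
        "intent": "find a restaurant",
        "slots": [
            {"slot": "cuisine", "value": "Sichuan cuisine", "user_intention": "Yes"},
            {"slot": "cuisine", "value": "Cantonese cuisine", "user_intention": "No"},
        ],
    },
    "history_protocol": {
        "domain": "catering",
        "intent": "takeaway",
        "semantic_action": "acquire surrounding restaurants providing takeaway services",
    },
    "sample_type": "positive",
    "grammar_rule": {"must_contain": ["cuisine"], "must_not_contain": []},
}

record = build_structured_record(
    "I love Sichuan cuisine and do not love Cantonese cuisine", condition_info
)
serialized = json.dumps(record, ensure_ascii=False, indent=2)
```

Serializing each record as JSON is one possible way to hand the structured labeling data to a training step; other layouts would serve equally well.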
S460, inputting the structured labeling data into a preset model for training, to obtain a model for performing semantic action recognition on the interaction of the user side in the current conversation turn.
The preset model is a user-defined model to be trained with the target labeling samples. The model for performing semantic action recognition on the interaction of the user side in the current conversation turn is a statistical model applied in the multi-round interactive system; it can be used to recognize the semantic action of questions raised by the user side that are acquired while the multi-round interactive system is in use.
In the embodiment of the invention, after a sufficient number of target labeling samples are obtained, the structured labeling data formed from all the target labeling samples can be input into the preset model for training, to obtain the model for performing semantic action recognition on the interaction of the user side in the current conversation turn. Using the combined structured labeling data directly as training sample resources avoids a separate step of producing training samples, which improves the data utilization rate and the data processing efficiency.
According to the embodiment of the invention, the target labeling sample, the current semantic understanding protocol of the demand sample, the historical semantic understanding protocol of the historical sample associated with the demand sample and the sample type of the demand sample are combined to obtain the structured labeling data, and the structured labeling data is then input into the preset model for training to obtain the model for performing semantic action recognition on the interaction of the user side in the current conversation turn, so that the data utilization rate and the data processing efficiency can be effectively improved.
It should be noted that any permutation and combination between the technical features in the above embodiments also belong to the scope of the present invention.
Example five
Fig. 5 is a schematic diagram of a device for generating annotation data according to the fifth embodiment of the present invention, which is capable of executing a method for generating annotation data according to any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method.
The device comprises:
an information obtaining module 510, configured to obtain sample condition information that is provided by a data demander and matches a demand sample;
wherein the sample condition information includes: a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample, a sample type of the demand sample, and a grammar rule of the demand sample;
a sample obtaining module 520, configured to provide the sample condition information to at least one data annotating party, and obtain an alternative annotated sample generated by the data annotating party for the sample condition information;
a sample checking module 530, configured to perform rationality checking on the alternative annotation sample according to the sample condition information, to obtain a target annotation sample;
and a data constructing module 540, configured to construct structured annotation data according to the target annotation sample and the sample condition information.
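The cooperation of the modules listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the class name, the callable signatures, and the toy annotator, check, and builder are hypothetical and are not part of the described device. The sample condition information obtained by module 510 is passed in as an argument.

```python
from typing import Any, Callable, Dict, List

ConditionInfo = Dict[str, Any]
Annotator = Callable[[ConditionInfo], List[str]]
Check = Callable[[str, ConditionInfo], bool]
Builder = Callable[[str, ConditionInfo], Dict[str, Any]]


class AnnotationDataGenerator:
    """Hypothetical composition of modules 520, 530 and 540."""

    def __init__(self, annotators: List[Annotator], check: Check, build: Builder) -> None:
        self.annotators = annotators
        self.check = check
        self.build = build

    def generate(self, condition_info: ConditionInfo) -> List[Dict[str, Any]]:
        # Module 520: collect alternative samples from every data annotating party.
        candidates = [s for annotate in self.annotators for s in annotate(condition_info)]
        # Module 530: keep only the samples that pass the rationality check.
        targets = [s for s in candidates if self.check(s, condition_info)]
        # Module 540: build one structured record per target sample.
        return [self.build(s, condition_info) for s in targets]


def toy_annotator(_info: ConditionInfo) -> List[str]:
    return ["which restaurants are nearby", "which cold drink restaurants are nearby"]


def toy_check(sample: str, info: ConditionInfo) -> bool:
    rule = info["grammar_rule"]
    required = any(term in sample for term in rule["must_contain"])
    forbidden = any(term in sample for term in rule["must_not_contain"])
    return required and not forbidden


def toy_build(sample: str, info: ConditionInfo) -> Dict[str, Any]:
    return {"annotation_sample": sample, "sample_type": info["sample_type"]}


info = {
    "grammar_rule": {"must_contain": ["restaurant"], "must_not_contain": ["cold drink"]},
    "sample_type": "positive",
}
records = AnnotationDataGenerator([toy_annotator], toy_check, toy_build).generate(info)
```

Passing the check and the builder in as callables keeps the sketch independent of which rationality check (protocol-based or grammar-rule-based) is used.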
In the embodiment of the invention, the sample condition information that is provided by the data demander and matches the demand sample is provided to at least one data annotating party, and a rationality check is performed, according to the sample condition information, on the alternative annotation sample generated by the data annotating party to obtain the target annotation sample; structured labeling data is then constructed according to the target annotation sample and the sample condition information. This addresses the cumbersome flow and low efficiency of the prior art when obtaining data for a multi-round interactive system, achieves the technical effect of efficiently obtaining the data required by the multi-round interactive system, simplifies the data acquisition flow, and reduces the labor cost.
Optionally, the demand sample includes: the interaction of the user side in the current conversation turn; the historical sample associated with the demand sample includes: the interaction of the user side and/or the system side in at least one historical conversation turn associated with the current conversation turn.
Optionally, in the current semantic understanding protocol, a first target field associated with the requirement sample and a field value corresponding to the first target field are defined in a JSON format; defining a second target field associated with the history sample and a field value corresponding to the second target field in a JSON format in the history semantic understanding protocol; in the grammar rule, a third target field which must be contained in the requirement sample and a fourth target field which cannot be contained in the requirement sample are defined; the sample types of the demand sample include: a positive sample type that the requirement sample conforms to the context of the current semantic understanding protocol, or a negative sample type that the requirement sample does not conform to the context of the current semantic understanding protocol; wherein the first target field is the same as the second target field, the first target field or the second target field including at least one of: domain, intent, semantic action, and slot information.
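As an illustration of sample condition information whose protocols are defined in JSON format, the following sketch parses and validates one possible layout. The key names, the `parse_condition_info` function, and the validation rules are assumptions made for this example only; the embodiment specifies the four items and the kinds of target fields, but not a concrete schema.

```python
import json
from typing import Any, Dict

# Hypothetical key names for the four kinds of target fields named in the text.
ALLOWED_FIELDS = {"domain", "intent", "semantic_action", "slots"}
REQUIRED_KEYS = {"current_protocol", "history_protocol", "grammar_rule", "sample_type"}

CONDITION_INFO_JSON = """
{
  "current_protocol": {
    "domain": "catering",
    "intent": "find a restaurant",
    "slots": {"cuisine": "Sichuan cuisine"}
  },
  "history_protocol": {
    "domain": "catering",
    "intent": "takeaway",
    "semantic_action": "acquire surrounding restaurants providing takeaway services"
  },
  "grammar_rule": {
    "must_contain": ["restaurant", "take-out", "dish"],
    "must_not_contain": ["beverage", "cold drink"]
  },
  "sample_type": "positive"
}
"""


def parse_condition_info(raw: str) -> Dict[str, Any]:
    """Parse the JSON text and check it against the assumed layout."""
    info = json.loads(raw)
    missing = REQUIRED_KEYS - set(info)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for name in ("current_protocol", "history_protocol"):
        fields = set(info[name])
        # Each protocol must define at least one of the allowed target fields.
        if not fields or not fields <= ALLOWED_FIELDS:
            raise ValueError(f"unexpected fields in {name}: {sorted(fields)}")
    if info["sample_type"] not in ("positive", "negative"):
        raise ValueError("sample_type must be 'positive' or 'negative'")
    rule = info["grammar_rule"]
    if "must_contain" not in rule or "must_not_contain" not in rule:
        raise ValueError("grammar_rule needs must_contain and must_not_contain")
    return info


condition_info = parse_condition_info(CONDITION_INFO_JSON)
```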
Optionally, the sample checking module 530 is further configured to obtain, in the sample condition information, a current semantic understanding protocol of the demand sample; in the alternative annotation sample, acquiring a field value to be verified corresponding to a first target field included in the current semantic understanding protocol; and if the field value to be verified is determined to be matched with the field value corresponding to the first target field in the current semantic understanding protocol, determining the alternative annotation sample as the target annotation sample.
Optionally, the sample checking module 530 is further configured to obtain a syntax rule of the demand sample in the sample condition information; searching a third target field and a fourth target field corresponding to the grammar rule in the alternative labeling sample; and if the search result is determined to be matched with the grammar rule, determining the alternative annotation sample as the target annotation sample.
Optionally, the sample condition information further includes: an interaction example of the current conversation turn matched with the demand sample, and an interaction example of the historical conversation turn matched with the historical sample; wherein the interaction example of the current conversation turn conforms to the current semantic understanding protocol of the demand sample, and the interaction example of the historical conversation turn conforms to the historical semantic understanding protocol of the historical sample.
Optionally, the data constructing module 540 is further configured to obtain, in the sample condition information, a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample, and a sample type of the demand sample; and combining the target labeling sample, the current semantic understanding protocol of the demand sample, the historical semantic understanding protocol of the historical sample associated with the demand sample and the sample type of the demand sample to obtain the structured labeling data.
Optionally, the apparatus further includes a model obtaining module 550, configured to input the structured annotation data into a preset model for training, so as to obtain a model for performing semantic action recognition on the interaction of the user side in the current conversation turn.
The generating device of the annotation data can execute the generating method of the annotation data provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executing method. For details of the technology that are not described in detail in this embodiment, reference may be made to a method for generating annotation data provided in any embodiment of the present invention.
Since the above-described apparatus for generating annotation data is an apparatus capable of executing the method for generating annotation data in the embodiment of the present invention, based on the method for generating annotation data described in the embodiment of the present invention, a person skilled in the art can understand the specific implementation of the apparatus for generating annotation data in the embodiment and various variations thereof, and therefore, how to implement the method for generating annotation data in the embodiment of the present invention by the apparatus for generating annotation data is not described in detail here. The device used by those skilled in the art to implement the method for generating the annotation data in the embodiment of the present invention is within the scope of the protection of the present application.
Example six
Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention. FIG. 6 illustrates a block diagram of a computer device 612 suitable for use in implementing embodiments of the present invention. The computer device 612 shown in fig. 6 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.
As shown in fig. 6, the computer device 612 is in the form of a general purpose computing device. Components of computer device 612 may include, but are not limited to: one or more processors 616, a storage device 628, and a bus 618 that couples the various system components including the storage device 628 and the processors 616.
Bus 618 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computer device 612 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 612 and includes both volatile and nonvolatile media, removable and non-removable media.
Storage device 628 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 630 and/or cache memory 632. The computer device 612 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 634 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In such cases, each drive may be connected to bus 618 by one or more data media interfaces. Storage device 628 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program 636 having a set (at least one) of program modules 626 may be stored, for example, in storage device 628, such program modules 626 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may include an implementation of a network environment. Program modules 626 generally perform the functions and/or methodologies of embodiments of the invention as described herein.
Computer device 612 may also communicate with one or more external devices 614 (e.g., keyboard, pointing device, camera, display 624, etc.), with one or more devices that enable a user to interact with computer device 612, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 612 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 622. Further, computer device 612 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via network adapter 620. As shown, the network adapter 620 communicates with the other modules of the computer device 612 via the bus 618. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the computer device 612, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems, to name a few.
The processor 616 executes various functional applications and data processing by executing programs stored in the storage device 628, for example, implementing the generation method of the annotation data provided by the above-described embodiment of the present invention.
That is, the processor implements, when executing the program: acquiring sample condition information which is provided by a data demander and matched with a demand sample; wherein the sample condition information includes: a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample, a sample type of the demand sample, and a grammar rule of the demand sample; providing the sample condition information to at least one data labeling party, and acquiring an alternative annotation sample generated by the data labeling party for the sample condition information; performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample; and constructing structured labeling data according to the target annotation sample and the sample condition information.
By means of the computer device, the sample condition information that is provided by a data demander and matched with a demand sample is provided to at least one data labeling party, and the alternative annotation sample generated by the data labeling party is verified for rationality according to the sample condition information to obtain a target annotation sample; structured labeling data is then constructed according to the target annotation sample and the sample condition information. This solves problems of the prior art such as a cumbersome process and low efficiency when acquiring data for a multi-round interactive system, achieves the technical effect of efficiently acquiring the data required by a multi-round interactive system, simplifies the data acquisition process, and reduces labor cost.
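As a rough illustration of the flow described in this embodiment, the four steps can be wired together as shown below. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `SampleConditionInfo` and `generate_annotation_data`, the `annotators`, `is_reasonable` and `build_record` hooks, and the dictionary layout of the protocols and grammar rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SampleConditionInfo:
    """Hypothetical container for the sample condition information."""
    current_protocol: Dict[str, str]    # semantic understanding protocol of the demand sample
    history_protocol: Dict[str, str]    # protocol of the associated historical sample
    sample_type: str                    # assumed values: "positive" or "negative"
    grammar_rule: Dict[str, List[str]]  # assumed keys: "must_contain", "must_not_contain"


def generate_annotation_data(
    info: SampleConditionInfo,
    annotators: List[Callable[[SampleConditionInfo], List[dict]]],
    is_reasonable: Callable[[dict, SampleConditionInfo], bool],
    build_record: Callable[[dict, SampleConditionInfo], dict],
) -> List[dict]:
    """Distribute the condition information, verify candidates, build records."""
    records = []
    for annotate in annotators:
        # Each data labeling party returns alternative annotation samples.
        for candidate in annotate(info):
            # Keep only candidates that pass the rationality verification.
            if is_reasonable(candidate, info):
                # Combine the target sample with the condition information.
                records.append(build_record(candidate, info))
    return records


# Toy usage with a stand-in annotator, checker, and record builder.
info = SampleConditionInfo(
    current_protocol={"domain": "music", "intent": "play"},
    history_protocol={"domain": "music", "intent": "search"},
    sample_type="positive",
    grammar_rule={"must_contain": ["song"], "must_not_contain": []},
)
annotator = lambda i: [
    {"text": "play that song", "domain": "music"},
    {"text": "what is the weather", "domain": "weather"},
]
same_domain = lambda c, i: c["domain"] == i.current_protocol["domain"]
to_record = lambda c, i: {"sample": c["text"], "sample_type": i.sample_type}

result = generate_annotation_data(info, [annotator], same_domain, to_record)
print(result)
```

Passing the checker and the record builder in as callables keeps the orchestration independent of how verification and data construction are actually carried out.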
EXAMPLE seven
An embodiment of the present invention further provides a computer storage medium storing a computer program which, when executed by a computer processor, performs the method for generating annotation data according to any one of the above embodiments of the present invention:
acquiring sample condition information which is provided by a data demander and matched with a demand sample;
wherein the sample condition information includes: a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample, a sample type of the demand sample, and a grammar rule of the demand sample;
providing the sample condition information to at least one data labeling party, and acquiring an alternative labeling sample generated by the data labeling party for the sample condition information;
performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample;
and constructing structured labeling data according to the target labeling sample and the sample condition information.
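The rationality verification step listed above can be illustrated with two small checks: one comparing a candidate's field values against the current semantic understanding protocol, and one applying a grammar rule made of fields that must and must not appear. The sketch below is only an assumption about how such checks might look; the function names, the `labels` and `fields` keys of a candidate, and the example values are hypothetical and do not come from the source.

```python
from typing import Dict, List


def matches_protocol(labels: Dict[str, str], protocol: Dict[str, str]) -> bool:
    """Every field defined in the protocol must carry the same value
    in the candidate's labels."""
    return all(labels.get(name) == value for name, value in protocol.items())


def matches_grammar(
    present_fields: List[str],
    must_contain: List[str],
    must_not_contain: List[str],
) -> bool:
    """All required fields are present and no forbidden field is present."""
    present = set(present_fields)
    has_required = all(name in present for name in must_contain)
    has_forbidden = any(name in present for name in must_not_contain)
    return has_required and not has_forbidden


# Assumed protocol and grammar rule supplied by the data demander.
protocol = {"domain": "music", "intent": "play"}
must_contain, must_not_contain = ["song"], ["artist"]

# Two hypothetical alternative annotation samples.
good = {"labels": {"domain": "music", "intent": "play"}, "fields": ["song"]}
bad = {"labels": {"domain": "music", "intent": "pause"}, "fields": ["song", "artist"]}

for name, cand in (("good", good), ("bad", bad)):
    print(
        name,
        matches_protocol(cand["labels"], protocol),
        matches_grammar(cand["fields"], must_contain, must_not_contain),
    )
```

The two checks are kept as separate functions so that either one can be applied on its own or both can be combined, depending on what the data demander requires.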
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for generating annotation data, comprising:
acquiring sample condition information which is provided by a data demander and matched with a demand sample; the data demander refers to a user for generating or optimizing a multi-round interactive system;
wherein the sample condition information includes: a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample, a sample type of the demand sample, and a grammar rule of the demand sample;
providing the sample condition information to at least one data labeling party, and acquiring an alternative labeling sample generated by the data labeling party for the sample condition information;
performing rationality verification on the alternative annotation sample according to the sample condition information to obtain a target annotation sample; the rationality check refers to a check operation performed on whether the alternative annotation sample meets the requirements of the current demand sample;
and constructing structured labeling data according to the target labeling sample and the sample condition information.
2. The method of claim 1, wherein the demand sample comprises: the interactive mode of the user side under the current conversation turn;
the historical sample associated with the demand sample comprises: the interactive mode of the user side and/or the system side under at least one historical conversation turn associated with the current conversation turn.
3. The method of claim 2, wherein:
in the current semantic understanding protocol, a first target field associated with the demand sample and a field value corresponding to the first target field are defined in JSON format;
in the historical semantic understanding protocol, a second target field associated with the historical sample and a field value corresponding to the second target field are defined in JSON format;
in the grammar rule, a third target field which must be contained in the demand sample and a fourth target field which cannot be contained in the demand sample are defined;
the sample type of the demand sample includes: a positive sample type, in which the demand sample conforms to the context of the current semantic understanding protocol, or a negative sample type, in which the demand sample does not conform to the context of the current semantic understanding protocol;
wherein the first target field is the same as the second target field, the first target field or the second target field including at least one of: domain, intent, semantic action, and slot information.
4. The method according to claim 3, wherein performing rationality check on the alternative labeled sample according to the sample condition information to obtain a target labeled sample comprises:
acquiring a current semantic understanding protocol of the demand sample in the sample condition information;
in the alternative annotation sample, acquiring a field value to be verified corresponding to a first target field included in the current semantic understanding protocol;
and if the field value to be verified is determined to be matched with the field value corresponding to the first target field in the current semantic understanding protocol, determining the alternative annotation sample as the target annotation sample.
5. The method according to claim 3, wherein performing rationality check on the alternative labeled sample according to the sample condition information to obtain a target labeled sample comprises:
obtaining a grammar rule of the demand sample in the sample condition information;
searching a third target field and a fourth target field corresponding to the grammar rule in the alternative labeling sample;
and if the search result is determined to be matched with the grammar rule, determining the alternative annotation sample as the target annotation sample.
6. The method of any of claims 1-5, wherein the sample condition information further comprises:
interactive examples of the current conversation turn matching the demand sample, and interactive examples of the historical conversation turn matching the historical sample;
wherein the interactive example of the current conversation turn conforms to the current semantic understanding protocol of the demand sample; and the interactive example of the historical conversation turn conforms to the historical semantic understanding protocol of the historical sample.
7. The method of claim 1, wherein constructing structured annotation data based on the target annotated sample and the sample condition information comprises:
acquiring a current semantic understanding protocol of a demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample and a sample type of the demand sample in the sample condition information;
combining the target labeling sample, the current semantic understanding protocol of the demand sample, the historical semantic understanding protocol of the historical sample associated with the demand sample and the sample type of the demand sample to obtain the structured labeling data;
the method further comprises: inputting the structured labeling data into a preset model for training to obtain a model for performing semantic action recognition on the interactive mode of the user side under the current conversation turn.
8. An apparatus for generating annotation data, comprising:
the information acquisition module is used for acquiring sample condition information which is provided by a data demander and matched with a demand sample; the data demander refers to a user for generating or optimizing a multi-round interactive system;
wherein the sample condition information includes: a current semantic understanding protocol of the demand sample, a historical semantic understanding protocol of a historical sample associated with the demand sample, a sample type of the demand sample, and a grammar rule of the demand sample;
the sample obtaining module is used for providing the sample condition information to at least one data labeling party and obtaining an alternative labeling sample generated by the data labeling party for the sample condition information;
the sample checking module is used for checking the rationality of the alternative marked sample according to the sample condition information to obtain a target marked sample; the rationality check refers to a check operation performed on whether the alternative annotation sample meets the requirements of the current demand sample;
and the data construction module is used for constructing structured labeling data according to the target labeling sample and the sample condition information.
9. A computer device, the device comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of generating annotation data according to any one of claims 1 to 7.
10. A computer storage medium on which a computer program is stored, the program, when being executed by a processor, implementing the method of generating annotation data according to any one of claims 1 to 7.
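The combination step recited in claim 7 can be pictured as assembling one record per target annotation sample and serializing it for later model training. The following is a minimal sketch under assumptions: the function name `build_structured_annotation`, the record keys, the example values, and the choice of JSON as the storage format for the combined record are all hypothetical.

```python
import json
from typing import Dict


def build_structured_annotation(
    target_sample: str,
    current_protocol: Dict[str, str],
    history_protocol: Dict[str, str],
    sample_type: str,
) -> dict:
    """Combine the target sample with the protocols and the sample type."""
    return {
        "sample": target_sample,
        "current_protocol": current_protocol,
        "history_protocol": history_protocol,
        "sample_type": sample_type,
    }


record = build_structured_annotation(
    target_sample="play the next one",
    current_protocol={"domain": "music", "intent": "play"},
    history_protocol={"domain": "music", "intent": "search"},
    sample_type="positive",
)

# One JSON object per line is a common layout for training data files.
line = json.dumps(record, ensure_ascii=False, sort_keys=True)
print(line)
```

Keeping the historical protocol next to the sample preserves the dialogue context that a multi-round model would need during training.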
CN201810580489.2A 2018-06-07 2018-06-07 Method, device and equipment for generating labeled data and storage medium Active CN108959412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810580489.2A CN108959412B (en) 2018-06-07 2018-06-07 Method, device and equipment for generating labeled data and storage medium

Publications (2)

Publication Number Publication Date
CN108959412A CN108959412A (en) 2018-12-07
CN108959412B CN108959412B (en) 2021-09-14

Family

ID=64493637

Country Status (1)

Country Link
CN (1) CN108959412B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant