CN110674638B - Corpus labeling system and electronic equipment - Google Patents

Publication number
CN110674638B
Authority
CN
China
Prior art keywords
labeling
corpus data
corpus
result
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910902201.3A
Other languages
Chinese (zh)
Other versions
CN110674638A (en
Inventor
于博文
郭慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Shanghai Xiaodu Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Shanghai Xiaodu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd, Shanghai Xiaodu Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910902201.3A priority Critical patent/CN110674638B/en
Publication of CN110674638A publication Critical patent/CN110674638A/en
Application granted granted Critical
Publication of CN110674638B publication Critical patent/CN110674638B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Abstract

The application discloses a corpus labeling system and electronic equipment, and relates to the field of artificial intelligence. The system specifically comprises: an auxiliary labeling component configured to display an auxiliary labeling interface in response to a labeling request for corpus data to be labeled, wherein the auxiliary labeling interface displays at least prompt information, and the prompt information characterizes information related to the labeling results of corpus data associated with the corpus data to be labeled; a labeling component configured to display, in response to an input operation indicating a labeling result, the target labeling result at the position corresponding to the corpus data to be labeled; and a quality detection component configured to start a detection mechanism in response to a save operation on the target labeling result, acquire feedback information for the target labeling result, and determine, based on the feedback information, whether the target labeling result meets the accuracy requirement. The application thereby provides a corpus labeling platform at the system architecture level and aims to improve corpus labeling quality.

Description

Corpus labeling system and electronic equipment
Technical Field
The application relates to the field of data processing, in particular to the field of artificial intelligence.
Background
In recent years, technology in the field of artificial intelligence has developed rapidly and has gradually entered people's daily lives. A basic requirement of artificial intelligence is that a machine can receive and process information as a person does. Because language is the most important carrier of information, language processing has become a leading research direction in the field. Training a language model requires a large amount of labeled corpus data to improve model quality. However, current research on corpus labeling stays at the level of labeling methods assisted by natural language understanding and has not yet been raised to the level of system architecture.
Disclosure of Invention
The embodiment of the application provides a corpus labeling system and electronic equipment, which are used for providing a corpus labeling platform from a system architecture level and aiming at improving corpus labeling quality.
The embodiment of the application provides a corpus labeling system, which at least comprises the following components:
an auxiliary labeling component, configured to display an auxiliary labeling interface in response to a labeling request for corpus data to be labeled, wherein the auxiliary labeling interface displays at least prompt information, and the prompt information characterizes information related to the labeling results of corpus data associated with the corpus data to be labeled;
a labeling component, configured to display, in response to an input operation indicating a labeling result, the target labeling result at the position corresponding to the corpus data to be labeled;
a quality detection component, configured to start a detection mechanism in response to a save operation on the target labeling result, acquire feedback information for the target labeling result, and determine, based on the feedback information, whether the target labeling result meets the accuracy requirement.
The embodiment of the application provides a corpus labeling platform, namely a corpus labeling system. Before labeling personnel label an item, the auxiliary labeling component of the system presents prompt information, which displays corpus data associated with the corpus data to be labeled together with the labeling results of that associated corpus data. Labeling personnel can refer to this content, which lays a foundation for improving labeling quality. In addition, the system includes a quality detection component. After the target labeling result for the corpus data to be labeled is saved, a detection mechanism is started and feedback information for the target labeling result is acquired, so that the quality detection component can determine whether the target labeling result meets the requirement, which further lays a foundation for improving labeling quality.
In one embodiment, the labeling component is further configured to display a corpus data list, where the corpus data list includes the corpus data to be labeled, that is, corpus data whose existing labeling result does not meet the accuracy requirement.
To avoid unnecessary labeling work and wasted human labeling resources, only corpus data that has been labeled by an automatic labeling method but whose labeling result does not meet the accuracy requirement is labeled manually. In other words, the corpus data to be labeled in this embodiment is corpus data that was labeled automatically but failed the accuracy requirement. This avoids wasting human labeling resources while still improving labeling quality through manual labeling, meets engineering requirements, and lays a foundation for subsequent engineering online service.
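The selection rule above can be sketched as a simple filter. This is a minimal illustration, not the patented implementation: the `AutoLabeledItem` type and its `meets_accuracy` flag are hypothetical, and the sketch assumes that some upstream check has already recorded whether each automatic label meets the accuracy requirement.

```python
from dataclasses import dataclass


@dataclass
class AutoLabeledItem:
    text: str             # the corpus data (e.g. a user utterance)
    auto_label: str       # label produced by the automatic labeling method
    meets_accuracy: bool  # hypothetical flag set by an upstream accuracy check


def build_labeling_list(items):
    """Keep only items whose automatic label failed the accuracy requirement."""
    return [item for item in items if not item.meets_accuracy]


items = [
    AutoLabeledItem("play some jazz", "music.play", True),
    AutoLabeledItem("turn it up a bit", "weather.query", False),
]
to_label = build_labeling_list(items)
```

Only the second item reaches the manual labeling list, so annotators spend no time on data the automatic method already handled acceptably.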
In one embodiment, the auxiliary labeling component is further configured to:
obtain the historical labeling results of the corpus data associated with the corpus data to be labeled, and use the associated corpus data and its historical labeling results as the prompt information.
In this embodiment, the corpus data associated with the corpus data to be annotated and the historical annotation result of the corpus data associated with the corpus data to be annotated are displayed as prompt information, so that the annotators can refer to the historical annotation result of the associated corpus data to assist in completing the annotation operation, and a foundation is laid for improving the annotation quality.
In one embodiment, the auxiliary labeling component is further configured to:
acquiring a preset corpus data set that matches the feature information of the corpus data to be annotated, wherein the feature information of the preset corpus data items in the preset corpus data set matches one another, and the annotation results of the preset corpus data items are the same;
and taking the preset corpus data set and the labeling result as prompt information.
In this embodiment, the preset corpus data set with the matched feature information and the labeling result of the preset corpus data set are displayed as the prompt information, so that labeling personnel can refer to the preset corpus data set with the matched feature information and the labeling result of the preset corpus data set, and therefore the labeling operation is assisted to be completed, and a foundation is laid for improving the labeling quality. In addition, in the embodiment, the prompt content of the prompt information is added from different dimensions, the reference opinion with multiple dimensions is given, and a foundation is further laid for improving the labeling quality.
In one embodiment, the auxiliary labeling component is further configured to:
selecting reference corpus data associated with the corpus data to be annotated and an annotation result of the reference corpus data associated with the corpus data to be annotated from an annotation template, wherein the annotation template is characterized by the corresponding relation between the reference corpus data and the annotation result;
and taking the reference corpus data selected from the labeling template and the labeling result as prompt information.
In this embodiment, a labeling template is set, and the labeling operation is performed by using the reference corpus data in the labeling template and the labeling result of the reference corpus data, that is, the reference corpus data and the labeling result thereof associated with the corpus data to be labeled are selected from the labeling template, so that the reference corpus data associated with the corpus data to be labeled and the labeling result of the reference corpus data associated with the corpus data to be labeled are displayed in the prompting information, and therefore, labeling personnel are assisted to complete the labeling operation, and a foundation is laid for improving the labeling quality. In addition, in the embodiment, the prompt content of the prompt information is added from different dimensions, the reference opinion with multiple dimensions is given, and a foundation is further laid for improving the labeling quality.
In one embodiment, the auxiliary labeling component is further configured to:
acquiring a labeling result of the associated corpus data of the corpus data to be labeled;
determining semantic features of the corpus data to be annotated based on the annotation result of the corpus data associated with the corpus data to be annotated;
and taking semantic features of the corpus data to be annotated as prompt information.
In this embodiment, the content of the prompt information is increased from the dimension of the semantic features, that is, the labeling result of the corpus data associated with the corpus data to be labeled is obtained first, then the semantic features of the corpus data to be labeled are obtained through analysis by using the obtained labeling result, that is, the true intention of the corpus data to be labeled is obtained, and further, the semantic features of the corpus data to be labeled are used as the prompt information to assist labeling personnel to complete the labeling process, so that a foundation is laid for improving the labeling quality. In addition, in the embodiment, the prompt content of the prompt information is added from different dimensions, the reference opinion with multiple dimensions is given, and a foundation is further laid for improving the labeling quality.
In one embodiment, the quality detection component is further configured to:
detecting whether a preset labeling result exists for the corpus data to be labeled;
after determining that the preset labeling result exists, matching the target labeling result with the preset labeling result;
and taking the matching result as feedback information of the target labeling result to determine whether the target labeling result meets the accuracy requirement.
In this embodiment, the quality detection component matches the target labeling result against the preset labeling result to determine whether the target labeling result meets the requirement, which makes it possible to monitor the labeling results of labeling personnel and lays a foundation for improving labeling quality. The approach is simple and feasible, and lays a foundation for subsequent engineering online service. Specifically, to give feedback on labeling results in real time while labeling personnel are working, corpus data that already has a preset labeling result, where the preset labeling result meets the accuracy requirement, can be mixed into the data being labeled. After the save operation on a target labeling result is detected, the quality detection component first determines whether a corresponding preset labeling result exists; if so, it matches the target labeling result with the preset labeling result and determines from the matching result whether the target labeling result meets the requirement. This lays a foundation for real-time monitoring during labeling and, from the monitoring perspective, for improving labeling quality.
In one embodiment, the quality detection component is further configured to:
acquiring satisfaction information for the target labeling result of the corpus data to be labeled;
and taking the satisfaction information for the target labeling result of the corpus data to be labeled as feedback information to determine whether the target labeling result meets the accuracy requirement.
Here, the embodiment detects whether the target labeling result meets the requirement or not by using the online satisfaction information, namely, the satisfaction information aiming at the target labeling result, and further lays a foundation for improving the labeling quality from the monitoring point of view.
In one embodiment, the quality detection component is further configured to:
determining, based on the feature information of the corpus data to be annotated, the corpus set to which the corpus data to be annotated belongs, wherein the feature information of the corpus data contained in the corpus set matches the feature information of the corpus data to be annotated, and the annotation result of the corpus data contained in the corpus set matches the target annotation result of the corpus data to be annotated;
acquiring satisfaction information for the corpus set to which the corpus data to be annotated belongs;
and taking the satisfaction information for the corpus set to which the corpus data to be annotated belongs as feedback information.
Here, the embodiment uses online satisfaction information, namely the satisfaction information for the corpus set, to detect whether the target labeling result meets the requirement, which further lays a foundation for improving labeling quality from the monitoring perspective. Because the feature information of the corpus data to be labeled matches the feature information of the corpus data in the corpus set, and the target labeling result of the corpus data to be labeled matches the labeling result of the corpus data in the corpus set, the satisfaction information for the corpus set can serve as feedback information for the target labeling result of the corpus data to be labeled, and monitoring is achieved.
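The set-level feedback described above can be sketched as follows. The sketch rests on several assumptions that the text does not specify: feature matching is reduced to equality on a hypothetical `feature` key, satisfaction information is modeled as a list of per-request booleans, and the accuracy requirement is a hypothetical minimum satisfaction ratio.

```python
def corpus_set_feedback(feature, target_label, corpus_sets, required_ratio=0.7):
    """Return (satisfaction ratio, meets requirement) for the matching corpus set.

    Returns None when no corpus set matches both the feature information and
    the target label, or when the matching set has no satisfaction records.
    """
    for corpus_set in corpus_sets:
        same_feature = corpus_set["feature"] == feature      # assumed matching rule
        same_label = corpus_set["label"] == target_label
        if same_feature and same_label and corpus_set["satisfied"]:
            flags = corpus_set["satisfied"]
            ratio = sum(flags) / len(flags)
            return ratio, ratio >= required_ratio
    return None


corpus_sets = [
    {"feature": "music", "label": "music.play", "satisfied": [True, True, True, False]},
    {"feature": "weather", "label": "weather.query", "satisfied": [True, False]},
]
music_feedback = corpus_set_feedback("music", "music.play", corpus_sets)
weather_feedback = corpus_set_feedback("weather", "weather.query", corpus_sets)
```

A newly labeled item usually has no online satisfaction data of its own, which is why borrowing the satisfaction of a matching set is useful as a proxy.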
In one embodiment, the system further comprises an alarm component configured to output early warning information after it is determined that the target labeling result does not meet the accuracy requirement.
Here, in this embodiment, an alarm component is set in the corpus labeling system, so that after detecting that the target labeling result does not meet the accuracy requirement, early warning information is output, so that labeling personnel can be warned by using the early warning information, and a foundation is further laid for improving the labeling quality.
In a second aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the functions of the corpus labeling system.
One embodiment of the above application has the following advantages or benefits:
The embodiment of the application provides a corpus labeling platform, namely a corpus labeling system. Before labeling personnel label an item, the auxiliary labeling component of the system presents prompt information, which displays corpus data associated with the corpus data to be labeled together with the labeling results of that associated corpus data. Labeling personnel can refer to this content, which lays a foundation for improving labeling quality. In addition, the system includes a quality detection component. After the target labeling result for the corpus data to be labeled is saved, a detection mechanism is started and feedback information for the target labeling result is acquired, so that the quality detection component can determine whether the target labeling result meets the requirement, which further lays a foundation for improving labeling quality.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
FIG. 1 is a schematic diagram of a corpus labeling system according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a corpus labeling system according to a second embodiment of the application;
FIG. 3 is a schematic view of a scenario of a specific application according to an embodiment of the present application;
FIG. 4 is a block diagram of an electronic device for implementing a corpus labeling system of an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Current smart speaker products (with or without a screen) receive tens of millions or even hundreds of millions of spoken requests from users every day. To understand and satisfy these requests, correct labels must be established for the language expressions in users' online requests, and dictionaries and language models are repaired accordingly, which improves user satisfaction and product experience. Corpus labeling is clearly the most basic step in this process, and corpus labeling quality plays a vital role in subsequent dictionary repair and/or language model processing. On this basis, the embodiment of the application provides a corpus labeling system that aims to monitor and manage the labeling quality of labeling personnel and to improve corpus labeling accuracy, laying a foundation for outputting reliable corpus labeling results to a model or dictionary.
Specifically, as shown in FIG. 1, the system at least includes:
the auxiliary labeling component 101, configured to display an auxiliary labeling interface in response to a labeling request for corpus data to be labeled, where the auxiliary labeling interface displays at least prompt information, and the prompt information characterizes information related to the labeling results of corpus data associated with the corpus data to be labeled;
the labeling component 102, configured to display, in response to an input operation indicating a labeling result, the target labeling result at the position corresponding to the corpus data to be labeled;
the quality detection component 103, configured to start a detection mechanism in response to a save operation on the target labeling result, obtain feedback information for the target labeling result, and determine, based on the feedback information, whether the target labeling result meets the accuracy requirement.
It should be noted that, the components of the embodiments of the present application may be specifically integrated into one device, or may be integrated into different devices, which is not limited in the embodiments, so long as the technical solution of the embodiments of the present application can be implemented.
In a specific example, as shown in FIG. 2, the system further includes an alarm component 104 configured to output early warning information after it is determined that the target labeling result does not meet the accuracy requirement. That is, in this example an alarm component is provided in the corpus labeling system so that early warning information is output once the target labeling result is found not to meet the accuracy requirement. Labeling personnel can be warned by this information, which further lays a foundation for improving labeling quality.
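How the components 101 to 104 might cooperate can be sketched as four small classes and one driver function. This is an illustrative skeleton only: the class and method names, the dictionary of prompts, and the pluggable `meets_accuracy` callable are all assumptions, and the real components would drive a user interface rather than return values.

```python
class AuxiliaryLabelingComponent:
    """Shows prompt information when a labeling request arrives (component 101)."""
    def __init__(self, prompts):
        self.prompts = prompts  # hypothetical: text -> list of (associated text, label)

    def on_labeling_request(self, text):
        return self.prompts.get(text, [])


class LabelingComponent:
    """Displays the label an annotator enters next to the corpus item (component 102)."""
    def __init__(self):
        self.displayed = {}

    def on_input(self, text, label):
        self.displayed[text] = label
        return label


class QualityDetectionComponent:
    """Runs a check when a label is saved (component 103); the check is pluggable."""
    def __init__(self, meets_accuracy):
        self.meets_accuracy = meets_accuracy  # hypothetical callable (text, label) -> bool

    def on_save(self, text, label):
        return self.meets_accuracy(text, label)


class AlarmComponent:
    """Collects early-warning messages for labels that fail the check (component 104)."""
    def __init__(self):
        self.warnings = []

    def warn(self, text, label):
        self.warnings.append(f"label {label!r} for {text!r} failed the accuracy check")


def label_and_check(aux, labeler, quality, alarm, text, label):
    prompts = aux.on_labeling_request(text)  # 1. show prompt information
    shown = labeler.on_input(text, label)    # 2. display the entered label
    ok = quality.on_save(text, shown)        # 3. check the label on save
    if not ok:
        alarm.warn(text, shown)              # 4. output early warning
    return prompts, ok
```

Keeping the accuracy check behind a callable mirrors the text's point that the quality detection step can be realized in more than one way.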
In a specific example, to avoid unnecessary labeling work and wasted human labeling resources, only corpus data that has been labeled by an automatic labeling method but whose labeling result does not meet the accuracy requirement is labeled manually. In other words, the corpus data to be labeled in this example is corpus data that was labeled automatically but failed the accuracy requirement. This avoids wasting human labeling resources while still improving labeling quality through manual labeling, meets engineering requirements, and lays a foundation for subsequent engineering online service. It should be noted that the automatic labeling method may be any existing corpus labeling method, which is not limited in this embodiment of the present application. The labeling component is further used for displaying a corpus data list, where the corpus data list includes the corpus data to be labeled, that is, corpus data whose labeling result does not meet the accuracy requirement.
In the embodiment of the application, the content of the prompt information can be enriched in the following ways. Specifically:
Mode one: the auxiliary labeling component is further configured to: obtain the historical labeling results of the corpus data associated with the corpus data to be labeled, and use the associated corpus data and its historical labeling results as the prompt information.
In other words, in this mode, the corpus data associated with the corpus data to be annotated and the historical annotation results of that associated corpus data are displayed as prompt information, so that annotators can refer to the historical annotation results of the associated corpus data when completing the annotation operation, which lays a foundation for improving annotation quality.
In practical application, the history labeling result is a result subjected to satisfaction degree test, namely, the history labeling result is determined to meet accuracy requirements after satisfaction degree information detection, so that information provided for labeling personnel is accurate enough, and a foundation is laid for improving labeling quality.
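Mode one can be sketched as a filter over the labeling history. The text does not define how corpus data is "associated", so the sketch assumes a hypothetical rule (`shares_token`: the two utterances share at least one whitespace-separated token) and assumes each history record carries a flag saying whether it passed the satisfaction check.

```python
def shares_token(a, b):
    """Hypothetical association rule: the two texts share at least one token."""
    return bool(set(a.split()) & set(b.split()))


def history_prompts(pending_text, history, is_associated=shares_token):
    """Return (text, label) pairs to show as prompt information.

    `history` holds (text, label, passed_satisfaction_check) records; only
    records that passed the check and are associated with the pending text
    are kept.
    """
    return [
        (text, label)
        for text, label, passed in history
        if passed and is_associated(pending_text, text)
    ]


history = [
    ("play some jazz", "music.play", True),
    ("play the news", "news.play", False),      # failed the satisfaction check
    ("turn off the light", "device.off", True),  # not associated
]
prompts = history_prompts("play some rock", history)
```

For Chinese corpus data the token split would need a word segmenter rather than `str.split`; the association rule is passed in as a parameter for that reason.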
Mode two: the auxiliary labeling component is further configured to: acquire a preset corpus data set that matches the feature information of the corpus data to be annotated, wherein the feature information of the preset corpus data items in the preset corpus data set matches one another, and the annotation results of the preset corpus data items are the same or related; and use the preset corpus data set and its annotation result as prompt information. The feature information may specifically be semantic features and/or text features corresponding to the corpus data to be annotated.
In other words, in the method, the preset corpus data set with the matched feature information and the labeling result of the preset corpus data set are displayed as prompt information, so that labeling personnel refer to the preset corpus data set with the matched feature information and the labeling result of the preset corpus data set, and therefore the labeling operation is assisted to be completed, and a foundation is laid for improving the labeling quality. In addition, in the embodiment, the prompt content of the prompt information is added from different dimensions, the reference opinion with multiple dimensions is given, and a foundation is further laid for improving the labeling quality.
In practical application, the labeling result corresponding to the preset corpus data set is a result subjected to satisfaction degree test, namely, the labeling result corresponding to the preset corpus data set is determined to meet the accuracy requirement after satisfaction degree information detection, so that the information provided for labeling personnel is accurate enough, and a foundation is laid for improving labeling quality.
In practical application, a similarity calculation method can be adopted to determine a preset corpus data set matched with the feature information of the corpus data to be annotated, and the embodiment of the application does not limit a specific calculation mode.
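Since the specific similarity calculation is left open, the following sketch swaps in one simple choice, Jaccard similarity over whitespace tokens, purely for illustration. The `texts` and `label` keys, the `min_similarity` threshold, and the rule of scoring a set by its best-matching member are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between the token sets of two texts."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match_preset_set(pending_text, preset_sets, min_similarity=0.5):
    """Return the preset corpus data set most similar to the pending text.

    A set's score is the best similarity between the pending text and any of
    its member texts; None is returned if no set reaches `min_similarity`.
    """
    best_set, best_score = None, min_similarity
    for preset in preset_sets:
        score = max((jaccard(pending_text, t) for t in preset["texts"]), default=0.0)
        if score >= best_score:
            best_set, best_score = preset, score
    return best_set


preset_sets = [
    {"texts": ["play jazz music", "play some music"], "label": "music.play"},
    {"texts": ["what is the weather today"], "label": "weather.query"},
]
matched = match_preset_set("play some jazz music", preset_sets)
```

Any other text or semantic similarity measure could replace `jaccard` without changing the surrounding logic.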
Mode three: the auxiliary labeling component is also used for: selecting reference corpus data associated with the corpus data to be annotated and an annotation result of the reference corpus data associated with the corpus data to be annotated from an annotation template, wherein the annotation template is characterized by the corresponding relation between the reference corpus data and the annotation result; and taking the reference corpus data selected from the labeling template and the labeling result as prompt information.
That is to say, in this mode a labeling template is set up, and the reference corpus data in the template and its labeling results are used to support the labeling operation. The reference corpus data associated with the corpus data to be labeled, together with its labeling result, is selected from the labeling template and displayed in the prompt information, which assists labeling personnel in completing the labeling operation and lays a foundation for improving labeling quality. In addition, in this embodiment the content of the prompt information is extended along different dimensions, so that reference opinions from multiple dimensions are given, which further lays a foundation for improving labeling quality.
In a specific example, the labeling result of the reference corpus data in the labeling template is a result subjected to satisfaction degree test, namely, the labeling result of the reference corpus data in the labeling template is determined to meet the accuracy requirement after satisfaction degree information detection, so that the information provided for labeling personnel is accurate enough, and a foundation is laid for improving the labeling quality.
In practical application, the labeling template can further include a word segmentation model for a fixed grammar and the segmentation mode of that model, as well as the labeling results of the corpus data obtained after the reference corpus data is segmented with the model. Correspondingly, the auxiliary labeling component can also display the segmentation mode of the word segmentation model and the labeling results of the corpus data obtained after the reference corpus data is segmented.
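A labeling template of this kind can be sketched as a list of entries, each pairing reference corpus data with its labeling result. As a stand-in for the word segmentation model for a fixed grammar, the sketch uses a regular expression with named groups; that substitution, the `TEMPLATE` contents, and all label names are hypothetical.

```python
import re

# Hypothetical template: each entry maps a fixed grammar to reference corpus
# data and its labeling result. The regex stands in for a segmentation model.
TEMPLATE = [
    {
        "pattern": re.compile(r"^play (?P<song>.+) by (?P<artist>.+)$"),
        "reference": "play Hello by Adele",
        "label": "music.play_song_by_artist",
    },
    {
        "pattern": re.compile(r"^what is the weather in (?P<city>.+)$"),
        "reference": "what is the weather in Beijing",
        "label": "weather.query",
    },
]


def template_prompts(pending_text, template=TEMPLATE):
    """Select template entries whose fixed grammar matches the pending text."""
    prompts = []
    for entry in template:
        match = entry["pattern"].match(pending_text)
        if match:
            prompts.append({
                "reference": entry["reference"],
                "label": entry["label"],
                "segments": match.groupdict(),  # how the pending text was cut
            })
    return prompts


prompts = template_prompts("play Yesterday by The Beatles")
```

Showing the segments alongside the reference label lets an annotator see both the suggested label and how the utterance was split to reach it.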
Mode four: the auxiliary labeling component is also used for: acquiring a labeling result of the associated corpus data of the corpus data to be labeled; determining semantic features of the corpus data to be annotated based on the annotation result of the corpus data associated with the corpus data to be annotated; and taking semantic features of the corpus data to be annotated as prompt information.
In other words, in this mode the content of the prompt information is extended along the dimension of semantic features. The labeling results of the corpus data associated with the corpus data to be labeled are obtained first; these labeling results are then analyzed to obtain the semantic features of the corpus data to be labeled, that is, its real intention; and the semantic features are used as prompt information to assist labeling personnel in completing the labeling process, which lays a foundation for improving labeling quality. In addition, in this embodiment the content of the prompt information is extended along different dimensions, so that reference opinions from multiple dimensions are given, which further lays a foundation for improving labeling quality.
In practical application, the labeling result of the related corpus data of the corpus data to be labeled is a satisfaction check result, namely, the labeling result of the related corpus data of the corpus data to be labeled is determined to meet the accuracy requirement after satisfaction information detection, so that the information provided for labeling personnel is accurate enough, and a foundation is laid for improving labeling quality.
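The text does not say how semantic features are derived from the labeling results of associated corpus data. One simple possibility, shown here only as an assumed stand-in, is a majority vote: the most frequent label among the associated items is offered as the likely intention, together with the share of items that support it.

```python
from collections import Counter


def infer_intent(associated_labels):
    """Suggest an intention from the labels of associated corpus data.

    Returns None when there are no associated labels; otherwise returns the
    most frequent label and the fraction of associated items carrying it.
    """
    if not associated_labels:
        return None
    label, count = Counter(associated_labels).most_common(1)[0]
    return {"intent": label, "support": count / len(associated_labels)}


hint = infer_intent(["music.play", "music.play", "weather.query", "music.play"])
```

Reporting the support ratio alongside the suggestion tells the annotator how much weight to give it.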
It should be noted that, in practical applications, any one of the above four modes may be used alone, or any two or more of them may be combined, which is not limited in this embodiment of the present application.
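Mode four can be illustrated with a minimal sketch. It assumes that the labels of the associated queries (those that preceded or followed the query to be labeled) are already available and verified, and it derives the main intent by simple majority vote. The function name `primary_intent`, the `min_share` threshold, and the label strings are hypothetical and are not taken from the application.

```python
from collections import Counter


def primary_intent(context_labels, min_share=0.5):
    """Return the dominant label among associated queries, or None.

    context_labels: verified labels of queries associated with the
        query to be labeled (hypothetical input format).
    min_share: hypothetical minimum share the winning label must reach
        before it is shown to the labeling person as a hint.
    """
    if not context_labels:
        return None
    counts = Counter(context_labels)
    label, count = counts.most_common(1)[0]
    share = count / len(context_labels)
    return label if share >= min_share else None


# Hypothetical verified labels observed around the query "play little bit".
labels = ["volume_control", "volume_control", "play_music", "volume_control"]
hint = primary_intent(labels)  # used only as prompt information
```

The hint is advisory: as the description states for the other modes, the labeling person remains free to choose a different label.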
In an embodiment of the present application, the quality detection component may implement the quality detection process in the following modes.
Mode one: the quality detection component is further configured to: detect whether a preset labeling result exists for the corpus data to be labeled; after determining that the preset labeling result exists, match the target labeling result against the preset labeling result; and use the matching result as feedback information for the target labeling result to determine whether the target labeling result meets the accuracy requirement.
In other words, in this mode the quality detection component matches the target labeling result against the preset labeling result to determine whether the target labeling result meets the requirement, thereby monitoring the results produced by labeling personnel and laying a foundation for improving labeling quality. The approach is also simple and feasible, which lays a foundation for later engineering deployment as an online service. Specifically, to give labeling personnel real-time feedback during labeling, labeling data that already has a preset labeling result meeting the accuracy requirement can be mixed into the data they label. After a save operation on the target labeling result is detected, the quality detection component first determines whether a corresponding preset labeling result exists; if so, it matches the target labeling result against the preset labeling result and uses the match to detect whether the target labeling result meets the requirement. This enables real-time monitoring during labeling and, from the monitoring side, further lays a foundation for improving labeling quality.
Mode two: the quality detection component is further configured to: acquire satisfaction information for the target labeling result of the corpus data to be labeled; and use that satisfaction information as feedback information to determine whether the target labeling result meets the accuracy requirement. That is, this embodiment uses online satisfaction information, namely satisfaction information fed back by users for the target labeling result, to detect whether the target labeling result meets the requirement, which further lays a foundation for improving labeling quality from the monitoring side.
Mode three: the quality detection component is further configured to: determine, based on the feature information of the corpus data to be labeled, the corpus set to which the corpus data to be labeled belongs; acquire satisfaction information for that corpus set; and use the satisfaction information of that corpus set as the feedback information. The feature information of the corpus data contained in the corpus set matches the feature information of the corpus data to be labeled, and the labeling results of the corpus data contained in the corpus set match the target labeling result of the corpus data to be labeled.
That is, this mode uses online satisfaction information, namely satisfaction information for the corpus set, to detect whether the target labeling result meets the requirement, which further lays a foundation for improving labeling quality from the monitoring side. Because the feature information of the corpus data to be labeled matches that of the corpus data in the corpus set, and the target labeling result matches the labeling results of the corpus data in the corpus set, the satisfaction information for the corpus set can serve as feedback information for the target labeling result of the corpus data to be labeled, which achieves the monitoring purpose.
It should be noted that, in practical applications, any one of the above three modes may be used alone, or any two or more of the three modes may be combined, which is not limited in this embodiment of the present application.
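As an illustration of mode one of the quality detection component, the following minimal sketch checks a saved target labeling result against a preset labeling result when one exists. The dictionary layout, the identifiers such as `q42`, and the function names are assumptions made for the example and do not come from the application.

```python
def check_against_preset(item_id, target_label, preset_labels):
    """Compare a saved label with the preset label for the same item.

    Returns None when the item has no preset labeling result (nothing to
    check in real time), otherwise True or False for match or mismatch.
    """
    preset = preset_labels.get(item_id)
    if preset is None:
        return None
    return target_label == preset


def accuracy(results):
    """Share of matches among the items that had a preset result."""
    checked = [r for r in results if r is not None]
    return sum(checked) / len(checked) if checked else None


# Hypothetical preset results mixed into the routine labeling queue.
preset_labels = {"q42": "play_song", "q43": "set_volume"}

results = [
    check_against_preset("q42", "play_song", preset_labels),   # match
    check_against_preset("q43", "play_song", preset_labels),   # mismatch
    check_against_preset("q99", "play_song", preset_labels),   # no preset
]
annotator_accuracy = accuracy(results)
```

Returning `None` for items without a preset result keeps ordinary items out of the accuracy figure, so only the mixed-in items influence the real-time feedback.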
In this way, the embodiment of the application provides a corpus labeling platform, namely a corpus labeling system. Before labeling personnel label the data, the auxiliary labeling component of the system presents prompt information that shows the corpus data associated with the corpus data to be labeled together with the labeling results of that associated corpus data. Labeling personnel can consult this content conveniently, which lays a foundation for improving labeling quality. In addition, the corpus labeling system includes a quality detection component: after the target labeling result for the corpus data to be labeled is saved, a detection mechanism is started and feedback information for the target labeling result is acquired, so that the quality detection component can detect whether the target labeling result meets the requirement, which further lays a foundation for improving labeling quality.
An embodiment of the present application is described in further detail below in conjunction with fig. 3. The corpus data to be labeled in the corpus labeling system may be online data of a smart speaker product, that is, real voice interaction data of online users, which labeling personnel annotate. For example, in practical applications the corpus data to be labeled corresponds to audio data of the smart speaker: audio data input by a user to the smart speaker is acquired and parsed into text data, and the text data is used as the corpus data to be labeled. In another example, the text data is input into an automatic labeling system, the text data whose labeling results output by the automatic labeling system do not meet the accuracy requirement is identified based on a detection result and used as the corpus data to be labeled, and the reliable labeling results obtained after labeling are then output to a model or an online dictionary to improve the quality of the model or dictionary.
The corpus labeling system in this example comprises four components, namely an auxiliary labeling component, a labeling component, a quality detection component and an alarm component. The function of each component is described in detail below with reference to the example:
First, the auxiliary labeling component mainly provides labeling personnel with labeling references from multiple angles, which improves the quality of their work on the subjective side. Specifically, as shown in fig. 3, the query corpus to be labeled is acquired, and the query corpus to be labeled and the labeling reference information are displayed. The component provides four labeling references:
Labeling reference one: auxiliary labeling based on historical data. The query corpus currently to be labeled (corresponding to the corpus data to be labeled) may already have been labeled correctly by others, so a correctly labeled historical result can be used as a reference. It is only a reference: labeling personnel are not required to follow it exactly, because the same query can express different intents in different scenarios and environments.
Labeling reference two: auxiliary labeling based on templates. A template is set that contains a fixed grammar, and the query corpus to be labeled can be segmented automatically using that grammar. The template also corresponds to a model, which can parse the segmented query corpus and label it to obtain a labeling result for the segmented query corpus. The template's labeling can therefore also serve as auxiliary information for labeling personnel.
Labeling reference three: auxiliary labeling based on similar online query corpora. For example, query similarity methods such as query rewriting and edit distance are used to obtain a set of queries similar to the query corpus to be labeled, and the labeling results of that set assist labeling personnel in completing the labeling. For example, if the query corpus to be labeled is "I want to hear blue and white porcelain", the similar query set is {"I want to hear blue and white porcelain", "I want to hear light blue and white porcelain"} and so on. The similar queries are treated as one group whose labeling results are basically consistent.
Labeling reference four: auxiliary labeling based on tandem (preceding and following) co-occurrence information. The queries that different users issue before and after the same query may be labeled differently, so statistics over the preceding and/or following queries of different users can be used to determine the main intent of that query, which assists labeling personnel in completing the labeling. For example, for the query corpus to be labeled "play little bit", the tandem co-occurrence information indicates that its real intent is a little bit of sound.
Second, the labeling component responds to the labeling person's input of a labeling result and displays the target labeling result at the position corresponding to the query corpus to be labeled.
Third, the quality detection component starts a detection mechanism after detecting a save operation on the target labeling result. The component can implement two types of detection mechanism, one for real-time detection with real-time feedback and one for delayed feedback, and it summarizes the detection results of both. Specifically:
1. The real-time quality detection function monitors the quality of labeling personnel's labeling data in real time, in two modes:
Mode one: correct corpora that were labeled manually, or that the labeling model classified with high confidence, are mixed into the corpus data routinely given to labeling personnel. Once a labeling person finishes labeling such a corpus, the accuracy of the labeling result can be determined immediately, so that feedback can be given at once when the result is inaccurate, which achieves immediate quality monitoring.
Mode two: a manual auditor is assigned to review part of the labeling personnel's operations, such as semantic parsing and query rewriting, and to feed back the accuracy of the labeling results in real time, which achieves real-time quality monitoring.
2. The delayed quality detection function monitors the quality of labeling personnel's labeling data with a delay, in two modes:
Mode one: online satisfaction is examined, and entity clusters are used to check the labeling data of the query corpus to be labeled. For example, for a high-popularity entity cluster such as "song=blue-and-white porcelain", satisfaction data for the cluster can be obtained by regression with an online satisfaction classification model. When "blue-and-white porcelain" appears in the query corpus to be labeled, labeling personnel can label it based on the "song=blue-and-white porcelain" entity cluster, and the satisfaction information of that cluster is fed back to them with a delay.
Mode two: online satisfaction is examined, and generalized query clusters are used to check the labeling data of the query corpus to be labeled. For example, "sound is ten", "sound is twenty" and "sound is thirty" belong to the same class of query, so a generalized query cluster can be obtained from their shared features. If similar content appears in the query corpus to be labeled, labeling can be completed based on the generalized query cluster, and the satisfaction information of that cluster is then fed back to labeling personnel with a delay, which achieves the purpose of detecting labeling quality.
Fourth, the alarm component is used for outputting early warning information when the labeling result is determined to not meet the accuracy requirement.
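The delayed detection with generalized query clusters and the alarm component's early warning can be sketched together as follows. The sketch assumes that queries differ only in digit runs, that satisfaction feedback arrives as `(query, satisfied)` pairs, and that an early warning is raised below a fixed threshold. All names, the `<NUM>` placeholder, and the threshold value are hypothetical.

```python
import re
from collections import defaultdict


def cluster_key(query):
    """Map queries that differ only in numbers to one generalized key."""
    return re.sub(r"\d+", "<NUM>", query.lower()).strip()


def cluster_satisfaction(feedback):
    """Aggregate (query, satisfied) pairs into a rate per cluster key."""
    totals = defaultdict(lambda: [0, 0])  # key -> [satisfied, total]
    for query, satisfied in feedback:
        entry = totals[cluster_key(query)]
        entry[0] += int(satisfied)
        entry[1] += 1
    return {key: good / total for key, (good, total) in totals.items()}


def needs_alert(query, rates, threshold=0.8):
    """True when the query's cluster has a known rate below the threshold."""
    rate = rates.get(cluster_key(query))
    return rate is not None and rate < threshold


# Hypothetical delayed online feedback.
feedback = [
    ("volume 10", True),
    ("volume 20", False),
    ("volume 30", True),
    ("play jazz", True),
]
rates = cluster_satisfaction(feedback)
alert = needs_alert("volume 45", rates)
```

A query whose cluster has no feedback yet produces no alert, which matches the delayed nature of this check: the warning only becomes possible once satisfaction information for the cluster exists.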
In practical applications, labeling personnel who do not meet the labeling accuracy requirement can be trained multiple times, and the training documents can be produced from the detection results of the quality detection component.
The corpus labeling system can therefore improve the labeling quality of corpus labeling personnel under limited manpower, and the reliable labeling results obtained are output to a model or an online dictionary to improve the quality of the model or dictionary.
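Labeling reference three in the example above mentions edit distance as one way to find similar queries. The following minimal sketch shows that idea with a plain Levenshtein distance over characters and a length-normalized cutoff. The `max_ratio` cutoff, the history dictionary, and the label strings are assumptions for illustration, and the application does not specify how query rewriting is combined with edit distance.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similar_queries(query, history, max_ratio=0.4):
    """Return (query, label, distance) for labeled queries close to `query`.

    history: mapping of previously labeled query -> verified label.
    max_ratio: hypothetical cutoff on distance / longer string length.
    """
    matches = []
    for other, label in history.items():
        dist = edit_distance(query, other)
        if dist / max(len(query), len(other), 1) <= max_ratio:
            matches.append((other, label, dist))
    return sorted(matches, key=lambda m: m[2])


# Hypothetical labeled history.
history = {
    "i want to hear blue and white porcelain": "play_song",
    "turn off the light": "device_control",
}
hints = similar_queries("i want to listen to blue and white porcelain", history)
```

The returned group is meant as prompt information only; its labels are expected to be basically consistent, as the description notes for similar query sets.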
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
FIG. 4 is a block diagram of an electronic device of a corpus labeling system according to an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 4, the electronic device includes: one or more processors 401, memory 402, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on memory to display graphical information of a graphical user interface (Graphical User Interface, GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 401 is illustrated in fig. 4.
Memory 402 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the corpus tagging system functions provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the corpus labeling system functions provided by the present application.
The memory 402 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the auxiliary labeling component 101, the labeling component 102, the quality detection component 103, and the alarm component 104 shown in fig. 2) corresponding to the corpus labeling system in the embodiment of the application. The processor 401 executes various functional applications of the server and data processing, i.e., implements the functions of the corpus labeling system in the above-described embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 402.
Memory 402 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created from the use of a corpus tagging system, and the like. In addition, memory 402 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 402 may optionally include memory remotely located with respect to processor 401, which may be connected to the corresponding electronic device of the corpus tagging system via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the corpus labeling system may further include: an input device 403 and an output device 404. The processor 401, memory 402, input device 403, and output device 404 may be connected by a bus or otherwise; connection by a bus is taken as an example in fig. 4.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the corpus labeling system. Examples of the input device include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointer stick, one or more mouse buttons, a trackball, a joystick, and the like. The output device 404 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a liquid crystal display (Liquid Crystal Display, LCD), a light emitting diode (Light Emitting Diode, LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be implemented in digital electronic circuitry, integrated circuitry, application specific integrated circuits (Application Specific Integrated Circuits, ASIC), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices (programmable logic device, PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., CRT (Cathode Ray Tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area network (Local Area Network, LAN), wide area network (Wide Area Network, WAN) and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The embodiment of the application provides a corpus labeling platform, namely a corpus labeling system. Before labeling personnel label the data, the auxiliary labeling component of the system presents prompt information that shows the corpus data associated with the corpus data to be labeled together with the labeling results of that associated corpus data. Labeling personnel can consult this content conveniently, which lays a foundation for improving labeling quality. In addition, the corpus labeling system includes a quality detection component: after the target labeling result for the corpus data to be labeled is saved, a detection mechanism is started and feedback information for the target labeling result is acquired, so that the quality detection component can detect whether the target labeling result meets the requirement, which further lays a foundation for improving labeling quality.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed embodiments are achieved, which is not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (10)

1. A corpus labeling system, comprising at least:
the auxiliary labeling component is used for responding to a labeling request for labeling the corpus data to be labeled, displaying an auxiliary labeling interface, wherein the auxiliary labeling interface at least displays prompt information, and the prompt information characterizes related information of a labeling result of the corpus data associated with the corpus data to be labeled; the corpus data to be marked is corpus data which is marked by an automatic marking method, but the marking result does not meet the accuracy requirement;
the auxiliary labeling component is further used for acquiring a preset corpus data set matched with the characteristic information of the corpus data to be labeled; the feature information among all the preset corpus data in the preset corpus data set is matched, and the labeling results of all the preset corpus data are the same; taking the preset corpus data set and the labeling result as the prompt information; the feature information is semantic features and/or text features corresponding to the corpus data to be annotated; the preset corpus data set is determined by adopting similarity calculation; the labeling result corresponding to the preset corpus data set is a result which is determined to meet the accuracy requirement after satisfaction information detection;
the labeling component is used for responding to an input operation representing a labeling result and displaying the target labeling result at the position corresponding to the corpus data to be labeled;
the quality detection component is used for responding to the storage operation of the target labeling result, starting a detection mechanism, acquiring feedback information aiming at the target labeling result, and determining whether the target labeling result meets the accuracy requirement or not based on the feedback information;
wherein the detection mechanism comprises a real-time quality detection mechanism and a delayed quality detection mechanism; the real-time quality detection mechanism comprises real-time quality detection performed by mixing, into the corpus data to be labeled, correct corpora that were labeled manually or that a labeling model classified with high confidence, and an auditing operation in which a manual auditor reviews the semantic parsing and rewriting performed by labeling personnel; the delayed quality detection mechanism comprises a detection process performed on the corpus data to be labeled using an entity cluster and a detection process performed on the corpus data to be labeled using a generalized cluster.
2. The system according to claim 1, wherein the labeling component is further configured to display a corpus data list, the corpus data list including corpus data to be labeled for which a corpus labeling result does not meet an accuracy requirement.
3. The system of claim 1, wherein the auxiliary labeling component is further configured to:
and acquiring a historical labeling result of the corpus data associated with the corpus data to be labeled, and taking the corpus data and the historical labeling result associated with the corpus data to be labeled as the prompt information.
4. The system of claim 1, wherein the auxiliary labeling component is further configured to:
selecting reference corpus data associated with the corpus data to be annotated and an annotation result of the reference corpus data associated with the corpus data to be annotated from an annotation template, wherein the annotation template is characterized by the corresponding relation between the reference corpus data and the annotation result;
and taking the reference corpus data selected from the labeling template and the labeling result as the prompt information.
5. The system of claim 1, wherein the auxiliary labeling component is further configured to:
acquiring a labeling result of the associated corpus data of the corpus data to be labeled;
determining semantic features of the corpus data to be annotated based on the annotation result of the corpus data associated with the corpus data to be annotated;
And taking the semantic features of the corpus data to be annotated as prompt information.
6. The system of any one of claims 1 to 5, wherein the quality detection component is further configured to:
detecting whether a preset labeling result of the corpus data to be labeled exists or not;
after the existence of the preset labeling result is determined, matching the target labeling result with the preset labeling result;
and taking the matching result as feedback information of the target labeling result to determine whether the target labeling result meets the accuracy requirement.
7. The system of any one of claims 1 to 5, wherein the quality detection component is further configured to:
satisfaction information of a target labeling result aiming at corpus data to be labeled is obtained;
and taking satisfaction information of a target labeling result aiming at corpus data to be labeled as the feedback information so as to determine whether the target labeling result meets accuracy requirements.
8. The system of any one of claims 1 to 5, wherein the quality detection component is further configured to:
determining a corpus set to which the corpus data to be annotated is subordinate based on the feature information of the corpus data to be annotated, wherein the feature information of the corpus data contained in the corpus set is matched with the feature information of the corpus data to be annotated;
Acquiring satisfaction information of a corpus set affiliated to the corpus data to be annotated;
and taking satisfaction information of the corpus set affiliated to the corpus data to be annotated as the feedback information.
9. The system of claim 1, wherein the alarm component is configured to output early warning information after determining that the target labeling result does not meet an accuracy requirement.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to implement the functionality of the corpus tagging system of any of claims 1-9.
CN201910902201.3A 2019-09-23 2019-09-23 Corpus labeling system and electronic equipment Active CN110674638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910902201.3A CN110674638B (en) 2019-09-23 2019-09-23 Corpus labeling system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910902201.3A CN110674638B (en) 2019-09-23 2019-09-23 Corpus labeling system and electronic equipment

Publications (2)

Publication Number Publication Date
CN110674638A CN110674638A (en) 2020-01-10
CN110674638B true CN110674638B (en) 2023-12-01

Family

ID=69077335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910902201.3A Active CN110674638B (en) 2019-09-23 2019-09-23 Corpus labeling system and electronic equipment

Country Status (1)

Country Link
CN (1) CN110674638B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925910A (en) * 2021-02-25 2021-06-08 中国平安人寿保险股份有限公司 Method, device and equipment for assisting corpus labeling and computer storage medium
CN113312131B (en) * 2021-06-11 2023-04-18 北京百度网讯科技有限公司 Method and device for generating and operating marking tool

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662930A (en) * 2012-04-16 2012-09-12 乐山师范学院 Corpus tagging method and corpus tagging device
CN102831131A (en) * 2011-06-16 2012-12-19 富士通株式会社 Method and device for establishing labeling webpage linguistic corpus
CN103530282A (en) * 2013-10-23 2014-01-22 北京紫冬锐意语音科技有限公司 Corpus tagging method and equipment
CN105573887A (en) * 2015-12-14 2016-05-11 合一网络技术(北京)有限公司 Quality evaluation method and device of search engine
CN108460011A (en) * 2018-02-01 2018-08-28 北京百度网讯科技有限公司 A kind of entitative concept mask method and system
CN108710612A (en) * 2018-05-22 2018-10-26 腾讯科技(深圳)有限公司 The method, apparatus of semantic tagger, computer equipment, readable storage medium storing program for executing
CN108897869A (en) * 2018-06-29 2018-11-27 北京百度网讯科技有限公司 Corpus labeling method, device, equipment and storage medium
CN109062950A (en) * 2018-06-22 2018-12-21 北京奇艺世纪科技有限公司 A kind of method and device of text marking
CN109325213A (en) * 2018-09-30 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for labeled data
CN109902285A (en) * 2019-01-08 2019-06-18 平安科技(深圳)有限公司 Corpus classification method, device, computer equipment and storage medium
CN110019696A (en) * 2017-08-09 2019-07-16 百度在线网络技术(北京)有限公司 Query intention mask method, device, equipment and storage medium
CN110032714A (en) * 2019-02-25 2019-07-19 阿里巴巴集团控股有限公司 A kind of corpus labeling feedback method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10509814B2 (en) * 2014-12-19 2019-12-17 Universidad Nacional De Educacion A Distancia (Uned) System and method for the indexing and retrieval of semantically annotated data using an ontology-based information retrieval model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
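Research on Corpus Annotation Methods Based on Collective Intelligence; 柯永红 et al.; 《中文信息学报》 (Journal of Chinese Information Processing); Vol. 31, No. 4; pp. 108-113, 131 *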

Also Published As

Publication number Publication date
CN110674638A (en) 2020-01-10

Similar Documents

Publication Publication Date Title
JP2021089739A (en) Question answering method and language model training method, apparatus, device, and storage medium
US20210200947A1 (en) Event argument extraction method and apparatus and electronic device
CN110597959B (en) Text information extraction method and device and electronic equipment
US20210216819A1 (en) Method, electronic device, and storage medium for extracting spo triples
US20210256038A1 (en) Method and apparatus for recognizing entity word, and storage medium
US11907671B2 (en) Role labeling method, electronic device and storage medium
US11508162B2 (en) Method and apparatus for detecting mobile traffic light
CN110674638B (en) Corpus labeling system and electronic equipment
US20220027575A1 (en) Method of predicting emotional style of dialogue, electronic device, and storage medium
CN113220836A (en) Training method and device of sequence labeling model, electronic equipment and storage medium
US20210090562A1 (en) Speech recognition control method and apparatus, electronic device and readable storage medium
KR20210090576A (en) A method, an apparatus, an electronic device, a storage medium and a program for controlling quality
WO2021254251A1 (en) Input display method and apparatus, and electronic device
CN111767334A (en) Information extraction method and device, electronic equipment and storage medium
CN111858905A (en) Model training method, information identification method, device, electronic equipment and storage medium
US11481733B2 (en) Automated interfaces with interactive keywords between employment postings and candidate profiles
CN111126063B (en) Text quality assessment method and device
CN110532487B (en) Label generation method and device
JP7309818B2 (en) Speech recognition method, device, electronic device and storage medium
CN111858880A (en) Method and device for obtaining query result, electronic equipment and readable storage medium
CN111708800A (en) Query method and device and electronic equipment
CN112270169B (en) Method and device for predicting dialogue roles, electronic equipment and storage medium
CN112466277B (en) Prosody model training method and device, electronic equipment and storage medium
US20210382918A1 (en) Method and apparatus for labeling data
CN110516030B (en) Method, device and equipment for determining intention word and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210508

Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant after: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

Applicant after: Shanghai Xiaodu Technology Co.,Ltd.

Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

GR01 Patent grant