CN112417850B - Audio annotation error detection method and device - Google Patents

Audio annotation error detection method and device

Info

Publication number
CN112417850B
CN112417850B (application CN202011263694.XA)
Authority
CN
China
Prior art keywords
text
error detection
labeling
confusion
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011263694.XA
Other languages
Chinese (zh)
Other versions
CN112417850A (en)
Inventor
张晴晴
朱冬
贾艳明
何淑琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qingshu Intelligent Technology Co ltd
Original Assignee
Beijing Qingshu Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qingshu Intelligent Technology Co ltd filed Critical Beijing Qingshu Intelligent Technology Co ltd
Priority to CN202011263694.XA priority Critical patent/CN112417850B/en
Publication of CN112417850A publication Critical patent/CN112417850A/en
Application granted granted Critical
Publication of CN112417850B publication Critical patent/CN112417850B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application discloses an error detection method for audio annotation, which comprises the following steps: acquiring audio data and segmenting the audio data into a plurality of audio clips; labeling each audio clip to obtain an initial labeling text; performing error detection processing on the initial labeling text by adopting a universal text error detection model to obtain a first labeling text; determining a confusion dictionary of the universal text error detection model; identifying the domain category of the first labeling text by adopting a text classification model; according to the domain category, performing error detection processing on the first labeling text by adopting a domain text error detection model corresponding to the domain category to obtain a second labeling text; using the confusion dictionary of the universal text error detection model and the second labeling text of the domain text error detection model as the database of a fine-tuning model; and performing fine-tuning processing on the second labeling text by adopting the fine-tuning model according to the semantics of the second labeling text to obtain a final third labeling text.

Description

Audio annotation error detection method and device
Technical Field
The application belongs to the field of voice recognition, and particularly relates to an error detection method and device for audio annotation.
Background
With the development of speech recognition technology, it is increasingly applied in various fields, such as smart homes for daily life, intelligent applications in education, and intelligent robots in medicine or finance. Current speech recognition techniques rely on speech recognition models trained by deep learning to transcribe speech into text for subsequent processing. Efficient, high-accuracy speech recognition models in turn rely on large amounts of high-quality speech data.
However, in the process of implementing the present application, the inventors found that the speech data required for training a speech recognition model is normally obtained by manual labeling.
At least the following problems exist at present: the labeling quality of each utterance is affected by the fatigue and knowledge level of the annotator, and wrongly written characters in the labeling text are unavoidable during labeling. Even under strict quality inspection, the final labeling data may still contain text errors, and using such data biases the trained speech recognition model and degrades its recognition performance. Moreover, quality inspection cost rises and the workload of quality inspectors increases.
Disclosure of Invention
The embodiments of the present application aim to provide an error detection method and device for audio annotation, which can solve the technical problem that the quality of speech annotation is easily affected by the fatigue and knowledge level of the annotator, leading to low accuracy and poor recognition performance of the speech recognition model.
In order to solve the technical problems, the application is realized as follows:
In a first aspect, an embodiment of the present application provides an error detection method for audio annotation, including:
Acquiring audio data and segmenting the audio data into a plurality of audio fragments;
labeling the audio fragment to obtain an initial labeling text;
Performing error detection processing on the initial labeling text by adopting a universal text error detection model to obtain a first labeling text;
Determining a confusion dictionary of the universal text error detection model;
identifying the domain category of the first marked text by adopting a text classification model;
according to the domain category, adopting a domain text error detection model corresponding to the domain category to carry out error detection processing on the first labeling text so as to obtain a second labeling text;
using the confusion dictionary of the universal text error detection model and the second labeling text of the domain text error detection model as a database of a fine-tuning model;
and carrying out fine adjustment processing on the second annotation text by adopting the fine adjustment model according to the semantics of the second annotation text so as to obtain a final third annotation text.
Further, the confusion dictionary includes a personal confusion dictionary and a shared confusion dictionary, and determining the confusion dictionary of the universal text error detection model specifically includes:
after modification and confirmation by a specific labeling person, recording the mislabeled text and the frequency with which the labeling error occurs;
when the frequency is higher than a threshold, adding the mislabeled text to the personal confusion dictionary of the specific labeling person;
and counting the personal confusion dictionaries of a plurality of labeling personnel, and adding the mislabeled text to a shared confusion dictionary when it appears more than a preset number of times.
Further, the error detection processing is performed on the initial labeling text by adopting a general text error detection model to obtain a first labeling text, which specifically comprises the following steps:
searching out the position of the annotation error by adopting a universal text error detection model;
obtaining a candidate item list for replacing the error label from the confusion dictionary;
Obtaining candidate items from the candidate item list to replace error labels;
Calculating the fluency and confusion of the replaced marked text by adopting an N-gram model;
And determining the best target candidate item according to the fluency and the confusion degree so as to obtain a first labeling text.
Further, after the error detection processing is performed on the first labeling text by using the domain text error detection model corresponding to the domain category to obtain the second labeling text, the method further includes:
generating error detection information under the condition that the first labeling text has errors;
wherein the error detection information comprises an audio fragment index, an error location index and a candidate word.
Further, the domain categories include: economic, educational, scientific, social, gaming, and recreational.
In a second aspect, an embodiment of the present application provides an error detection apparatus for audio annotation, which is characterized in that the apparatus includes:
the acquisition module is used for acquiring audio data and segmenting the audio data into a plurality of audio fragments;
the labeling module is used for labeling the audio clips to obtain an initial labeling text;
the first error detection module is used for carrying out error detection processing on the initial labeling text by adopting a universal text error detection model so as to obtain a first labeling text;
The determining module is used for determining a confusion dictionary of the universal text error detection model;
the identification module is used for identifying the domain category of the first marked text by adopting a text classification model;
The second error detection module is used for carrying out error detection processing on the first marked text by adopting a field text error detection model corresponding to the field category according to the field category so as to obtain a second marked text;
The warehousing module is used for taking the confusion dictionary of the universal text error detection model and the second labeling text of the field text error detection model as a database of a fine adjustment model;
And the fine tuning module is used for carrying out fine tuning processing on the second annotation text by adopting the fine tuning model according to the semantics of the second annotation text so as to obtain a final third annotation text.
Further, the confusion dictionary includes a personal confusion dictionary and a shared confusion dictionary, and the determining module specifically includes:
The recording sub-module is used for recording the text of the marking errors and the occurrence frequency of the marking errors after the marking errors are modified and confirmed by the specific marking personnel;
a personal dictionary sub-module for adding the mislabeled text to the personal confusion dictionary of the specific labeling person when the frequency is higher than a threshold;
and the shared dictionary sub-module is used for counting the personal confusion dictionaries of a plurality of labeling personnel, and adding the text with the labeling error into the shared confusion dictionary when the frequency of the text with the labeling error is higher than the preset frequency.
Further, the first error detection module specifically includes:
the searching sub-module is used for searching the position of the annotation error by adopting a universal text error detection model;
the obtaining sub-module is used for obtaining a candidate item list for replacing the error label from the confusion dictionary;
the replacing sub-module is used for acquiring candidate items from the candidate item list to replace error labels;
The computing sub-module is used for computing the fluency and confusion degree of the replaced marked text by adopting the N-gram model;
and the determining submodule is used for determining the best target candidate item according to the fluency and the confusion degree so as to obtain a first annotation text.
Further, the error detection apparatus further includes:
the generation module is used for generating error detection information under the condition that the first annotation text has errors;
wherein the error detection information comprises an audio fragment index, an error location index and a candidate word.
Further, the domain categories include: economic, educational, scientific, social, gaming, and recreational.
In the embodiments of the present application, automatic error detection of audio data is realized through the universal text error detection model, the domain text error detection model and the fine-tuning model. The speed and accuracy of the universal text error detection model are fully utilized, and the domain category and context semantics are further taken into account, so that the influence of the annotator's fatigue and knowledge level on labeling quality is avoided, labeling quality is improved, and the accuracy and recognition performance of the speech recognition model are improved accordingly.
Drawings
FIG. 1 is a flow chart of an error detection method for audio annotation according to an embodiment of the present application;
FIG. 2 is a flow chart of another method for detecting errors of audio annotation according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an audio labeling error detection device according to an embodiment of the present application.
Reference numerals illustrate:
30-error detection means, 301-acquisition module, 302-labeling module, 303-first error detection module, 3031-lookup sub-module, 3032-acquisition sub-module, 3033-replacement sub-module, 3034-computation sub-module, 3035-determination sub-module, 304-determination module, 3041-recording sub-module, 3042-personal dictionary sub-module, 3043-shared dictionary sub-module, 305-identification module, 306-second error detection module, 307-binning module, 308-trimming module, 309-generation module.
The objects, functional features, and advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first", "second", and the like in the description and the claims are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that such terms may be interchanged where appropriate so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more.
The audio annotation error detection method provided by the embodiments of the present application is described in detail below through specific embodiments and application scenarios with reference to the accompanying drawings.
Example 1
Referring to fig. 1, a flow chart of an audio annotation error detection method provided by an embodiment of the present application is shown, where the audio annotation error detection method includes:
S101: audio data is acquired and is sliced into a plurality of audio clips.
Optionally, a voice detection system performs voice activity dotting on the audio data, and the audio data is sliced at the dotting positions.
Alternatively, the audio data may be sliced into segments of a preset duration, for example 3 s, or according to phoneme length, for example 6 phoneme units.
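As an illustrative sketch (not part of the patent), fixed-duration slicing as in S101 could look like the following; the function name, 16 kHz sample rate, and list-based sample buffer are assumptions:

```python
def slice_audio(samples, sample_rate, clip_seconds=3.0):
    """Split a mono sample sequence into fixed-length clips.

    The last clip may be shorter than clip_seconds when the audio
    length is not an exact multiple of the clip length.
    """
    clip_len = int(sample_rate * clip_seconds)
    return [samples[i:i + clip_len] for i in range(0, len(samples), clip_len)]

# 10 s of audio at 16 kHz -> clips of 3 s, 3 s, 3 s, and 1 s
audio = [0.0] * (16000 * 10)
clips = slice_audio(audio, sample_rate=16000, clip_seconds=3.0)
```

A production system would slice at voice-activity dotting positions instead of fixed offsets, but the bookkeeping is the same.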
S102: and labeling the audio fragment to obtain an initial labeling text.
The existing audio labeling method can be adopted for labeling, and the description is omitted here.
S103: and carrying out error detection processing on the initial labeling text by adopting a universal text error detection model so as to obtain a first labeling text.
Specifically, the universal text error detection model includes at least an N-gram language model.
The database of the universal text error detection model includes a confusion dictionary, which contains a character-level pinyin confusion set, a glyph confusion set, and a word-level confusion set.
It should be appreciated that the location of the annotation error can be found by a generic error detection model and the candidate list is looked up from the confusion dictionary.
The first labeling text produced by the universal text error detection model resolves some basic confusion errors as a first pass.
S104: a confusion dictionary of the generic text error detection model is determined.
Specifically, the confusion dictionary of the universal text error detection model can be determined by recording wrong words and corresponding candidate words in the background.
Further, the confusion dictionary may include a personal confusion dictionary for a particular annotator and a shared confusion dictionary for all annotators.
S105: and identifying the domain category of the first marked text by adopting a text classification model.
In particular, the text classification model includes, but is not limited to, TextCNN, TextRNN, TextRCNN, and BERT.
The domain categories may include economy, education, science, society, games, and entertainment.
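The patent names TextCNN/TextRNN/TextRCNN/BERT as candidate classifiers but gives no implementation. As a hypothetical stand-in that only shows the interface (train on labeled texts, predict a domain category), here is a tiny Naive Bayes classifier; all names and the whitespace tokenization are assumptions:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesDomainClassifier:
    """Toy stand-in for the patent's text classification model."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # per-label word frequencies
        self.label_counts = Counter(labels)      # class priors
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        def score(label):
            counts = self.word_counts[label]
            total = sum(counts.values())
            s = math.log(self.label_counts[label])
            for w in text.split():
                # add-one smoothing over the shared vocabulary
                s += math.log((counts[w] + 1) / (total + len(self.vocab)))
            return s
        return max(self.label_counts, key=score)
```

A real system would swap in a fine-tuned BERT or TextCNN classifier over the same fit/predict shape.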
S106: and according to the domain category, adopting a domain text error detection model corresponding to the domain category to carry out error detection processing on the first annotation text so as to obtain a second annotation text.
It is understood that each domain category corresponds to a respective domain text error detection model.
Specifically, the domain text error detection model includes, but is not limited to, BERT and other Transformer-based models.
In this embodiment, BERT is taken as an example to further describe the domain text error detection model.
BERT (Bidirectional Encoder Representations from Transformers) is used to find erroneous words in the first labeling text and to filter candidate words: the masked language model (Masked LM) in BERT masks the first labeling text word by word, and BERT's decoder then obtains candidate words from the glyph confusion set of the confusion dictionary.
Compared with the first labeling text, the second labeling text processed by the domain text error detection model has higher accuracy because the domain of the text is taken into account.
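The word-by-word mask-and-rescore loop described above can be sketched in a model-agnostic way. This is an illustration, not the patent's implementation: `score_fn` stands in for BERT's masked-LM head, and all names and the `margin` parameter are assumptions:

```python
def detect_errors_with_masking(tokens, confusion_sets, score_fn, margin=0.0):
    """Mask each token in turn; if a confusion-set alternative scores higher
    than the original token (by more than `margin`), flag the position.

    score_fn(tokens, i, candidate) plays the role of a masked-LM scorer:
    how plausible `candidate` is at position i given the rest of `tokens`.
    Returns a list of (position, original_token, suggested_token).
    """
    findings = []
    for i, tok in enumerate(tokens):
        best, best_score = tok, score_fn(tokens, i, tok)
        for cand in confusion_sets.get(tok, ()):
            s = score_fn(tokens, i, cand)
            if s > best_score + margin:
                best, best_score = cand, s
        if best != tok:
            findings.append((i, tok, best))
    return findings
```

Plugging in a real BERT scorer would replace only `score_fn`; the confusion-set filtering of candidates is unchanged.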
S107: the confusion dictionary of the universal text error detection model and the second labeling text of the field text error detection model are used as a database of the fine tuning model;
S108: and performing fine adjustment processing on the second annotation text by adopting a fine adjustment model according to the semantics of the second annotation text so as to obtain a final third annotation text.
It should be noted that the fine-tuning model extracts semantic information of the second labeling text using the confusion dictionary of the universal text error detection model and the second labeling text of the domain text error detection model, and performs further error detection according to the context semantics to obtain the final third labeling text.
For example, suppose the second labeling text produced by the universal and domain text error detection models reads "in order to escape the enemy's encirclement, his desire to win was very strong", which sounds correct when judged against the audio clip alone. The context-semantic fine-tuning model, however, can detect that "desire to win" (求胜欲) should be "desire to survive" (求生欲), two near-homophonous Chinese phrases.
In the embodiments of the present application, automatic error detection of audio data is realized through the universal text error detection model, the domain text error detection model and the fine-tuning model. The speed and accuracy of the universal text error detection model are fully utilized, and the domain category and context semantics are further taken into account, so that the influence of the annotator's fatigue and knowledge level on labeling quality is avoided, labeling quality is improved, and the accuracy and recognition performance of the speech recognition model are improved accordingly.
Example two
Referring to fig. 2, a flow chart of another audio annotation error detection method provided by an embodiment of the present application is shown. The method includes:
s201: audio data is acquired and is sliced into a plurality of audio clips.
S202: and labeling the audio fragment to obtain an initial labeling text.
S203: and (5) adopting a universal text error detection model to find out the position of the labeling error.
S204: a candidate list for replacing the error label is obtained from the confusion dictionary.
S205: and obtaining candidates from the candidate list to replace the error labels.
Optionally, the candidate with the highest priority in the candidate list is selected for replacement.
S206: and calculating the fluency and confusion degree of the replaced marked text by adopting an N-gram model.
It should be appreciated that the higher the fluency, the lower the confusion, the higher the accuracy of labeling the text. Conversely, the lower the fluency, the higher the confusion, the lower the accuracy of labeling the text.
S207: and determining the best target candidate item according to the fluency and the confusion degree so as to obtain the first labeling text.
It should be appreciated that the best target candidate may have the highest fluency and/or lowest confusion after replacement.
By comparing the fluency and the confusion, the accuracy of selecting the best target candidate can be improved.
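The fluency/confusion-degree comparison of S206–S207 can be illustrated with a minimal add-one-smoothed bigram model, where "confusion degree" corresponds to perplexity (lower is better). This toy scorer is an assumption standing in for whatever N-gram model an implementation actually uses; token lists and function names are likewise illustrative:

```python
import math
from collections import Counter

def train_ngram(corpus):
    """corpus: list of token lists. Returns (unigram, bigram) counts."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def perplexity(sent, unigrams, bigrams):
    """Add-one smoothed bigram perplexity; lower = more fluent."""
    toks = ["<s>"] + sent + ["</s>"]
    vocab = len(unigrams)
    log_p = sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                for a, b in zip(toks, toks[1:]))
    return math.exp(-log_p / (len(toks) - 1))

def best_candidate(prefix, candidates, suffix, unigrams, bigrams):
    """Substitute each candidate at the error position and keep the one
    producing the most fluent (lowest-perplexity) sentence."""
    return min(candidates,
               key=lambda c: perplexity(prefix + [c] + suffix, unigrams, bigrams))
```

The same substitute-and-rescore pattern applies whether the scorer is a bigram model, a higher-order N-gram, or a neural language model.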
S208: after the modification and confirmation by a specific labeling person, the text of the labeling error and the frequency of the labeling error are recorded.
S209: and when the frequency is higher than the threshold value, adding the text marked with errors into a personal confusion dictionary of a specific marking person.
S210: and counting personal confusion dictionaries of a plurality of labeling personnel, and adding the text with the labeling error into the shared confusion dictionary when the occurrence frequency of the text with the labeling error is higher than a preset frequency.
In this embodiment, the confusion dictionary generally includes a personal confusion dictionary and a shared confusion dictionary.
The personal confusion dictionary and the shared confusion dictionary can achieve the combined effect of personalized error correction and commonality error sharing.
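The personal/shared dictionary logic of S208–S210 can be sketched as follows. The concrete thresholds and the data shapes are assumptions for illustration only, since the patent specifies just "a threshold" and "a preset number of times":

```python
from collections import Counter

PERSONAL_THRESHOLD = 3   # assumed: min occurrences of one error per annotator
SHARED_THRESHOLD = 2     # assumed: min number of personal dictionaries sharing the error

def build_confusion_dictionaries(corrections_by_annotator):
    """corrections_by_annotator maps annotator -> list of (wrong, right)
    pairs confirmed during review. Returns (personal_dicts, shared_dict)."""
    personal = {}
    for annotator, corrections in corrections_by_annotator.items():
        freq = Counter(corrections)
        # S209: frequent errors enter the annotator's personal dictionary
        personal[annotator] = {pair for pair, n in freq.items()
                               if n >= PERSONAL_THRESHOLD}
    # S210: errors shared by enough personal dictionaries become shared
    tally = Counter(pair for entries in personal.values() for pair in entries)
    shared = {pair for pair, n in tally.items() if n >= SHARED_THRESHOLD}
    return personal, shared
```

This yields exactly the split the text describes: personalized error correction per annotator, plus sharing of errors common across annotators.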
S211: and identifying the domain category of the first marked text by adopting a text classification model.
S212: and according to the domain category, adopting a domain text error detection model corresponding to the domain category to carry out error detection processing on the first annotation text so as to obtain a second annotation text.
S213: and generating error detection information under the condition that the first labeling text has errors.
Wherein the error detection information includes an audio clip index, an error location index, and a candidate word.
The specific position where the error occurs can be rapidly positioned through the audio fragment index, the error position index and the candidate words, and the error detection efficiency is improved.
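One possible shape for the error detection information of S213 (audio clip index, error position index, candidate words) is sketched below; the class and field names are assumptions, since the patent describes only the three pieces of information:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ErrorDetectionInfo:
    clip_index: int        # which audio clip the error belongs to
    error_position: int    # offset of the error inside the clip's labeling text
    candidates: List[str] = field(default_factory=list)  # suggested replacements

    def locate(self) -> str:
        """Human-readable pointer for the quality inspector."""
        return f"clip {self.clip_index}, offset {self.error_position}"
```

Carrying all three fields together is what lets an inspector jump straight to the audio clip and the exact character position, rather than re-listening to the whole recording.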
S214: and using the confusion dictionary of the universal text error detection model and the second labeling text of the field text error detection model as a database of the fine tuning model.
S215: and performing fine adjustment processing on the second annotation text by adopting a fine adjustment model according to the semantics of the second annotation text so as to obtain a final third annotation text.
In the embodiments of the present application, comparing fluency and confusion degree improves the accuracy of selecting the best target candidate, and the personal and shared confusion dictionaries combine personalized error correction with the sharing of common errors, further reducing the influence of the annotator's fatigue and knowledge level on labeling quality and improving labeling quality.
Example III
Referring to fig. 3, a schematic structural diagram of an audio labeling error detection device according to an embodiment of the present application is shown, where the error detection device 30 includes:
an obtaining module 301, configured to obtain audio data, and segment the audio data into a plurality of audio segments;
The labeling module 302 is configured to label the audio clip to obtain an initial labeling text;
The first error detection module 303 is configured to perform error detection processing on the initial labeling text by using a general text error detection model, so as to obtain a first labeling text;
A determining module 304, configured to determine a confusion dictionary of the generic text error detection model;
an identifying module 305, configured to identify a domain category of the first labeled text using a text classification model;
The second error detection module 306 is configured to perform error detection processing on the first labeling text by using a domain text error detection model corresponding to the domain category according to the domain category, so as to obtain a second labeling text;
A warehousing module 307, configured to use the confusion dictionary of the universal text error detection model and the second labeling text of the domain text error detection model as a database of the fine tuning model;
And the fine tuning module 308 is configured to perform fine tuning processing on the second labeling text by using a fine tuning model according to the semantics of the second labeling text, so as to obtain a final third labeling text.
Further, the confusion dictionary includes a personal confusion dictionary and a shared confusion dictionary, and the determining module 304 specifically includes:
the recording submodule 3041 is used for recording the text of the labeling error and the occurrence frequency of the labeling error after the modification and confirmation of a specific labeling person;
the personal dictionary submodule 3042 is used for adding the text with the wrong labeling into the personal confusion dictionary of the specific labeling person when the frequency is higher than the threshold value;
The shared dictionary submodule 3043 is used for counting personal confusion dictionaries of a plurality of labeling personnel, and adding the text with the labeling error into the shared confusion dictionary when the frequency of the text with the labeling error is higher than the preset frequency.
Further, the first error detection module 303 specifically includes:
A searching sub-module 3031, configured to search a location of the labeling error by using a universal text error detection model;
An obtaining sub-module 3032, configured to obtain a candidate list for replacing the error label from the confusion dictionary;
a replacing sub-module 3033, configured to obtain a candidate item from the candidate item list to replace the error label;
a calculating submodule 3034, configured to calculate fluency and confusion of the replaced labeled text by using the N-gram model;
a determining sub-module 3035, configured to determine, according to the fluency and the confusion degree, the best target candidate, so as to obtain the first labeling text.
Further, the error detection device 30 further includes:
A generating module 309, configured to generate error detection information when the first markup text has an error;
wherein the error detection information includes an audio clip index, an error location index, and a candidate word.
Further, the domain categories include: economic, educational, scientific, social, gaming, and recreational.
The error detection device 30 provided in the embodiment of the present application can implement each process implemented in the above method embodiment, and in order to avoid repetition, a description is omitted here.
In the embodiments of the present application, automatic error detection of audio data is realized through the universal text error detection model, the domain text error detection model and the fine-tuning model. The speed and accuracy of the universal text error detection model are fully utilized, and the domain category and context semantics are further taken into account, so that the influence of the annotator's fatigue and knowledge level on labeling quality is avoided, labeling quality is improved, and the accuracy and recognition performance of the speech recognition model are improved accordingly.
The virtual device in the embodiment of the application can be a device, and also can be a component, an integrated circuit or a chip in a terminal.
The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are to be included in the scope of the claims of the present invention.

Claims (10)

1. An error detection method for audio annotation, comprising:
acquiring audio data and segmenting the audio data into a plurality of audio fragments;
labeling the audio fragments to obtain an initial labeling text;
performing error detection processing on the initial labeling text by adopting a universal text error detection model to obtain a first labeling text;
determining a confusion dictionary of the universal text error detection model;
identifying the domain category of the first labeling text by adopting a text classification model;
performing error detection processing on the first labeling text by adopting a domain text error detection model corresponding to the domain category to obtain a second labeling text;
taking the confusion dictionary of the universal text error detection model and the second labeling text of the domain text error detection model as a database of a fine-tuning model; and
performing fine-tuning processing on the second labeling text by adopting the fine-tuning model according to the semantics of the second labeling text to obtain a final third labeling text.
2. The error detection method of claim 1, wherein the confusion dictionary comprises a personal confusion dictionary and a shared confusion dictionary, and wherein determining the confusion dictionary of the universal text error detection model specifically comprises:
after a specific annotator modifies and confirms a labeling error, recording the mislabeled text and the frequency of occurrence of the labeling error;
when the frequency is higher than a threshold, adding the mislabeled text to the personal confusion dictionary of the specific annotator; and
aggregating the personal confusion dictionaries of a plurality of annotators, and adding the mislabeled text to the shared confusion dictionary when the number of times the mislabeled text appears across those dictionaries is higher than a preset number of times.
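The two-level dictionary in claim 2 can be sketched as follows. The thresholds, class name, and method names are illustrative assumptions, not values from the patent:

```python
from collections import Counter

# Hypothetical sketch of claim 2's confusion dictionaries: an annotator's
# confirmed corrections feed a personal dictionary once their frequency passes
# a threshold, and pairs present in enough personal dictionaries are promoted
# to the shared dictionary.
class ConfusionDictionaries:
    def __init__(self, personal_threshold=3, shared_threshold=2):
        self.personal_threshold = personal_threshold  # per-annotator frequency
        self.shared_threshold = shared_threshold      # number of annotators
        self.error_counts = {}  # annotator -> Counter of (wrong, right) pairs
        self.personal = {}      # annotator -> set of (wrong, right) pairs
        self.shared = set()

    def record_correction(self, annotator, wrong, right):
        counts = self.error_counts.setdefault(annotator, Counter())
        counts[(wrong, right)] += 1
        if counts[(wrong, right)] >= self.personal_threshold:
            self.personal.setdefault(annotator, set()).add((wrong, right))
            self._maybe_promote((wrong, right))

    def _maybe_promote(self, pair):
        # Promote to the shared dictionary once enough annotators hold the pair.
        holders = sum(1 for entries in self.personal.values() if pair in entries)
        if holders >= self.shared_threshold:
            self.shared.add(pair)
```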
3. The error detection method of claim 1, wherein performing error detection processing on the initial labeling text by adopting the universal text error detection model to obtain the first labeling text specifically comprises:
searching for the position of a labeling error by adopting the universal text error detection model;
obtaining, from the confusion dictionary, a candidate list for replacing the labeling error;
obtaining candidates from the candidate list to replace the labeling error;
calculating the fluency and perplexity of the replaced labeling text by adopting an N-gram model; and
determining the best target candidate according to the fluency and the perplexity to obtain the first labeling text.
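The N-gram scoring step in claim 3 can be sketched with a toy bigram model: each candidate from the confusion dictionary is substituted at the error position, and the candidate whose sentence has the lowest perplexity wins. The add-one smoothing and all names here are illustrative assumptions, not the patent's actual model:

```python
import math
from collections import Counter

# Minimal bigram language model with add-one smoothing for candidate ranking.
class BigramScorer:
    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in corpus:
            tokens = ["<s>"] + sent + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = len(self.unigrams)

    def perplexity(self, sent):
        tokens = ["<s>"] + sent + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            # Add-one smoothing keeps unseen bigrams from zeroing the product.
            p = (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + self.vocab)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(tokens) - 1))

def best_candidate(scorer, tokens, error_pos, candidates):
    # Substitute each candidate at the error position; lowest perplexity wins.
    def score(cand):
        trial = tokens[:error_pos] + [cand] + tokens[error_pos + 1:]
        return scorer.perplexity(trial)
    return min(candidates, key=score)
```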
4. The error detection method of claim 1, wherein performing error detection processing on the first labeling text by adopting the domain text error detection model corresponding to the domain category to obtain the second labeling text further comprises:
generating error detection information when the first labeling text contains an error;
wherein the error detection information comprises an audio fragment index, an error position index, and candidate words.
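The error detection information in claim 4 maps naturally onto a small record type. This structure and its field names are a hypothetical illustration of the three listed elements:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical container for claim 4's error detection information.
@dataclass
class ErrorDetectionInfo:
    fragment_index: int   # which audio fragment the labeling text belongs to
    error_position: int   # offset of the detected error within the text
    candidate_words: List[str] = field(default_factory=list)  # suggested fixes
```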
5. The error detection method of claim 1, wherein the domain categories include: economics, education, science, society, gaming, and entertainment.
6. An error detection apparatus for audio annotation, comprising:
an acquisition module, configured to acquire audio data and segment the audio data into a plurality of audio fragments;
a labeling module, configured to label the audio fragments to obtain an initial labeling text;
a first error detection module, configured to perform error detection processing on the initial labeling text by adopting a universal text error detection model to obtain a first labeling text;
a determining module, configured to determine a confusion dictionary of the universal text error detection model;
an identification module, configured to identify the domain category of the first labeling text by adopting a text classification model;
a second error detection module, configured to perform error detection processing on the first labeling text by adopting a domain text error detection model corresponding to the domain category to obtain a second labeling text;
a warehousing module, configured to take the confusion dictionary of the universal text error detection model and the second labeling text of the domain text error detection model as a database of a fine-tuning model; and
a fine-tuning module, configured to perform fine-tuning processing on the second labeling text by adopting the fine-tuning model according to the semantics of the second labeling text to obtain a final third labeling text.
7. The error detection apparatus of claim 6, wherein the confusion dictionary comprises a personal confusion dictionary and a shared confusion dictionary, and wherein the determining module specifically comprises:
a recording sub-module, configured to record the mislabeled text and the frequency of occurrence of the labeling error after a specific annotator modifies and confirms the labeling error;
a personal dictionary sub-module, configured to add the mislabeled text to the personal confusion dictionary of the specific annotator when the frequency is higher than a threshold; and
a shared dictionary sub-module, configured to aggregate the personal confusion dictionaries of a plurality of annotators, and add the mislabeled text to the shared confusion dictionary when the number of times the mislabeled text appears across those dictionaries is higher than a preset number of times.
8. The error detection apparatus of claim 6, wherein the first error detection module specifically comprises:
a searching sub-module, configured to search for the position of a labeling error by adopting the universal text error detection model;
an obtaining sub-module, configured to obtain, from the confusion dictionary, a candidate list for replacing the labeling error;
a replacing sub-module, configured to obtain candidates from the candidate list to replace the labeling error;
a calculating sub-module, configured to calculate the fluency and perplexity of the replaced labeling text by adopting an N-gram model; and
a determining sub-module, configured to determine the best target candidate according to the fluency and the perplexity to obtain the first labeling text.
9. The error detection apparatus of claim 6, further comprising:
a generation module, configured to generate error detection information when the first labeling text contains an error;
wherein the error detection information comprises an audio fragment index, an error position index, and candidate words.
10. The error detection apparatus of claim 6, wherein the domain categories include: economics, education, science, society, gaming, and entertainment.
CN202011263694.XA 2020-11-12 2020-11-12 Audio annotation error detection method and device Active CN112417850B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011263694.XA CN112417850B (en) 2020-11-12 2020-11-12 Audio annotation error detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011263694.XA CN112417850B (en) 2020-11-12 2020-11-12 Audio annotation error detection method and device

Publications (2)

Publication Number Publication Date
CN112417850A CN112417850A (en) 2021-02-26
CN112417850B true CN112417850B (en) 2024-07-02

Family

ID=74831047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011263694.XA Active CN112417850B (en) 2020-11-12 2020-11-12 Audio annotation error detection method and device

Country Status (1)

Country Link
CN (1) CN112417850B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113053393B (en) * 2021-03-30 2024-04-30 闽江学院 Audio annotation processing device
CN113421553B (en) * 2021-06-15 2023-10-20 北京捷通数智科技有限公司 Audio selection method, device, electronic equipment and readable storage medium
CN114896965B (en) * 2022-05-17 2023-09-12 马上消费金融股份有限公司 Text correction model training method and device, text correction method and device
CN115146622B (en) * 2022-07-21 2023-05-05 平安科技(深圳)有限公司 Data annotation error correction method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390928A (en) * 2019-08-07 2019-10-29 广州多益网络股份有限公司 Speech synthesis model training method and system for automatically augmenting a corpus
CN110532522A (en) * 2019-08-22 2019-12-03 深圳追一科技有限公司 Error detection method and apparatus for audio annotation, computer equipment, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017151757A1 (en) * 2016-03-01 2017-09-08 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Recurrent neural feedback model for automated image annotation
CN110786847B (en) * 2018-08-02 2022-11-04 深圳市理邦精密仪器股份有限公司 Electrocardiogram signal library building method and analysis method
CN110968695A (en) * 2019-11-18 2020-04-07 罗彤 Intelligent labeling method, device and platform based on active learning of weak supervision technology
CN111476783B (en) * 2020-04-13 2022-11-15 腾讯科技(深圳)有限公司 Image processing method, device and equipment based on artificial intelligence and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390928A (en) * 2019-08-07 2019-10-29 广州多益网络股份有限公司 Speech synthesis model training method and system for automatically augmenting a corpus
CN110532522A (en) * 2019-08-22 2019-12-03 深圳追一科技有限公司 Error detection method and apparatus for audio annotation, computer equipment, and storage medium

Also Published As

Publication number Publication date
CN112417850A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN112417850B (en) Audio annotation error detection method and device
CN107291783B (en) Semantic matching method and intelligent equipment
CN108959242B (en) Target entity identification method and device based on part-of-speech characteristics of Chinese characters
CN110046350B (en) Grammar error recognition method, device, computer equipment and storage medium
CN107729313B (en) Deep neural network-based polyphone pronunciation distinguishing method and device
CN109800414B (en) Method and system for recommending language correction
CN111160031A (en) Social media named entity identification method based on affix perception
CN112183094B (en) Chinese grammar debugging method and system based on multiple text features
CN110119510B (en) Relationship extraction method and device based on transfer dependency relationship and structure auxiliary word
CN112883732A (en) Method and device for identifying Chinese fine-grained named entities based on associative memory network
US20230069935A1 (en) Dialog system answering method based on sentence paraphrase recognition
Sitaram et al. Speech synthesis of code-mixed text
CN113380223B (en) Method, device, system and storage medium for disambiguating polyphone
CN112818680B (en) Corpus processing method and device, electronic equipment and computer readable storage medium
CN113312914B (en) Security event entity identification method based on pre-training model
CN109614623B (en) Composition processing method and system based on syntactic analysis
CN112101032A (en) Named entity identification and error correction method based on self-distillation
Nguyen et al. Domain-shift conditioning using adaptable filtering via hierarchical embeddings for robust Chinese spell check
CN112417132A (en) New intention recognition method for screening negative samples by utilizing predicate guest information
CN109086274A (en) English social media short text time expression recognition method based on restricted model
CN114416991A (en) Method and system for analyzing text emotion reason based on prompt
CN112183060B (en) Reference resolution method of multi-round dialogue system
CN111274354B (en) Referee document structuring method and referee document structuring device
US20120197894A1 (en) Apparatus and method for processing documents to extract expressions and descriptions
CN112071304B (en) Semantic analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 411, 4th floor, building 4, No.44, Middle North Third Ring Road, Haidian District, Beijing 100088

Applicant after: Beijing Qingshu Intelligent Technology Co.,Ltd.

Address before: 100044 1415, 14th floor, building 1, yard 59, gaoliangqiaoxie street, Haidian District, Beijing

Applicant before: BEIJING AISHU WISDOM TECHNOLOGY CO.,LTD.

GR01 Patent grant