CN112002311A - Text error correction method and device, computer readable storage medium and terminal equipment

Info

Publication number
CN112002311A
Authority
CN
China
Prior art keywords
text data
text
candidate
usage scenario
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910387845.3A
Other languages
Chinese (zh)
Inventor
毛俊峰
李靖阳
郭泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Corp
TCL Research America Inc
Original Assignee
TCL Research America Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Research America Inc filed Critical TCL Research America Inc
Priority to CN201910387845.3A priority Critical patent/CN112002311A/en
Publication of CN112002311A publication Critical patent/CN112002311A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 Pattern recognition
            • G06F 18/20 Analysing
              • G06F 18/24 Classification techniques
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L 15/00 Speech recognition
            • G10L 15/08 Speech classification or search
              • G10L 15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
              • G10L 15/18 Speech classification or search using natural language modelling
                • G10L 15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
            • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
            • G10L 15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of voice recognition, and particularly relates to a text error correction method and device, a computer-readable storage medium, and terminal equipment. The method comprises: acquiring first text data, where the first text data is the text data output after a preset terminal device performs voice recognition on input voice; determining a usage scenario of the first text data according to the current state of the terminal device, the first text data, and context information of the first text data; determining a confusion set of the first text data according to the usage scenario; performing error correction on the first text data using a preset deep learning model and an iterative learning model to obtain second text data; and optimizing the second text data according to the confusion set to obtain third text data. Based on deep learning, iterative learning, and related methods, the method applies different corrections to erroneous text in different scenes, greatly improving error correction accuracy.

Description

Text error correction method and device, computer readable storage medium and terminal equipment
Technical Field
The invention belongs to the technical field of voice recognition, and particularly relates to a text error correction method and device, a computer readable storage medium and terminal equipment.
Background
Artificial intelligence (AI) technology is developing rapidly, and voice interaction has appeared on all kinds of intelligent terminal devices, yet it still cannot serve user needs well. The root cause is that Automatic Speech Recognition (ASR) is imperfect, which is why ASR text error correction methods have emerged. Current ASR text error correction methods can perform fixed corrections in some simple scenarios, for example correcting "I want to order music" into "I want to listen to music", or "I need auxiliary eyes" into "I need auxiliary glasses". However, in complex interactive scenarios they do not consider that the same sentence requires different corrections in different contexts, so correct results often cannot be obtained and the error correction accuracy is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a text error correction method, a text error correction device, a computer-readable storage medium, and a terminal device, so as to solve the problem that the error correction accuracy of the existing ASR text error correction method is low.
A first aspect of an embodiment of the present invention provides a text error correction method, which may include:
acquiring first text data, wherein the first text data is output text data after voice recognition is carried out on input voice by preset terminal equipment;
determining a usage scenario of the first text data according to the current state of the terminal device, the first text data and context information of the first text data;
determining a confusion set of the first text data according to the usage scenario;
performing error correction processing on the first text data by using a preset deep learning model and an iterative learning model to obtain second text data;
and optimizing the second text data according to the confusion set to obtain third text data.
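In outline, these five steps chain directly into a pipeline. A minimal runnable sketch is given below; every function name and body is a hypothetical stub standing in for a module described later in this specification, not an API defined by the patent.

```python
# Hypothetical stubs for the five steps; names and rules are illustrative.
def determine_scenario(device_state, text, context):        # step S102
    return "movie" if "video_app" in device_state else "region"

def build_confusion_set(text, scenario):                     # step S103
    if scenario == "movie":
        return {"wolf-tooth club": ["Langya Bang"]}
    return {}

def error_correction_model(text, context):                   # step S104 (stub)
    return text

def optimize_with_confusion_set(text, confusion):            # step S105 (stub)
    for wrong, candidates in confusion.items():
        if wrong in text and candidates:
            text = text.replace(wrong, candidates[0])
    return text

def correct_text(first_text, device_state, context):         # steps S101-S105
    scenario = determine_scenario(device_state, first_text, context)
    confusion = build_confusion_set(first_text, scenario)
    second_text = error_correction_model(first_text, context)
    return optimize_with_confusion_set(second_text, confusion)

print(correct_text("I want to find the wolf-tooth club", "video_app", ""))
# -> "I want to find Langya Bang"
```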
Further, the determining the usage scenario of the first text data according to the current state of the terminal device, the first text data and the context information of the first text data includes:
determining a first candidate use scene of the first text data according to the current state of the terminal equipment;
determining a second candidate use scene of the first text data according to the first text data and the context information of the first text data;
determining a usage scenario of the first text data according to the first candidate usage scenario and the second candidate usage scenario.
Further, the training process of the deep learning model comprises the following steps:
acquiring a training data set, wherein each piece of training data in the training data set comprises input data and a label, the input data is error text data and context information output by voice recognition, and the label is a correct result obtained after error correction is performed on the error text data;
and training the deep learning model by using the training data set to obtain the trained deep learning model.
Further, the training process of the iterative learning model comprises:
judging whether the iterative learning model is activated or not through a preset identifier, wherein the activated iterative learning model receives feedback information;
and training the iterative model by using the training data set to obtain the trained iterative model.
Further, the optimizing the second text data according to the confusion set to obtain third text data includes:
constructing a candidate set of the second text data according to the confusion set, wherein the candidate set comprises each candidate text data of the second text data;
respectively calculating scores of the second text data and each candidate text data by using a preset score function;
selecting candidate text data with the highest score as preferred text data;
if the difference between the scores of the preferred text data and the second text data is larger than a preset threshold value, determining the preferred text data as the third text data;
and if the difference between the scores of the preferred text data and the second text data is less than or equal to the threshold value, determining the second text data as the third text data.
A second aspect of an embodiment of the present invention provides a text error correction apparatus, which may include:
the text data acquisition module is used for acquiring first text data, wherein the first text data is output text data after voice recognition is carried out on input voice by preset terminal equipment;
the usage scenario determining module is used for determining a usage scenario of the first text data according to the current state of the terminal device, the first text data and the context information of the first text data;
a confusion set determining module for determining a confusion set of the first text data according to the usage scenario;
the error correction module is used for carrying out error correction processing on the first text data by using a preset deep learning model and an iterative learning model to obtain second text data;
and the optimization module is used for optimizing the second text data according to the confusion set to obtain third text data.
Further, the usage scenario determination module may include:
a first candidate usage scenario determining unit, configured to determine a first candidate usage scenario of the first text data according to a current state of the terminal device;
a second candidate usage scenario determination unit, configured to determine a second candidate usage scenario of the first text data according to the first text data and context information of the first text data;
a usage scenario determination unit configured to determine a usage scenario of the first text data according to the first candidate usage scenario and the second candidate usage scenario.
Further, the text correction apparatus may further include:
a training data set obtaining module, configured to obtain a training data set, where each piece of training data in the training data set includes input data and a label, the input data is error text data and context information output by speech recognition, and the label is a correct result obtained after error correction is performed on the error text data;
and the first model training module is used for training the deep learning model by using the training data set to obtain the trained deep learning model.
Further, the text correction apparatus may further include:
the feedback information receiving module is used for judging whether the iterative learning model is activated or not through a preset identifier, and the activated iterative learning model receives feedback information;
and the second model training module is used for training the iterative model by using the training data set to obtain the trained iterative model.
Further, the optimization module may include:
a candidate set constructing unit, configured to construct a candidate set of the second text data according to the confusion set, where the candidate set includes candidate text data of the second text data;
the score calculating unit is used for calculating scores of the second text data and each candidate text data by using a preset score function;
the preferred text data selecting unit is used for selecting the candidate text data with the highest score as the preferred text data;
a first determining unit, configured to determine the preferred text data as the third text data if a difference between scores of the preferred text data and the second text data is greater than a preset threshold;
a second determining unit, configured to determine the second text data as the third text data if a difference between the scores of the preferred text data and the second text data is less than or equal to the threshold.
A third aspect of embodiments of the present invention provides a computer-readable storage medium storing computer-readable instructions, which, when executed by a processor, implement the steps of any one of the above-mentioned text error correction methods.
A fourth aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the steps of any one of the text error correction methods when executing the computer readable instructions.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: first text data is acquired, where the first text data is the text data output after a preset terminal device performs voice recognition on input voice; a usage scenario of the first text data is determined according to the current state of the terminal device, the first text data, and context information of the first text data; a confusion set of the first text data is determined according to the usage scenario; error correction is performed on the first text data using a preset deep learning model and an iterative learning model to obtain second text data; and the second text data is optimized according to the confusion set to obtain third text data. Based on deep learning, iterative learning, and related methods, the embodiments design a complete optimization pipeline for ASR-recognized text in complex scenarios and apply different corrections to erroneous text according to the scene, greatly improving error correction accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
FIG. 1 is a flowchart of an embodiment of a text error correction method according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of a usage scenario for determining first text data;
FIG. 3 is a schematic flow chart of an optimization process performed on second text data;
FIG. 4 is a block diagram of an embodiment of a text correction apparatus according to an embodiment of the present invention;
fig. 5 is a schematic block diagram of a terminal device in an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of a text error correction method according to an embodiment of the present invention may include:
and step S101, acquiring first text data.
The first text data is output text data after voice recognition is carried out on input voice by preset terminal equipment.
When a user needs to interact with the terminal device by voice, the user speaks the content to be expressed; the terminal device captures the user's input voice through a voice acquisition device, performs voice recognition on it, and outputs the recognition result, i.e., the first text data. It should be noted that the first text data may not match what the user actually meant: for example, the user may say "I want to watch Langya Bang" (a TV series) while the voice recognition outputs the homophone "I want to watch the wolf-tooth club", so the first text data needs further error correction in the subsequent steps.
Step S102, determining a usage scenario of the first text data according to the current state of the terminal device, the first text data and the context information of the first text data.
In this embodiment, specific usage scenarios such as a movie scene, a shopping scene, a region scene, and a news scene can be defined according to actual conditions and preset in advance, so that the meaning of the text content can be judged according to the scene, since the same text may have different meanings in different scenes.
As shown in fig. 2, step S102 may specifically include the following processes:
step S1021, determining a first candidate use scene of the first text data according to the current state of the terminal equipment.
The current state of the terminal device specifically refers to which application (app) is currently in use. After the current state of the terminal device is determined, it can be converted into an α value by a preset rule, where the α value represents the usage scenario determined from the current state of the terminal device, i.e., the first candidate usage scenario. For example, when the current state of the terminal device is a map app, the α value is determined as the region scene class; when the current state is a shopping app (for example, Taobao), the α value is determined as the shopping scene class; and so on. If no determinate α value can be obtained from the current state of the terminal device, the α value is set to 0.
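As a concrete illustration, the rule-based conversion from the current app to the α value might look like the following sketch; the app names and scene labels are assumptions for illustration, not fixed by the patent.

```python
# Illustrative rule table from current app to the alpha value; the app
# names and scene classes here are assumptions, not from the patent.
APP_TO_SCENE = {
    "map_app": "region",
    "shopping_app": "shopping",
    "video_app": "movie",
}

def alpha_from_state(current_app: str):
    # Return the scene class, or 0 when no determinate value exists.
    return APP_TO_SCENE.get(current_app, 0)
```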
Step S1022, determining a second candidate usage scenario of the first text data according to the first text data and the context information of the first text data.
In this embodiment, a neural network classifier may be constructed based on deep learning techniques. The number of training-data classes is determined by the actual situation of the terminal device, with labels set to 1, 2, 3, ..., n. When training the classifier, preprocessing operations are required on the training data, including but not limited to generating a dictionary with a text segmentation technique, processing the text with a stop-word technique, obtaining an embedding matrix with a text feature extraction technique, and vectorizing the text to generate input data. The specific neural network structure may be a DNN, CNN, RNN, or fastText model, which is not limited here. It should be noted that the model must apply the same preprocessing to the context information of the current sentence to be corrected and merge it into the input training data; if there is no context information, it can be filled with meaningless characters (e.g., <pad>). The training data are fed into the model for training, yielding a trained scene classification model.
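A minimal sketch of this preprocessing, under the assumption of whitespace tokenization and a toy stop-word list (the patent fixes neither), is shown below.

```python
# Toy preprocessing: tokenize, drop stop words, map tokens to ids, and
# fill missing context with <pad>, as described above. The tokenizer
# and stop-word list are illustrative assumptions.
PAD = "<pad>"
STOP_WORDS = {"the", "a", "to"}

def preprocess(sentence: str, context: str, vocab: dict) -> list:
    tokens = (sentence + " " + (context or PAD)).split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # setdefault grows the dictionary on first sight of a token.
    return [vocab.setdefault(t, len(vocab)) for t in tokens]

vocab = {PAD: 0}
ids = preprocess("I want to find the wolf-tooth club", "", vocab)
```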
The first text data and its context information are input into the scene classification model to obtain an output β value, where the β value represents the usage scenario determined by the scene classification model, i.e., the second candidate usage scenario.
Step S1023, determining the usage scenario of the first text data according to the first candidate usage scenario and the second candidate usage scenario.
Specifically, the final classification result r can be computed from the classification result α determined from the current state of the terminal device and the classification result β obtained from the scene classification model, according to the formula r = λα + (1 - λ)β, where λ is a preset empirical value in the interval [0, 1]; the specific value can be set according to practical situations, for example 0.2, 0.3, 0.5, or another value. The r value represents the finally determined usage scenario.
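A small sketch of this fusion follows, under the assumption that α and β are probability vectors over the n scene classes (the patent states only the scalar formula).

```python
# Fusion r = lam*alpha + (1-lam)*beta over n scene classes; treating
# alpha and beta as probability vectors is an assumption for illustration.
def fuse(alpha: list, beta: list, lam: float = 0.3) -> int:
    scores = [lam * a + (1 - lam) * b for a, b in zip(alpha, beta)]
    return scores.index(max(scores))   # index of the winning scene class

# Example with classes [movie, region, shopping]: the rule module says
# "region" (one-hot), the classifier strongly says "movie".
alpha = [0.0, 1.0, 0.0]                # one-hot from the rule module
beta = [0.7, 0.2, 0.1]                 # softmax output of the classifier
r = fuse(alpha, beta)                  # r = 0 (movie), since lam < 0.5
```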
Example one: when a user opens movie and television apps such as an Aichi art app and a Youke app, the voice is input into the 'wo xiang zhao lang ya bang', and the 'I wants to find the wolf teeth stick' is obtained after the ASR recognition. Through the method, the classification r as a movie scene can be obtained by processing the wolf tooth stick and the context information thereof.
Example two: when a user opens map apps such as a Baidu map app and a Gaode map app, the voice is input to 'wo yao zhao lang ya bang', and 'I want to find a wolf tooth stick' is obtained after ASR recognition. By processing the wolf tooth stick to be found by me and the context information thereof through the method, the scene of the classification r as the region can be obtained.
Step S103, determining a confusion set of the first text data according to the usage scenario.
In this embodiment, a confusion set, i.e., a set of possible candidate values for the words in the first text data, can be constructed using conventional machine-learning language-model methods. The work divides into two parts: building a language model and building the confusion sets. The constructed language model identifies and extracts the wrong words in the ASR recognition result, and n confusion sets are built, one for each of the n classes of the scene classification module; each confusion set has a different emphasis in its classification scenario, and the output is jointly constrained by the scene classification result and the words obtained by the language model.
The language model is built with methods such as a character-based bidirectional n-gram LM and error detection via maximum-entropy classification; the confusion sets are likewise built with traditional machine-learning methods. The data-set format is {key: value} pairs, and the covered error categories include common errors such as word-usage errors, pronunciation (homophone) confusion, and character-shape confusion. Finally, the n confusion sets are constructed, and the one corresponding to the result of the scene classification module is selected for output.
The language model and confusion sets built in this embodiment differ from conventional approaches in that n confusion sets are constructed, each with its own emphasis on the common errors of its class; moreover, for different terminal devices the emphasis of confusion sets in the same class differs, depending on whether the device can realize the related functions. In all traditional error correction methods the quality of the confusion set strongly affects the final recognition result; constructing n confusion sets after classification keeps each individual set relatively simple to build and improves the accuracy of the final result.
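The per-scene {key: value} structure and scene-keyed selection can be sketched as follows; the entries mirror the examples below and are purely illustrative.

```python
# Illustrative {key: value} confusion sets, one per scene class; the
# entries are assumptions modeled on the examples in this specification.
CONFUSION_SETS = {
    "movie": {"wolf-tooth club": ["Langya Bang", "Langya Mountain"]},
    "region": {"wolf-tooth club": ["Langya district", "Langya Mountain"]},
}

def select_confusion_set(scene: str) -> dict:
    # One of the n per-scene sets, keyed by the classification result r.
    return CONFUSION_SETS.get(scene, {})
```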
Example one: the classification result r of the wolf teeth stick is obtained by the scene classification module as the movie scene, and the confusion set C corresponding to the classification result r is processed by the language modelr ═ film and television scenesThe outputs are as follows: cr ═ film and television scenesThe Chinese characters are that { "wolf teeth stick" - "Lanya", "wolf teeth stick" - "wolf teeth side", "wolf teeth stick" - "wolf teeth mountain" -, was.
Example two: the classification result r of the wolf teeth stick is obtained by the scene classification module as the region scene, and the confusion set C corresponding to the classification result r is collected through the language modelr is regional sceneThe outputs are as follows: cr is regional sceneThe two types of the wolf teeth are arranged in the order of priority from high to low.
Step S104, performing error correction processing on the first text data by using a preset deep learning model and an iterative learning model to obtain second text data.
This step performs error correction on the first text data with an error correction module composed of a deep learning model and an iterative learning model, yielding the ASR error correction result, i.e., the second text data. Deep learning models for this purpose exist in the prior art and are not described again here; however, this embodiment improves the deep learning model and adds an iterative learning model, so that the deep learning model can continue to be iteratively optimized after training is complete. The goal of iterative training is to fine-tune and optimize the deep learning model: the longer a user uses the terminal device, the richer the human-computer interaction data, the better the iteratively trained model performs, the fewer the errors remaining in the corrected sentence, and the smaller the candidate set that subsequent correction needs to build, so efficiency and accuracy improve accordingly.
The deep learning model is implemented with deep learning techniques: this embodiment builds a neural network model that corrects ASR results, and the deep learning model and the iterative learning model are trained jointly to optimize the result. In the deep learning model, a hierarchical model improves the encoder structure so that it can process context; the overall framework is a BiGRU-based Encoder-Decoder. The training process of the deep learning model comprises: acquiring a training data set, where each piece of training data contains input data and a label, the input data being the erroneous text output by voice recognition together with its context information, and the label being the correct result after correcting the erroneous text; and training the deep learning model with the training data set to obtain the trained model. In the usage stage, the iterative learning model can be used to optimize the deep learning model.
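For concreteness, a heavily simplified BiGRU encoder-decoder of the kind described above might be sketched in PyTorch as follows; the architecture details, sizes, and training loop are illustrative assumptions, not the patent's actual network.

```python
import torch
import torch.nn as nn

class BiGRUCorrector(nn.Module):
    """Toy BiGRU encoder-decoder; structure and sizes are illustrative."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Bidirectional GRU encoder reads the erroneous sentence plus context.
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Unidirectional GRU decoder generates the corrected sentence.
        self.decoder = nn.GRU(emb_dim, 2 * hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.embed(src_ids))           # h: (2, B, hidden)
        h0 = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)  # merge directions
        dec_out, _ = self.decoder(self.embed(tgt_ids), h0)
        return self.out(dec_out)                           # (B, T, vocab)

# One training pair: input = erroneous ASR text plus context, label = the
# corrected sentence, matching the training data set described above.
model = BiGRUCorrector(vocab_size=5000)
src = torch.randint(1, 5000, (2, 12))       # toy batch: error text + context
tgt = torch.randint(1, 5000, (2, 10))       # toy batch: corrected labels
logits = model(src, tgt[:, :-1])            # teacher forcing
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 5000), tgt[:, 1:].reshape(-1))
loss.backward()
```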
The iterative learning model is implemented with reinforcement learning methods, the aim being to build an iterative learning model that is trained jointly with the deep learning model. Its internal structure mainly draws on the ideas of Actor-Critic (AC), DQN, and NAF models; the deep learning model is responsible for data preprocessing, and the iterative learning model only needs to use the data already processed by the deep learning model (i.e., the training data set). The training process of the iterative learning model comprises: judging through a preset identifier whether the iterative learning model is activated, where an activated iterative learning model receives feedback information; and training the iterative model with the training data set to obtain the trained iterative model.
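Since the patent does not specify the update rule, the following sketch only illustrates the activation gate and feedback collection; the AC/DQN/NAF-style learner itself is stubbed out.

```python
from collections import deque

class IterativeLearner:
    """Activation gate plus feedback buffer; the update rule is a stub."""
    def __init__(self, activated: bool):
        self.activated = activated          # the "preset identifier"
        self.replay = deque(maxlen=10000)   # stored (correction, reward) pairs

    def receive_feedback(self, correction: str, positive: bool) -> None:
        if not self.activated:              # an inactive model ignores feedback
            return
        self.replay.append((correction, 1.0 if positive else -1.0))

    def fine_tune_step(self) -> None:
        # Placeholder for the AC/DQN/NAF-style update: e.g., replay stored
        # feedback and scale the deep model's loss per sample by `reward`.
        for correction, reward in self.replay:
            ...
```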
Example one: inputting the wolf teeth stick wanted to be found and the context information thereof into a model, inputting the preprocessed wolf teeth stick and the context information into a deep learning model to obtain an error correction result S, namely the Langya bang wanted to be found, and performing iterative learning by using an iterative learning model to wait for feedback information.
Example two: inputting the wolf teeth stick to be found and the context information thereof into a model, inputting the preprocessed wolf teeth stick to a deep learning model to obtain an error correction result S, namely the Langya post to be found, and performing iterative learning by using an iterative learning model for waiting for feedback information.
Step S105, optimizing the second text data according to the confusion set to obtain third text data.
The third text data is the final recognition result obtained after text error correction.
As shown in fig. 3, step S105 may specifically include the following processes:
and S1051, constructing a candidate set of the second text data according to the confusion set.
In this embodiment, the candidate set may be specifically constructed by using a graph model, an HMM, and the like, where the candidate set includes each candidate text data of the second text data.
Step S1052, respectively calculating scores of the second text data and each candidate text data using a preset scoring function.
The scoring function may include, but is not limited to, edit-distance and language-model (LM) scoring functions.
Step S1053, selecting the candidate text data with the highest score as the preferred text data.
Step S1054, determining whether the difference between the scores of the preferred text data and the second text data is greater than a preset threshold.
The threshold may be set according to actual conditions, and this embodiment does not specifically limit the threshold.
If so, step S1055 is executed, and if not, step S1056 is executed.
Step S1055, determining the preferred text data as the third text data.
Step S1056, determining the second text data as the third text data.
That is, if no candidate sentence scores higher than the original sentence, or no candidate exceeds the original sentence's score by more than the threshold, the original sentence is considered error-free; otherwise, the candidate sentence with the highest score is output.
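Steps S1051 to S1056 condense into a short sketch; the scoring function is passed in as a parameter because the embodiment allows edit-distance, LM, or other scorers.

```python
# Minimal sketch of steps S1051-S1056; `score` is a stand-in for the
# preset scoring function, not the patent's actual scorer.
from typing import Callable, List

def optimize(second_text: str,
             candidates: List[str],
             score: Callable[[str], float],
             threshold: float) -> str:
    base = score(second_text)
    best = max(candidates, key=score) if candidates else second_text
    # Keep the corrected sentence unless a candidate beats it by more
    # than the preset threshold (steps S1054-S1056).
    return best if score(best) - base > threshold else second_text
```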
After the final recognition result is selected, feedback information may also be sent to the iterative learning model according to the following rules. If the final recognition result is not the output of the error correction module, give negative feedback. If the final recognition result is the output of the error correction module, judge the similarity between the result and its context information (the similarity can be computed with methods such as cosine distance, TF-IDF, and Word2Vec); if the similarity exceeds a threshold, the user is repeating a sentence with the same meaning, which indicates insufficient correction capability for this kind of sentence, so give negative feedback. In the remaining cases, the final recognition result is the output of the error correction module and the user has not repeated the same meaning, so the iterative learning model meets the user's needs and positive feedback is given. In particular, if several consecutive pieces of feedback are all positive, candidate-set construction can be interrupted once a preset count is reached and the second text data output directly.
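These feedback rules reduce to a small decision function; the similarity measure is assumed here to return a value in [0, 1], e.g., cosine similarity over TF-IDF vectors.

```python
# Sketch of the feedback rules above; `similarity` and its threshold
# are assumptions (e.g., cosine similarity of TF-IDF vectors).
def feedback(final_text: str, corrected_text: str, context: str,
             similarity, sim_threshold: float = 0.8) -> bool:
    """Return True for positive feedback, False for negative."""
    if final_text != corrected_text:
        return False                      # correction was overridden
    if similarity(final_text, context) > sim_threshold:
        return False                      # user repeated the same meaning
    return True                           # correction accepted, not repeated
```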
Example one: according to the error correction result S, i.e. "i want to find Langya" and the confusion set Cr ═ film and television scenesConstructing a candidate set Lr ═ film and television scenesThat is, the user needs to find the Lanya area.]. For candidate set Lr ═ film and television scenesAnd scoring, namely finding that the original sentence 'i want to find Langya board' is the highest in score, outputting a result and sending forward feedback to the iterative learning model according to rules.
Example two: according to the error correction result S, i.e. "I want to find Langya" and the confusion set Cr is regional sceneConstructing a candidate set Lr is regional sceneThat is, i want to find the langa area, i want to find the wolf ridge.]. For candidate set Lr is regional sceneAnd scoring, namely finding that the scoring of the area where I want to find Langya is the highest, outputting a result and sending negative feedback to the iterative learning model according to a rule.
The method in this embodiment can be used after the ASR method on a terminal device; the input data is text data and the output data is also text data. Because deep learning technology is not yet fully mature and suffers from problems such as poor generalization, traditional machine learning methods are used to guarantee a baseline for the deep learning model, and reinforcement learning is used to iteratively optimize and fine-tune it. When the method in this embodiment is applied to different terminal devices such as mobile phones, smart TVs, and smart-home devices, the training data, the number of classes, and the confusion sets can be adjusted accordingly to suit the terminal device in use.
Through this embodiment, when the "wolf-tooth club" in "I want to find the wolf-tooth club" is corrected, it can be corrected to "Langya Bang" in a movie scene; in a region scene it can be corrected to a suitable place name such as "Langya district" or "Langya Mountain"; and in a news scene it can be corrected to "Langya Net". Likewise, when the homophone "xing xing" in "I want to see xing xing" is corrected, it becomes "stars" in an astronomical scene, "orangutan" in an animal scene, and "comedy star" in an entertainment scene.
In summary, in the embodiments of the present invention, first text data is acquired, where the first text data is the text data output after a preset terminal device performs voice recognition on input voice; a usage scenario of the first text data is determined according to the current state of the terminal device, the first text data, and context information of the first text data; a confusion set of the first text data is determined according to the usage scenario; error correction is performed on the first text data using a preset deep learning model and an iterative learning model to obtain second text data; and the second text data is optimized according to the confusion set to obtain third text data. Based on deep learning, iterative learning, and related methods, the embodiments design a complete optimization pipeline for ASR-recognized text in complex scenarios and apply different corrections to erroneous text according to the scene, greatly improving error correction accuracy.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 4 is a structural diagram of an embodiment of a text error correction apparatus according to an embodiment of the present invention, which corresponds to the text error correction method described in the foregoing embodiment.
In this embodiment, a text error correction apparatus may include:
the text data acquisition module 401 is configured to acquire first text data, where the first text data is text data that is output after a preset terminal device performs voice recognition on input voice;
a usage scenario determining module 402, configured to determine a usage scenario of the first text data according to a current state of the terminal device, the first text data, and context information of the first text data;
a confusion set determining module 403, configured to determine a confusion set of the first text data according to the usage scenario;
an error correction module 404, configured to perform error correction processing on the first text data by using a preset deep learning model and an iterative learning model to obtain second text data;
and an optimizing module 405, configured to perform optimization processing on the second text data according to the confusion set, so as to obtain third text data.
Further, the usage scenario determination module may include:
a first candidate usage scenario determining unit, configured to determine a first candidate usage scenario of the first text data according to a current state of the terminal device;
a second candidate usage scenario determination unit, configured to determine a second candidate usage scenario of the first text data according to the first text data and context information of the first text data;
a usage scenario determination unit configured to determine a usage scenario of the first text data according to the first candidate usage scenario and the second candidate usage scenario.
Further, the text correction apparatus may further include:
a training data set obtaining module, configured to obtain a training data set, where each piece of training data in the training data set includes input data and a label, the input data is error text data and context information output by speech recognition, and the label is a correct result obtained after error correction is performed on the error text data;
and the first model training module is used for training the deep learning model by using the training data set to obtain the trained deep learning model.
Further, the text correction apparatus may further include:
the feedback information receiving module is used for judging whether the iterative learning model is activated or not through a preset identifier, and the activated iterative learning model receives feedback information;
and the second model training module is used for training the iterative model by using the training data set to obtain the trained iterative model.
Further, the optimization module may include:
a candidate set constructing unit, configured to construct a candidate set of the second text data according to the confusion set, where the candidate set includes candidate text data of the second text data;
the score calculating unit is used for calculating scores of the second text data and each candidate text data by using a preset score function;
the preferred text data selecting unit is used for selecting the candidate text data with the highest score as the preferred text data;
a first determining unit, configured to determine the preferred text data as the third text data if a difference between scores of the preferred text data and the second text data is greater than a preset threshold;
a second determining unit, configured to determine the second text data as the third text data if a difference between the scores of the preferred text data and the second text data is less than or equal to the threshold.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, modules and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Fig. 5 shows a schematic block diagram of a terminal device according to an embodiment of the present invention, and for convenience of description, only the relevant parts related to the embodiment of the present invention are shown.
As shown in fig. 5, the terminal device 5 of this embodiment includes: a processor 50, a memory 51 and a computer program 52 stored in said memory 51 and executable on said processor 50. The processor 50, when executing the computer program 52, implements the steps in the various text error correction method embodiments described above, such as the steps S101 to S105 shown in fig. 1. Alternatively, the processor 50, when executing the computer program 52, implements the functions of each module/unit in the above-mentioned device embodiments, for example, the functions of the modules 401 to 405 shown in fig. 4.
Illustratively, the computer program 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 52 in the terminal device 5.
The terminal device 5 may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palm computer, a cloud server, or other computing devices. It will be understood by those skilled in the art that fig. 5 is only an example of the terminal device 5, and does not constitute a limitation to the terminal device 5, and may include more or less components than those shown, or combine some components, or different components, for example, the terminal device 5 may further include an input-output device, a network access device, a bus, etc.
The Processor 50 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used for storing the computer programs and other programs and data required by the terminal device 5. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A text error correction method, comprising:
acquiring first text data, wherein the first text data is output text data after voice recognition is carried out on input voice by preset terminal equipment;
determining a usage scenario of the first text data according to the current state of the terminal device, the first text data and context information of the first text data;
determining a confusion set of the first text data according to the usage scenario;
performing error correction processing on the first text data by using a preset deep learning model and an iterative learning model to obtain second text data;
and optimizing the second text data according to the confusion set to obtain third text data.
2. The text error correction method according to claim 1, wherein the determining the usage scenario of the first text data according to the current state of the terminal device, the first text data, and the context information of the first text data comprises:
determining a first candidate use scene of the first text data according to the current state of the terminal equipment;
determining a second candidate use scene of the first text data according to the first text data and the context information of the first text data;
determining a usage scenario of the first text data according to the first candidate usage scenario and the second candidate usage scenario.
3. The text correction method of claim 1, wherein the training process of the deep learning model comprises:
acquiring a training data set, wherein each piece of training data in the training data set comprises input data and a label, the input data is error text data and context information output by voice recognition, and the label is a correct result obtained after error correction is performed on the error text data;
and training the deep learning model by using the training data set to obtain the trained deep learning model.
4. The text correction method of claim 3, wherein the training process of the iterative learning model comprises:
judging whether the iterative learning model is activated or not through a preset identifier, wherein the activated iterative learning model receives feedback information;
and training the iterative model by using the training data set to obtain the trained iterative model.
5. The text error correction method according to any one of claims 1 to 4, wherein the optimizing the second text data according to the confusion set to obtain third text data comprises:
constructing a candidate set of the second text data according to the confusion set, wherein the candidate set comprises each candidate text data of the second text data;
respectively calculating scores of the second text data and each candidate text data by using a preset score function;
selecting candidate text data with the highest score as preferred text data;
if the difference between the scores of the preferred text data and the second text data is larger than a preset threshold value, determining the preferred text data as the third text data;
and if the difference between the scores of the preferred text data and the second text data is less than or equal to the threshold value, determining the second text data as the third text data.
6. A text correction apparatus, comprising:
the text data acquisition module is used for acquiring first text data, wherein the first text data is output text data after voice recognition is carried out on input voice by preset terminal equipment;
the usage scenario determining module is used for determining a usage scenario of the first text data according to the current state of the terminal device, the first text data and the context information of the first text data;
a confusion set determining module for determining a confusion set of the first text data according to the usage scenario;
the error correction module is used for carrying out error correction processing on the first text data by using a preset deep learning model and an iterative learning model to obtain second text data;
and the optimization module is used for optimizing the second text data according to the confusion set to obtain third text data.
7. The text correction apparatus of claim 6, wherein the usage scenario determination module comprises:
a first candidate usage scenario determining unit, configured to determine a first candidate usage scenario of the first text data according to a current state of the terminal device;
a second candidate usage scenario determination unit, configured to determine a second candidate usage scenario of the first text data according to the first text data and context information of the first text data;
a usage scenario determination unit configured to determine a usage scenario of the first text data according to the first candidate usage scenario and the second candidate usage scenario.
8. The text correction apparatus according to claim 6 or 7, wherein the optimization module comprises:
a candidate set constructing unit, configured to construct a candidate set of the second text data according to the confusion set, where the candidate set includes candidate text data of the second text data;
the score calculating unit is used for calculating scores of the second text data and each candidate text data by using a preset score function;
the preferred text data selecting unit is used for selecting the candidate text data with the highest score as the preferred text data;
a first determining unit, configured to determine the preferred text data as the third text data if a difference between scores of the preferred text data and the second text data is greater than a preset threshold;
a second determining unit, configured to determine the second text data as the third text data if a difference between the scores of the preferred text data and the second text data is less than or equal to the threshold.
9. A computer readable storage medium storing computer readable instructions, wherein the computer readable instructions, when executed by a processor, implement the steps of the text correction method according to any one of claims 1 to 5.
10. A terminal device comprising a memory, a processor and computer readable instructions stored in the memory and executable on the processor, characterized in that the processor implements the steps of the text correction method according to any one of claims 1 to 5 when executing the computer readable instructions.
CN201910387845.3A 2019-05-10 2019-05-10 Text error correction method and device, computer readable storage medium and terminal equipment Pending CN112002311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910387845.3A CN112002311A (en) 2019-05-10 2019-05-10 Text error correction method and device, computer readable storage medium and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910387845.3A CN112002311A (en) 2019-05-10 2019-05-10 Text error correction method and device, computer readable storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
CN112002311A (en) 2020-11-27

Family

ID=73461185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910387845.3A Pending CN112002311A (en) 2019-05-10 2019-05-10 Text error correction method and device, computer readable storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN112002311A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413549A (en) * 2013-07-31 2013-11-27 深圳创维-Rgb电子有限公司 Voice interaction method and system and interaction terminal
CN107357775A (en) * 2017-06-05 2017-11-17 百度在线网络技术(北京)有限公司 The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence
US20180349327A1 (en) * 2017-06-05 2018-12-06 Baidu Online Network Technology (Beijing)Co., Ltd. Text error correction method and apparatus based on recurrent neural network of artificial intelligence
CN107741928A (en) * 2017-10-13 2018-02-27 四川长虹电器股份有限公司 A kind of method to text error correction after speech recognition based on field identification
CN108091328A (en) * 2017-11-20 2018-05-29 北京百度网讯科技有限公司 Speech recognition error correction method, device and readable medium based on artificial intelligence
CN108646580A (en) * 2018-05-14 2018-10-12 中兴通讯股份有限公司 The determination method and device of control object, storage medium, electronic device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541342A (en) * 2020-12-08 2021-03-23 北京百度网讯科技有限公司 Text error correction method and device, electronic equipment and storage medium
JP2022091121A (en) * 2020-12-08 2022-06-20 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Text error correction method, apparatus, electronic device, storage medium, and program
JP7286737B2 (en) 2020-12-08 2023-06-05 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Text error correction method, device, electronic device, storage medium and program
WO2022135206A1 (en) * 2020-12-25 2022-06-30 华为技术有限公司 Text error correction method and electronic device
CN113192497A (en) * 2021-04-28 2021-07-30 平安科技(深圳)有限公司 Speech recognition method, apparatus, device and medium based on natural language processing
CN113192497B (en) * 2021-04-28 2024-03-01 平安科技(深圳)有限公司 Speech recognition method, device, equipment and medium based on natural language processing
CN114120972A (en) * 2022-01-28 2022-03-01 科大讯飞华南有限公司 Intelligent voice recognition method and system based on scene
CN114120972B (en) * 2022-01-28 2022-04-12 科大讯飞华南有限公司 Intelligent voice recognition method and system based on scene

Similar Documents

Publication Publication Date Title
CN112002311A (en) Text error correction method and device, computer readable storage medium and terminal equipment
WO2018040899A1 (en) Error correction method and device for search term
WO2020119496A1 (en) Communication method, device and equipment based on artificial intelligence and readable storage medium
CN110163181B (en) Sign language identification method and device
CN111145732B (en) Processing method and system after multi-task voice recognition
CN110930980B (en) Acoustic recognition method and system for Chinese and English mixed voice
WO2024098533A1 (en) Image-text bidirectional search method, apparatus and device, and non-volatile readable storage medium
WO2020143320A1 (en) Method and apparatus for acquiring word vectors of text, computer device, and storage medium
CN114022882B (en) Text recognition model training method, text recognition device, text recognition equipment and medium
WO2024098623A1 (en) Cross-media retrieval method and apparatus, cross-media retrieval model training method and apparatus, device, and recipe retrieval system
CN114445831A (en) Image-text pre-training method, device, equipment and storage medium
WO2024098524A1 (en) Text and video cross-searching method and apparatus, model training method and apparatus, device, and medium
US11615247B1 (en) Labeling method and apparatus for named entity recognition of legal instrument
CN112687266A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
WO2023029354A1 (en) Text information extraction method and apparatus, and storage medium and computer device
CN114861758A (en) Multi-modal data processing method and device, electronic equipment and readable storage medium
WO2024098763A1 (en) Text operation diagram mutual-retrieval method and apparatus, text operation diagram mutual-retrieval model training method and apparatus, and device and medium
CN113220828A (en) Intention recognition model processing method and device, computer equipment and storage medium
CN115688868B (en) Model training method and computing equipment
CN110516125A (en) Identify method, apparatus, equipment and the readable storage medium storing program for executing of unusual character string
CN110717022A (en) Robot dialogue generation method and device, readable storage medium and robot
WO2021082570A1 (en) Artificial intelligence-based semantic identification method, device, and semantic identification apparatus
CN113342981A (en) Demand document classification method and device based on machine learning
WO2022141855A1 (en) Text regularization method and apparatus, and electronic device and storage medium
CN113961701A (en) Message text clustering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination