CN111881297A

CN111881297A - Method and device for correcting voice recognition text

Info

Publication number: CN111881297A
Application number: CN202010763534.5A
Authority: CN
Inventors: Nie Lei (聂镭); Qi Kaijie (齐凯杰); Nie Ying (聂颖)
Original assignee: Longma Zhixin Zhuhai Hengqin Technology Co ltd
Current assignee: Longma Zhixin Zhuhai Hengqin Technology Co ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2020-11-03

Abstract

The invention discloses a method and a device for correcting speech recognition text. The method comprises: acquiring an original speech recognition text to be corrected, the original speech recognition text being obtained by performing speech recognition on speech to be recognized; acquiring position information of a misrecognized word in the original speech recognition text; processing the original speech recognition text with a text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized; acquiring, according to the text intention, a keyword for replacing the misrecognized word; and replacing the misrecognized word with the keyword based on its position information, so as to correct the original speech recognition text and obtain the actual text corresponding to the speech to be recognized. The invention solves the technical problem in the related art that interference factors readily cause erroneous speech recognition results, which then cannot be corrected.

Description

Method and device for correcting voice recognition text

Technical Field

The invention relates to the technical field of voice recognition, in particular to a method and a device for correcting a voice recognition text.

Background

When a voice chat robot in a vertical domain is in use, recognition of the customer's speech can produce incorrect results because of interference factors such as a noisy environment or inaccurate pronunciation of domain-specific proper nouns, which greatly degrades the user experience.

Two common remedies are correcting the recognition result from the text side and training the speech recognition model on vertical-domain corpora. Correcting the result from the text side is cheaper, but it has a drawback: when the recognition result contains severe errors, text-level correction performs poorly.

No effective solution has yet been proposed for the above problem in the related art, namely that interference factors readily cause erroneous speech recognition results, and those erroneous results cannot be corrected.

Disclosure of Invention

Embodiments of the invention provide a method and a device for correcting speech recognition text, so as to at least solve the technical problem in the related art that interference factors readily cause erroneous recognition results during speech recognition, and those erroneous results cannot be corrected.

According to one aspect of the embodiments of the present invention, a method for correcting speech recognition text is provided, comprising: acquiring an original speech recognition text to be corrected, the original speech recognition text being obtained by performing speech recognition on speech to be recognized; acquiring position information of a misrecognized word in the original speech recognition text; processing the original speech recognition text with a text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized; acquiring, according to the text intention, a keyword for replacing the misrecognized word; and replacing the misrecognized word with the keyword based on its position information, so as to correct the original speech recognition text and obtain the actual text corresponding to the speech to be recognized.
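The final replacement step can be illustrated with a minimal Python sketch. It assumes that position information is a half-open character range [start, end) and uses a hypothetical intention-to-keyword table (the rule that each intention has exactly one keyword comes from the detailed description). `Span`, `INTENT_TO_KEYWORD` and `correct_text` are illustrative names, not part of the disclosure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Half-open character range [start, end) of a misrecognized word (assumed convention)."""
    start: int
    end: int


# Hypothetical table: each text intention maps to exactly one replacement keyword.
INTENT_TO_KEYWORD: dict[str, str] = {
    "insurance_hesitation_period": "ten-day hesitation period",
}


def correct_text(original: str, span: Span, intent: str,
                 table: dict[str, str] = INTENT_TO_KEYWORD) -> str:
    """Replace original[span.start:span.end] with the keyword for `intent`."""
    if not 0 <= span.start <= span.end <= len(original):
        raise ValueError(f"span {span} lies outside a text of length {len(original)}")
    keyword = table.get(intent)
    if keyword is None:
        # Unknown intention: leave the text unchanged (an assumed fallback).
        return original
    return original[:span.start] + keyword + original[span.end:]


if __name__ == "__main__":
    asr = "The policy has a tin-day hazy period. Please read the terms."
    start = asr.index("tin-day hazy period")
    fixed = correct_text(asr, Span(start, start + len("tin-day hazy period")),
                         "insurance_hesitation_period")
    print(fixed)  # The policy has a ten-day hesitation period. Please read the terms.
```

Splicing by offsets, rather than by string search, matches the method's reliance on labeled position information and avoids replacing an identical substring elsewhere in the sentence.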

Optionally, before the original speech recognition text to be corrected is acquired, the method further comprises: performing speech recognition on the speech to be recognized to obtain the original speech recognition text; and determining that the original speech recognition text exhibits a recognition anomaly.

Optionally, acquiring the position information of the misrecognized word in the original speech recognition text comprises: acquiring a feedback file in which the original speech recognition text is labeled, the feedback file comprising the actual text labeled with first position information of the correct keyword, and the original speech recognition text labeled with second position information of the misrecognized word; and acquiring, based on the feedback file, the position information of the misrecognized word in the original speech recognition text.

Optionally, after the feedback file labeling the original speech recognition text is acquired, the method further comprises augmenting the original speech recognition text to obtain a training data set, which comprises: determining the keyword in the actual text; performing a preset editing operation on the characters within the position range corresponding to the keyword, the preset editing operation comprising at least one of replacement, addition and deletion; and taking all text sequences obtained by performing the preset editing operation as elements of the training data set.

Optionally, before the original speech recognition text is processed with the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized, the method further comprises: adding a predetermined field to each text sequence in the training data set to obtain a plurality of text sequences with the predetermined field added, the predetermined field representing the text intention of each text sequence; and training on the plurality of text sequences to obtain the text classification model.

Optionally, processing the original speech recognition text with the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized comprises: inputting the original speech recognition text into the text classification model; acquiring the output of the text classification model; and deriving the text intention from that output.

Optionally, acquiring the keyword for replacing the misrecognized word according to the text intention comprises: performing context labeling on each text sequence in the training data set to obtain context information of the misrecognized word; and acquiring the keyword for replacing the misrecognized word according to the text intention and the context information of the misrecognized word.

Optionally, acquiring the keyword according to the text intention and the context information of the misrecognized word comprises: determining the similarity between the original speech recognition text and each text sequence in the training data set; selecting from the training data set the target text sequence with the greatest similarity to the original speech recognition text; and predicting the keyword based on the text intention and on the position, within the target text sequence, of the context information of the misrecognized word.
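A minimal sketch of this retrieval step follows, under stated assumptions. The similarity measure is not named in the source, so `difflib.SequenceMatcher.ratio()` from the Python standard library stands in for it; candidates are first filtered by the predicted intention; and the target sequence's left and right context windows (a hypothetical `window` of 4 characters) are searched for in the original text to locate the span to replace. `TrainingItem`, `most_similar` and `predict_keyword` are illustrative names.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingItem:
    text: str     # an augmented (erroneous) text sequence
    intent: str   # its predetermined intention field
    keyword: str  # the single correct keyword for that intention
    start: int    # error span [start, end) within `text`
    end: int


def similarity(a: str, b: str) -> float:
    # Assumption: character-level ratio in [0, 1]; the source does not specify a metric.
    return difflib.SequenceMatcher(None, a, b).ratio()


def most_similar(query: str, items: list[TrainingItem], intent: str) -> TrainingItem:
    # Prefer sequences sharing the predicted intention; fall back to all items.
    pool = [it for it in items if it.intent == intent] or items
    return max(pool, key=lambda it: similarity(query, it.text))


def predict_keyword(query: str, items: list[TrainingItem], intent: str,
                    window: int = 4) -> tuple[str, tuple[int, int] | None]:
    """Return the keyword and, if its context is found, the span to replace in `query`."""
    target = most_similar(query, items, intent)
    left = target.text[max(0, target.start - window):target.start]
    right = target.text[target.end:target.end + window]
    left_pos = query.find(left)  # first occurrence of the left context
    if left_pos < 0:
        return target.keyword, None
    span_start = left_pos + len(left)
    right_pos = query.find(right, span_start)
    if right_pos < 0:
        return target.keyword, None
    return target.keyword, (span_start, right_pos)
```

Using the surrounding context rather than the erroneous characters themselves lets the sketch locate a span even when the new recognition error differs from every stored variant.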

According to another aspect of the embodiments of the present invention, a device for correcting speech recognition text is further provided, comprising a first acquisition unit, a second acquisition unit, a third acquisition unit, a fourth acquisition unit and a correction unit. The first acquisition unit is configured to acquire an original speech recognition text to be corrected, the original speech recognition text being obtained by performing speech recognition on speech to be recognized; the second acquisition unit is configured to acquire position information of a misrecognized word in the original speech recognition text; the third acquisition unit is configured to process the original speech recognition text with a text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized; the fourth acquisition unit is configured to acquire, according to the text intention, a keyword for replacing the misrecognized word; and the correction unit is configured to replace the misrecognized word with the keyword based on its position information, so as to correct the original speech recognition text and obtain the actual text corresponding to the speech to be recognized.
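The division into units can be sketched as plain Python composition in which each unit is injected as a callable. This is an illustrative structure only: the class name `CorrectionDevice`, its field names, the `Span` alias (assumed half-open) and the stub lambdas in the usage example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Span = tuple[int, int]  # assumed half-open [start, end)


@dataclass
class CorrectionDevice:
    acquire_text: Callable[[], str]                  # first acquisition unit
    locate_error: Callable[[str], Span]              # second acquisition unit
    classify_intent: Callable[[str], str]            # third acquisition unit
    lookup_keyword: Callable[[str, str, Span], str]  # fourth acquisition unit: (intent, text, span)

    def correct(self) -> str:
        """Correction unit: splice the keyword over the misrecognized span."""
        text = self.acquire_text()
        start, end = self.locate_error(text)
        intent = self.classify_intent(text)
        keyword = self.lookup_keyword(intent, text, (start, end))
        return text[:start] + keyword + text[end:]


if __name__ == "__main__":
    device = CorrectionDevice(
        acquire_text=lambda: "The policy has a tin-day hazy period.",
        locate_error=lambda text: (17, 36),
        classify_intent=lambda text: "insurance_hesitation_period",
        lookup_keyword=lambda intent, text, span: "ten-day hesitation period",
    )
    print(device.correct())  # The policy has a ten-day hesitation period.
```

Injecting each unit keeps the correction unit independent of how the classifier or keyword lookup is implemented, mirroring the apparatus claim's separation of responsibilities.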

Optionally, the device further comprises: a speech recognition unit configured to perform speech recognition on the speech to be recognized, before the original speech recognition text to be corrected is acquired, to obtain the original speech recognition text; and a determining unit configured to determine that the original speech recognition text exhibits a recognition anomaly.

Optionally, the second acquisition unit comprises: a first acquisition module configured to acquire a feedback file in which the original speech recognition text is labeled, the feedback file comprising the actual text labeled with first position information of the correct keyword, and the original speech recognition text labeled with second position information of the misrecognized word; and a second acquisition module configured to acquire, based on the feedback file, the position information of the misrecognized word in the original speech recognition text.

Optionally, the device further comprises an augmentation module configured to augment the original speech recognition text, after the feedback file labeling it is acquired, to obtain a training data set. The augmentation module comprises: a first determining submodule configured to determine the keyword in the actual text; an execution module configured to perform a preset editing operation on the characters within the position range corresponding to the keyword, the preset editing operation comprising at least one of replacement, addition and deletion; and a first obtaining submodule configured to take all text sequences obtained by performing the preset editing operation as elements of the training data set.

Optionally, the device further comprises: an adding module configured to add, before the original speech recognition text is processed with the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized, a predetermined field to each text sequence in the training data set, yielding a plurality of text sequences with the predetermined field added, the predetermined field representing the text intention of each text sequence; and a training module configured to train on the plurality of text sequences to obtain the text classification model.

Optionally, the third acquisition unit comprises: an input module configured to input the original speech recognition text into the text classification model; a third acquisition module configured to acquire the output of the text classification model; and a fourth acquisition module configured to derive the text intention from that output.

Optionally, the fourth acquisition unit comprises: a labeling module configured to perform context labeling on each text sequence in the training data set to obtain context information of the misrecognized word; and a fifth acquisition module configured to acquire the keyword for replacing the misrecognized word according to the text intention and the context information of the misrecognized word.

Optionally, the fifth acquisition module comprises: a second determining submodule configured to determine the similarity between the original speech recognition text and each text sequence in the training data set; a second obtaining submodule configured to select from the training data set the target text sequence with the greatest similarity to the original speech recognition text; and a prediction submodule configured to predict the keyword based on the text intention and on the position, within the target text sequence, of the context information of the misrecognized word.

According to another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided, comprising a stored computer program, wherein, when the computer program is executed by a processor, it controls the device on which the storage medium resides to execute any one of the methods for correcting speech recognition text described above.

According to another aspect of the embodiments of the present invention, a processor is further provided, configured to run a computer program which, when run, executes any one of the methods for correcting speech recognition text described above.

In the embodiments of the invention, an original speech recognition text to be corrected is acquired, the original speech recognition text being obtained by performing speech recognition on speech to be recognized; position information of a misrecognized word in the original speech recognition text is acquired; the original speech recognition text is processed with a text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized; a keyword for replacing the misrecognized word is acquired according to the text intention; and, based on the position information of the misrecognized word, the misrecognized word is replaced with the keyword, so as to correct the original speech recognition text and obtain the actual text corresponding to the speech to be recognized.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

Fig. 1 is a schematic flowchart of a method for correcting speech recognition text according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of the steps performed before step S102 in Fig. 1 in the method for correcting speech recognition text according to an embodiment of the present invention;

Fig. 3 is a detailed flowchart of step S104 in Fig. 1 in the method for correcting speech recognition text according to an embodiment of the present invention;

Fig. 4 is a schematic flowchart of the steps performed after step S301 in Fig. 3 in the method for correcting speech recognition text according to an embodiment of the present invention;

Fig. 5 is a schematic flowchart of the steps performed before step S106 in Fig. 1 in the method for correcting speech recognition text according to an embodiment of the present invention;

Fig. 6 is a detailed flowchart of step S108 in Fig. 1 in the method for correcting speech recognition text according to an embodiment of the present invention;

Fig. 7 is a detailed flowchart of step S602 in Fig. 6 in the method for correcting speech recognition text according to an embodiment of the present invention;

Fig. 8(a) is a schematic diagram of a named-entity-based method for correcting speech recognition text according to an embodiment of the present invention;

Fig. 8(b) is a schematic diagram of another named-entity-based method for correcting speech recognition text according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of a device for correcting speech recognition text according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

According to an embodiment of the present invention, a method embodiment for correcting speech recognition text is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.

Fig. 1 is a schematic flowchart of a method for correcting speech recognition text according to an embodiment of the present invention. The method may be applied to a terminal device or a server; the terminal device may be a computing device such as a desktop, notebook or palmtop computer, and the server may be a computing device such as a cloud server. As shown in Fig. 1, the method for correcting speech recognition text comprises the following steps:

Step S102, acquiring an original speech recognition text to be corrected, the original speech recognition text being obtained by performing speech recognition on speech to be recognized.

Optionally, the speech to be recognized may be a segment of speech uttered by the user in a predetermined scene.

In this embodiment, when the method is applied to a terminal device, the speech uttered by the user can be captured through the terminal device's microphone; when it is applied to a server, the speech can be obtained from a peripheral device of the server, such as a call center.

The embodiments of the present invention place no specific limitation on the language, duration or other properties of the speech, which may be speech of any type, nor on the source from which the speech is acquired.

Optionally, the original speech recognition text may be the text obtained by performing speech recognition on the speech to be recognized; that is, it is text that has not yet been corrected.

In an optional embodiment, referring to Fig. 2, which is a schematic flowchart of the steps performed before step S102 in Fig. 1 in the method for correcting speech recognition text provided by the embodiment of the present invention, the method may further comprise, before the original speech recognition text to be corrected is acquired:

Step S201, performing speech recognition on the speech to be recognized to obtain the original speech recognition text.

For example, the speech to be recognized may be processed with automatic speech recognition (ASR) to obtain the original speech recognition text. Other technologies capable of speech recognition may of course be used; the embodiments of the present invention place no specific limitation on the speech recognition method.

Automatic speech recognition (ASR) is a technology for converting human speech into text.

Step S202, determining that the original speech recognition text exhibits a recognition anomaly.

For example, after the original speech recognition text is acquired, a character or word in it is found to be wrong, so that the text as a whole no longer makes semantic sense and it is impossible to tell what the user meant. For instance, the speech to be recognized says "the three boxes of yogurt you purchased have a shelf life of forty-five days", whereas the text obtained by speech recognition reads "the hawthorn-flavored yogurt you purchased has a shelf life of forty-five days".

As can be seen from the above, before acquiring the original speech recognition text to be corrected, the embodiment of the present invention performs speech recognition on the speech to be recognized, converting it from speech into text (the original speech recognition text), and makes a preliminary determination of whether that text exhibits an anomaly, for example whether characters or words have been inserted, omitted or misrecognized. Performing this preliminary check before correction prevents text that contains no recognition anomaly from going through the correction process unnecessarily, which improves speech processing efficiency, avoids unnecessary program operations and reduces system overhead.

It should be noted that if, after the speech to be recognized is acquired, analysis shows that the speech signal contains a large amount of noise, it can be determined that the resulting original speech recognition text very likely contains misrecognized characters or words, and the text is therefore passed on for correction.

Step S104, acquiring position information of the misrecognized word in the original speech recognition text.

In an optional embodiment, referring to Fig. 3, which is a detailed flowchart of step S104 in Fig. 1 in the method for correcting speech recognition text according to an embodiment of the present invention, acquiring the position information of the misrecognized word in the original speech recognition text comprises:

Step S301, acquiring a feedback file in which the original speech recognition text is labeled, the feedback file comprising the actual text labeled with first position information of the correct keyword, and the original speech recognition text labeled with second position information of the misrecognized word.

Step S302, acquiring the position information of the misrecognized word in the original speech recognition text based on the feedback file.
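The source does not say how the noise analysis is performed. As one possible, hedged realization, the sketch below estimates a rough signal-to-noise ratio from frame energies, taking the quietest 10% of frames as the noise floor, and flags the utterance for correction when the estimate falls below a threshold. The frame length of 400 samples (25 ms at an assumed 16 kHz sampling rate), the 10% noise fraction, the 15 dB threshold and the function names are illustrative assumptions, not values from the disclosure.

```python
import math
from typing import Sequence


def estimate_snr_db(samples: Sequence[float], frame_len: int = 400,
                    noise_fraction: float = 0.1) -> float:
    """Crude SNR estimate: mean frame energy over the mean energy of the quietest frames."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal shorter than one frame")
    energies = sorted(sum(x * x for x in f) / frame_len for f in frames)
    k = max(1, int(len(energies) * noise_fraction))
    noise_floor = sum(energies[:k]) / k
    mean_energy = sum(energies) / len(energies)
    if noise_floor == 0.0:
        return math.inf  # noise floor of zero: no measurable noise
    return 10.0 * math.log10(mean_energy / noise_floor)


def needs_correction(samples: Sequence[float], threshold_db: float = 15.0) -> bool:
    """Flag the utterance for text correction when the estimated SNR is low (assumed rule)."""
    return estimate_snr_db(samples) < threshold_db
```

A steady noise signal yields frames of equal energy and therefore an estimate near 0 dB, so it is flagged, while speech bursts over silence yield a large or infinite estimate and are passed through uncorrected.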

In the embodiment of the present invention, the original speech recognition text obtained by speech recognition may first be annotated: the erroneous part is marked and corrected, and the positions of the correct keyword and of the misrecognized word within the whole sentence are labeled. For example, the correct text is "The insurance you purchased has a ten-day hesitation period; after receiving the contract, please read the detailed contract terms carefully", while the erroneous text produced by speech recognition is "The insurance you purchased, time has your later period; after receiving the contract, please read the detailed contract terms carefully". The position information of "ten-day hesitation period" is [9, 15], and that of the misrecognized "time has your later period" is also [9, 15]. The numbers denote the start and end positions of the keyword or of the misrecognized word; the exact values may differ depending on the natural language processing technique used. Labeling positions in this way makes it faster to locate the misrecognized word in the original speech recognition text, which in turn improves the efficiency of correcting the speech recognition text.
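The feedback file format is not specified in the source. The sketch below assumes, purely for illustration, one JSON object per labeled sample with hypothetical keys `actual_text`, `keyword_span`, `asr_text` and `error_span`, and treats each span as a half-open [start, end) character range. It recovers the correct keyword (first position information) and the misrecognized word (second position information) and checks that both spans are valid.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackRecord:
    actual_text: str
    keyword_span: tuple[int, int]  # first position information (correct keyword)
    asr_text: str
    error_span: tuple[int, int]    # second position information (misrecognized word)

    @property
    def keyword(self) -> str:
        return self.actual_text[self.keyword_span[0]:self.keyword_span[1]]

    @property
    def error_word(self) -> str:
        return self.asr_text[self.error_span[0]:self.error_span[1]]


def load_feedback(line: str) -> FeedbackRecord:
    """Parse one JSON line of a (hypothetical) feedback file and validate its spans."""
    data = json.loads(line)
    record = FeedbackRecord(
        actual_text=data["actual_text"],
        keyword_span=(int(data["keyword_span"][0]), int(data["keyword_span"][1])),
        asr_text=data["asr_text"],
        error_span=(int(data["error_span"][0]), int(data["error_span"][1])),
    )
    for text, (start, end) in ((record.actual_text, record.keyword_span),
                               (record.asr_text, record.error_span)):
        if not 0 <= start <= end <= len(text):
            raise ValueError(f"span [{start}, {end}) is outside a text of length {len(text)}")
    return record
```

Validating spans at load time catches annotation mistakes before they propagate into data augmentation and classifier training.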

In an optional embodiment, referring to Fig. 4, which is a schematic flowchart of the steps performed after step S301 in Fig. 3 in the method for correcting speech recognition text provided by the embodiment of the present invention, the method may further comprise, after the feedback file labeling the original speech recognition text is acquired:

Step S401, augmenting the original speech recognition text to obtain a training data set.

Because only a small number of samples can be collected from real-world scenarios, training can use only a few samples, and text correction performs poorly as a result.

To address this problem, the embodiment of the present invention performs data augmentation specifically on samples containing speech recognition errors, so as to enlarge the training data set.

Optionally, the training data set is used to train the text classification model. The specific training procedure is introduced in the following embodiments and is not repeated here.

Data augmentation on a labeled sample consists of randomly applying operations such as replacement, addition and deletion to the characters within the keyword's position range.

For example, the original speech recognition text may be augmented into a training data set as follows:

First, the keyword in the actual text is determined.

Second, a preset editing operation is performed on the characters within the position range corresponding to the keyword, the preset editing operation comprising at least one of replacement, addition and deletion.

In this step, the augmentation parameters can be adjusted, for example the number of edits applied to the characters within the keyword's position range.

Further, the vocabulary of Chinese characters used for replacement and addition can be customized; for example, characters from words that are frequently misrecognized in the samples can be added to the vocabulary. In practice, the number of edits may be set to 1/5 of the keyword length (with a minimum of 1), and, rather than a general Chinese character vocabulary, the set of Chinese characters appearing in the vertical domain is used as the vocabulary. For example, for the keyword "ten-day hesitation period", likely errors include "time has your later period", "time hesitation period" and "ten-day cause period"; these error combinations are built into the vocabulary so as to enlarge the training data set.

For example, the correct text is "The insurance you purchased has a ten-day hesitation period; after receiving the contract, please read the detailed contract terms carefully", and the erroneous text produced by speech recognition is "The insurance you purchased, time has your later period; after receiving the contract, please read the detailed contract terms carefully". The position information of "ten-day hesitation period" is [9, 15], and that of the misrecognized "time has your later period" is also [9, 15].

Illustratively, for replacement, the characters at positions [9, 15] in the sentence may be randomly replaced with other Chinese characters from the vertical-domain speech recognition corpus; for addition, Chinese characters randomly selected from the vocabulary may be inserted within [9, 15]; and for deletion, Chinese characters within [9, 15] may be randomly deleted.

Third, all text sequences obtained by performing the preset editing operation are taken as elements of the training data set, yielding the training data set.

In the embodiment of the present invention, the data may be augmented to a predetermined multiple of the original data set, preferably 3 times. Because augmentation enlarges the training data set, a text classification model trained on the augmented set adapts better to recognition errors.
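The three augmentation steps, with the parameters mentioned above (an edit count of 1/5 of the keyword length with a minimum of 1, a vertical-domain character vocabulary, and a preferred 3x expansion), can be sketched as follows. This is a minimal illustration, not the authors' implementation: the uniform random choice among the three operations, the seed, and the names `augment_once` and `augment_dataset` are assumptions, and "3 times" is read here as three edited variants per labeled sample.

```python
from __future__ import annotations

import random
from typing import Sequence

Sample = tuple[str, int, int, str]  # (text, keyword start, keyword end, intent)


def augment_once(text: str, start: int, end: int, vocab: Sequence[str],
                 rng: random.Random) -> tuple[str, int, int]:
    """Apply random replace/add/delete edits inside [start, end); return text and new span."""
    chars = list(text)
    n_edits = max(1, (end - start) // 5)  # 1/5 of the keyword length, at least 1
    for _ in range(n_edits):
        op = rng.choice(("replace", "add", "delete"))
        if op == "replace" and end > start:
            chars[rng.randrange(start, end)] = rng.choice(vocab)
        elif op == "add":
            chars.insert(rng.randrange(start, end + 1), rng.choice(vocab))
            end += 1
        elif op == "delete" and end - start > 1:
            del chars[rng.randrange(start, end)]
            end -= 1
        # Otherwise the chosen edit is skipped (e.g. deleting the last character of the span).
    return "".join(chars), start, end


def augment_dataset(samples: Sequence[Sample], vocab: Sequence[str],
                    factor: int = 3, seed: int = 0) -> list[Sample]:
    """Produce `factor` edited variants per labeled sample (assumed reading of "3 times")."""
    rng = random.Random(seed)
    out: list[Sample] = []
    for text, start, end, intent in samples:
        for _ in range(factor):
            new_text, new_start, new_end = augment_once(text, start, end, vocab, rng)
            out.append((new_text, new_start, new_end, intent))
    return out
```

Because every edit stays inside the keyword span and the span end is updated after each insertion or deletion, the text outside the span is preserved and each variant keeps an accurate position label for later context labeling.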

Step S106, processing the original speech recognition text with the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized.

Optionally, the text classification model may be a model trained in advance, used to process the original speech recognition text obtained by speech recognition so as to obtain the intention of the actual text corresponding to the speech to be recognized.

Optionally, the actual text is the text that actually corresponds to the speech to be recognized, and the text intention may be the meaning that this actual text is intended to express.

In an optional embodiment, referring to Fig. 5, which is a schematic flowchart of the steps performed before step S106 in Fig. 1 in the method for correcting speech recognition text provided by the embodiment of the present invention, the method may further comprise, before the original speech recognition text is processed with the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized:

Step S501, adding a predetermined field to each text sequence in the training data set to obtain a plurality of text sequences with the predetermined field added, the predetermined field representing the text intention of each text sequence.

Step S502, training on the plurality of text sequences to obtain the text classification model.

Exemplarily, processing the original speech recognition text with the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized may proceed as follows:

First, the original speech recognition text is input into the text classification model.

Second, the output of the text classification model is acquired.

Third, the text intention is derived from the output of the text classification model.
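The three inference steps can be sketched against a fastText-style classifier, whose Python `predict(text, k)` method returns a tuple of labels (carrying fastText's default `__label__` prefix) together with their probabilities. The `predict_intent` helper, its optional probability threshold and the stub model in the test are illustrative assumptions; any classifier with the same return shape would work.

```python
from __future__ import annotations

from typing import Optional, Protocol, Sequence, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


class IntentClassifier(Protocol):
    def predict(self, text: str, k: int = 1) -> Tuple[Sequence[str], Sequence[float]]: ...


def predict_intent(model: IntentClassifier, asr_text: str,
                   min_prob: float = 0.0) -> Optional[str]:
    """Step 1: input the text; step 2: read the output; step 3: derive the intention."""
    # fastText's predict() processes one line at a time, so flatten newlines first.
    labels, probs = model.predict(asr_text.replace("\n", " "), k=1)
    if len(labels) == 0 or probs[0] < min_prob:
        return None  # no sufficiently confident intention (assumed behaviour)
    label = labels[0]
    return label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
```

Returning `None` below a threshold gives the caller a simple way to skip correction when the classifier is unsure, instead of forcing a possibly wrong keyword into the text.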

In the embodiment of the invention, after the original voice text is amplified to obtain the amplified training data set in a mode of amplifying elements in the training data set, a predetermined field can be added to each text sequence in the amplified training data set, so that a plurality of text sequences with the predetermined fields added are obtained; and then training by utilizing a plurality of text sequences to obtain a text classification model, so that the original voice recognition text can be directly input into the text classification model in the subsequent use process, and the text intention of the original voice recognition text can be obtained based on the output of the text classification model.

For example, the original speech recognition text may be input into the trained text classification model, and the text intention of the original speech recognition text is obtained from the model output. For instance: correct text: "the insurance you bought enjoys a ten-day hesitation (cooling-off) period; after receiving the contract, please carefully read the detailed rules of the contract clauses"; speech recognition result: "the insurance buying time is that you please read the detailed rules of contract clauses carefully after receiving the contract in the later period"; text intention: insurance hesitation period.

As the above analysis shows, in this embodiment the training data set is obtained by data augmentation, and a field may then be added to each text sequence in the training data set. The field indicates the text intention of the text sequence and determines the keyword that needs to be corrected. In the above example, the labeled field is [insurance hesitation] and the corresponding keyword to be corrected is "ten-day hesitation period". Note that each intention corresponds to exactly one keyword. Features are then extracted from the labeled text sequences and a text classification model is trained on them; for example, text classification training can be performed with the fastText framework to obtain a fastText classifier.
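Because each intention is tied to exactly one keyword, the intention-to-keyword lookup can be a plain dictionary. A minimal sketch, assuming labeled (intention, keyword) pairs are available; the function names and the intention label are hypothetical.

```python
def build_intent_keyword_map(labeled_pairs):
    """Build {intention: keyword}; an intention may map to only one keyword."""
    mapping = {}
    for intent, keyword in labeled_pairs:
        existing = mapping.setdefault(intent, keyword)
        if existing != keyword:
            # Enforce the one-keyword-per-intention rule from the description.
            raise ValueError(
                f"intention {intent!r} maps to both {existing!r} and {keyword!r}"
            )
    return mapping


def keyword_for_intent(mapping, intent):
    """Return the replacement keyword for an intention, or None if unknown."""
    return mapping.get(intent)


pairs = [
    ("insurance_hesitation_period", "ten-day hesitation period"),
    ("insurance_hesitation_period", "ten-day hesitation period"),  # repeat is allowed
]
mapping = build_intent_keyword_map(pairs)
print(keyword_for_intent(mapping, "insurance_hesitation_period"))
```

Raising on a conflicting pair surfaces labeling mistakes early instead of letting one intention silently yield two different corrections.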

Step S108: acquire, according to the text intention, the keyword used to replace the misrecognized word.

In an optional embodiment, fig. 6 shows a detailed flowchart of step S108 in fig. 1 of the method for correcting a speech recognition text according to an embodiment of the present invention. Acquiring, according to the text intention, the keyword used to replace the misrecognized word includes:

Step S601: perform context labeling on each text sequence in the training data set to obtain context information of the misrecognized word.

That is, in this embodiment, for a word that is severely misrecognized, a new label field can be added to each sample by labeling the word's context. Taking the keyword [ten-day hesitation period] as an example, statistics over a large number of samples show that its context mostly contains the phrases [insurance enjoys] and [receiving the contract]. A labeled sample then reads: correct text: "the insurance you bought enjoys a ten-day hesitation period; after receiving the contract, please carefully read the detailed rules of the contract clauses"; speech recognition result: "the insurance buying time is that you please read the detailed rules of contract clauses carefully after receiving the contract in the later period"; text intention: insurance hesitation period; context: [insurance enjoys] at position [5, 9] and [receiving the contract] at position [15, 19].
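The count-then-label procedure can be sketched as follows, assuming English toy sentences in place of the Chinese samples and a hypothetical list of candidate phrases. The 0.5 co-occurrence threshold and the half-open span convention are illustrative choices, not values from the patent, so the offsets differ from the [5, 9] style used above.

```python
def select_context_phrases(samples, keyword, candidates, min_ratio=0.5):
    """Keep candidates that co-occur with `keyword` in >= min_ratio of its samples."""
    with_keyword = [s for s in samples if keyword in s]
    if not with_keyword:
        return []
    selected = []
    for phrase in candidates:
        ratio = sum(phrase in s for s in with_keyword) / len(with_keyword)
        if ratio >= min_ratio:
            selected.append(phrase)
    return selected


def label_context(text, phrases):
    """Map each phrase found in `text` to its (start, end) character span.

    Spans are half-open (Python slice convention); only the first
    occurrence of each phrase is recorded.
    """
    spans = {}
    for phrase in phrases:
        start = text.find(phrase)
        if start != -1:
            spans[phrase] = (start, start + len(phrase))
    return spans


context_samples = [
    "the insurance enjoys a ten-day hesitation period, after receiving the contract please read it",
    "your insurance enjoys a ten-day hesitation period once receiving the contract",
    "the ten-day hesitation period applies",
]
context = select_context_phrases(
    context_samples,
    "ten-day hesitation period",
    ["insurance enjoys", "receiving the contract", "premium"],  # hypothetical candidates
)
print(context)
print(label_context(context_samples[0], context))
```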

Step S602: acquire the keyword used to replace the misrecognized word according to the text intention and the context information of the misrecognized word.

In this embodiment, the keyword that replaces the misrecognized word is obtained from both the text intention and the context information of the misrecognized word. Using the context as an auxiliary signal improves the reliability of the obtained keyword.

In an optional embodiment, fig. 7 shows a detailed flowchart of step S602 in fig. 6 of the method for correcting a speech recognition text according to an embodiment of the present invention. Acquiring the keyword for replacing the misrecognized word according to the text intention and the context information of the misrecognized word includes:

Step S701: determine the similarity between the original speech recognition text and each text in the training data set.

Note that this embodiment does not restrict the similarity calculation method. Two optional approaches are described below.

In one optional approach, error correction is performed by named entity recognition (NER). To correct an erroneous speech recognition text, the error is first located, that is, the misrecognized keyword is found; [ten-day hesitation period] is used as the example below. An NER model is trained on the labeled samples; such a model is essentially a bidirectional long short-term memory network (Bi-LSTM) combined with a conditional random field (CRF). The Bi-LSTM encodes the Chinese text, usually taking single Chinese characters, or words composed of them, as the encoding unit, so that each unit is represented as a vector. These vectors are fed into the neural network for training, and for each encoded character the network predicts a tag marking the start and end of a named entity. Fig. 8(a) and 8(b) are schematic diagrams of correcting a speech recognition text based on named entities. As shown in fig. 8(a) and 8(b), B marks the beginning of an entity, I marks a position inside an entity, and O marks a non-entity position; the text after the "-" in a B or I tag is the entity name. In this example, the named entity [time hesitations] is extracted. The position of the erroneous keyword [ten-day hesitation period] in the text is obtained, and the erroneous keyword is replaced with the named entity name.
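The B/I/O scheme of fig. 8 can be turned into entity spans with a small decoder. The sketch below assumes the tag sequence has already been produced by a trained Bi-LSTM-CRF model (not shown); the tokens, the entity type `HES`, and the lenient handling of a stray `I-` tag are assumptions for illustration.

```python
def extract_entities(tokens, tags, sep=""):
    """Decode BIO tags into (entity_type, text, start, end) tuples; end is exclusive.

    `sep` joins tokens: "" for character-level Chinese, " " for word tokens.
    An I- tag that does not continue the currently open entity is treated as O.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:           # close the previous entity
                spans.append((etype, start, i))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue                        # entity continues
        else:
            if start is not None:           # O (or stray I-) ends the entity
                spans.append((etype, start, i))
            start, etype = None, None
    if start is not None:                   # entity runs to the end of the text
        spans.append((etype, start, len(tags)))
    return [(t, sep.join(tokens[s:e]), s, e) for t, s, e in spans]


tokens = ["you", "have", "ten", "day", "hesitation", "period", "now"]
tags = ["O", "O", "B-HES", "I-HES", "I-HES", "I-HES", "O"]
print(extract_entities(tokens, tags, sep=" "))
```

The returned start and end offsets are what the correction step needs to splice a replacement into the text.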

In another optional approach, text similarity is used for error correction. When the speech recognition result contains more errors, correction is more complex: if a keyword such as [ten-day hesitation period] is badly misrecognized, the NER method cannot extract it reliably. In that case the keyword can be extracted with the help of the words around it, as follows:

Step one: locate the keyword [ten-day hesitation period] by the context labeling method, where the nearby phrases include [insurance enjoys] and [receiving the contract]; the keyword that the text may need corrected is inferred from the phrases that commonly occur near it.

Step two: compute the text similarity between this text and the texts obtained for the corresponding keyword by data augmentation, for example with cosine similarity. When cosine similarity is used, the sample with the highest cosine similarity (equivalently, the smallest cosine distance) is selected; the positions of its neighboring phrases [insurance enjoys] and [receiving the contract] are used to infer where the keyword [ten-day hesitation period] lies in the sentence, and the original speech recognition text is then corrected by replacement.
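Step two can be sketched with a bag-of-character-bigrams representation and cosine similarity. The vectorization is an assumption (the patent does not specify one), and the function names and sentences are hypothetical. The most similar sample is the one with the largest cosine similarity, which is the same as the smallest cosine distance.

```python
import math
from collections import Counter


def bigram_vector(text):
    """Bag of character bigrams, ignoring spaces (an illustrative choice)."""
    text = text.replace(" ", "")
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a, b):
    """Cosine of the angle between the bigram count vectors of a and b."""
    va, vb = bigram_vector(a), bigram_vector(b)
    dot = sum(count * vb[gram] for gram, count in va.items())  # missing grams count 0
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, samples):
    """Return (sample, score) with the largest cosine similarity,
    i.e. the sample with the smallest cosine distance (1 - similarity)."""
    return max(((s, cosine_similarity(query, s)) for s in samples), key=lambda p: p[1])


augmented_samples = [
    "the insurance enjoys a ten-day hesitation period, after receiving the contract",
    "your premium payment is due at the end of the month",
]
asr_text = "the insurance enjoys ten dead time, after receiving the contract"
best, score = most_similar(asr_text, augmented_samples)
print(best, round(score, 3))
```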

Step S702: select from the training data set the target text sequence with the greatest similarity to the original speech recognition text.

Step S703: predict the keyword based on the text intention and the position of the misrecognized word's context information in the target text sequence.

In this embodiment, the text sequence in the training data set that is most similar to the original speech recognition text is taken as the target text sequence, and the keyword is predicted from the text intention together with the position of the misrecognized word's context information in that target sequence. This improves the accuracy of the obtained keyword and, in turn, the accuracy of the corrected speech recognition text.
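Step S703 can be approximated as follows. This sketch assumes, as in the running example, that the intention fixes the keyword, that the misrecognized span sits between the two context phrases taken from the target text sequence, and that those context phrases were recognized correctly in the original text. The function name, the separator set, and the toy sentence are hypothetical.

```python
def predict_keyword_span(asr_text, intention, intent_to_keyword, left_ctx, right_ctx):
    """Return (keyword, start, end) for the span between two context phrases.

    Returns None if the intention has no keyword or a context phrase is
    missing. The span is half-open; separators at its edges are trimmed.
    """
    keyword = intent_to_keyword.get(intention)
    left = asr_text.find(left_ctx)
    if keyword is None or left == -1:
        return None
    start = left + len(left_ctx)
    end = asr_text.find(right_ctx, start)
    if end == -1:
        return None
    separators = " ,;，；"  # assumed separators (ASCII and full-width)
    while start < end and asr_text[start] in separators:
        start += 1
    while end > start and asr_text[end - 1] in separators:
        end -= 1
    return keyword, start, end


table = {"insurance_hesitation_period": "ten-day hesitation period"}
asr_text = "the insurance enjoys ten dead time, receiving the contract please read"
result = predict_keyword_span(
    asr_text, "insurance_hesitation_period", table, "insurance enjoys", "receiving the contract"
)
print(result)
```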

Step S110: replace the misrecognized word with the keyword based on the position information of the misrecognized word, so as to correct the original speech recognition text and obtain the actual text corresponding to the speech to be recognized.

Since the preceding steps have already obtained the position of the misrecognized word in the original speech recognition text and the keyword that should replace it, the misrecognized word can be replaced with the keyword at that position, correcting the original speech recognition text into the actual text.
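Replacing by position is plain string splicing. A minimal sketch, assuming half-open character offsets and hypothetical function names; when several misrecognized words are corrected in one text, applying the replacements from right to left keeps the earlier offsets valid. Overlapping spans are assumed not to occur.

```python
def replace_span(text, start, end, keyword):
    """Replace text[start:end] (half-open offsets) with keyword."""
    if not 0 <= start <= end <= len(text):
        raise ValueError(f"invalid span ({start}, {end}) for text of length {len(text)}")
    return text[:start] + keyword + text[end:]


def apply_corrections(text, corrections):
    """Apply (start, end, keyword) corrections from right to left.

    Working right to left means each splice leaves the offsets of the
    corrections to its left unchanged; spans must not overlap.
    """
    for start, end, keyword in sorted(corrections, reverse=True):
        text = replace_span(text, start, end, keyword)
    return text


print(apply_corrections("ab XX cd YY", [(3, 5, "one"), (9, 11, "two")]))
```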

As can be seen from the above, in this embodiment of the present invention, an original speech recognition text to be corrected is acquired, the text being obtained by performing speech recognition on the speech to be recognized; the position information of the misrecognized word in that text is acquired; the text is processed with a text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized; a keyword for replacing the misrecognized word is acquired according to the text intention; and the misrecognized word is replaced with the keyword based on its position information, correcting the original speech recognition text into the actual text. Because the replacement keyword is derived from the actual intention of the speech to be recognized, this improves the accuracy of speech recognition.

The method for correcting a speech recognition text provided by this embodiment therefore addresses the technical problem in the related art that interference factors easily cause erroneous speech recognition results, which then cannot be corrected.

In summary, because the method provided by the invention uses data augmentation, named entity recognition extracts keywords more robustly; in addition, the method can still handle cases where the keyword cannot be extracted directly, which improves the reliability of the speech recognition text and the user experience.

Example 2

According to another aspect of the embodiments of the present invention, an apparatus for correcting a speech recognition text is further provided. Fig. 9 is a schematic diagram of this apparatus. As shown in fig. 9, the apparatus includes a first acquisition unit 91, a second acquisition unit 93, a third acquisition unit 95, a fourth acquisition unit 97, and a correction unit 99, which are described in detail below.

The first acquisition unit 91 is configured to acquire an original speech recognition text to be corrected, the original speech recognition text being obtained by performing speech recognition on the speech to be recognized.

The second acquisition unit 93 is configured to acquire position information of the misrecognized word in the original speech recognition text.

The third acquisition unit 95 is configured to process the original speech recognition text with the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized.

The fourth acquisition unit 97 is configured to acquire, according to the text intention, a keyword for replacing the misrecognized word.

The correction unit 99 is configured to replace the misrecognized word with the keyword based on the position information of the misrecognized word, so as to correct the original speech recognition text and obtain the actual text corresponding to the speech to be recognized.
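The five units can be wired together as sketched below. This is an illustrative Python composition, assuming each unit is supplied as a plain callable; the class and field names are hypothetical, and the stand-in callables in the usage example perform no real speech recognition, error location, or classification.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Span = Tuple[int, int]  # half-open character offsets of the misrecognized word


@dataclass
class SpeechTextCorrector:
    """Hypothetical composition of units 91-99 as injected callables."""

    acquire_text: Callable[[bytes], str]               # first acquisition unit 91
    locate_error: Callable[[str], Optional[Span]]      # second acquisition unit 93
    classify_intention: Callable[[str], str]           # third acquisition unit 95
    keyword_for: Callable[[str, str], Optional[str]]   # fourth acquisition unit 97

    def correct(self, audio: bytes) -> str:
        text = self.acquire_text(audio)
        span = self.locate_error(text)
        if span is None:
            return text  # nothing to correct
        intention = self.classify_intention(text)
        keyword = self.keyword_for(intention, text)
        if keyword is None:
            return text  # no keyword known for this intention
        start, end = span
        # Correction unit 99: splice the keyword in at the error position.
        return text[:start] + keyword + text[end:]


corrector = SpeechTextCorrector(
    acquire_text=lambda audio: audio.decode("utf-8"),
    locate_error=lambda text: (4, 8) if "dead" in text else None,
    classify_intention=lambda text: "insurance_hesitation_period",
    keyword_for=lambda intention, text: "ten-day hesitation period",
)
print(corrector.correct(b"has dead, read"))
```

Injecting the units as callables mirrors the note below that the division into units is logical: any unit can be swapped (for example, a fastText-backed classifier for unit 95) without touching the others.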

It should be noted that the first acquisition unit 91, the second acquisition unit 93, the third acquisition unit 95, the fourth acquisition unit 97, and the correction unit 99 correspond to steps S102 to S110 in Example 1; each unit implements the same examples and application scenarios as its corresponding step, but is not limited to the content disclosed in Example 1. The above units, as part of the apparatus, may be executed in a computer system, for example as a set of computer-executable instructions.

As can be seen from the above, in this embodiment of the present application, the first acquisition unit acquires the original speech recognition text to be corrected, the text being obtained by performing speech recognition on the speech to be recognized; the second acquisition unit acquires the position information of the misrecognized word in that text; the third acquisition unit processes the text with the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized; the fourth acquisition unit acquires, according to the text intention, the keyword for replacing the misrecognized word; and the correction unit replaces the misrecognized word with the keyword based on its position information, correcting the original speech recognition text into the actual text. Because the replacement keyword is derived from the actual intention of the speech to be recognized, the apparatus improves the accuracy of speech recognition and addresses the technical problem in the related art that interference factors easily cause erroneous speech recognition results, which then cannot be corrected.

In an optional embodiment, the apparatus for correcting a speech recognition text further includes: a speech recognition unit, configured to perform speech recognition on the speech to be recognized before the original speech recognition text to be corrected is acquired, so as to obtain the original speech recognition text; and a determining unit, configured to determine that the original speech recognition text exhibits abnormal recognition.

In an optional embodiment, the second acquisition unit includes: a first acquisition module, configured to acquire a feedback file in which the original speech recognition text is labeled, the feedback file including the actual text labeled with position information I of the correct keyword and the original speech recognition text labeled with position information II of the misrecognized word; and a second acquisition module, configured to acquire the position information of the misrecognized word in the original speech recognition text based on the feedback file.

In an optional embodiment, the apparatus for correcting a speech recognition text further includes an expansion module, configured to augment the original speech recognition text to obtain a training data set after the feedback file labeling the original speech recognition text is acquired. The expansion module includes: a first determining submodule, configured to determine the keyword in the actual text; an execution module, configured to perform a preset editing operation on the characters within the position range corresponding to the keyword, the preset editing operation including at least one of replacement, addition, and deletion; and a first obtaining submodule, configured to take all text sequences obtained by performing the preset editing operation as elements of the training data set, thereby obtaining the training data set.
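The expansion module's editing operations can be sketched as single-character edits restricted to the keyword's position range. The function name and the confusion alphabet (characters allowed as replacements or insertions) are hypothetical; in practice the alphabet might hold characters that sound similar to those in the keyword, but the patent does not say how it is chosen. Only one edit per variant is generated here, which is one reading of "at least one of replacement, addition, and deletion".

```python
def augment(actual_text, kw_start, kw_end, alphabet):
    """Generate one-edit variants of `actual_text` inside [kw_start, kw_end).

    Each variant applies exactly one replacement, insertion, or deletion to
    the keyword range; `alphabet` is a hypothetical confusion set.
    """
    variants = set()
    for i in range(kw_start, kw_end):
        variants.add(actual_text[:i] + actual_text[i + 1:])              # deletion
        for ch in alphabet:
            if ch != actual_text[i]:
                variants.add(actual_text[:i] + ch + actual_text[i + 1:])  # replacement
            variants.add(actual_text[:i] + ch + actual_text[i:])          # insertion before i
    for ch in alphabet:
        variants.add(actual_text[:kw_end] + ch + actual_text[kw_end:])    # insertion at end
    variants.discard(actual_text)  # defensive: the unedited text is not a new sample
    return sorted(variants)


# Keyword "ab" occupies [1, 3) in "xaby"; "c" is the only confusion character.
training_data_set = augment("xaby", 1, 3, "c")
print(training_data_set)
```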

In an optional embodiment, the apparatus for correcting a speech recognition text further includes: an adding module, configured to add a predetermined field to each text sequence in the training data set before the original speech recognition text is processed with the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized, so as to obtain a plurality of text sequences with the predetermined field added, the predetermined field representing the text intention of each text sequence in the training data set; and a training module, configured to train on the plurality of text sequences to obtain the text classification model.

In an optional embodiment, the third acquisition unit includes: an input module, configured to input the original speech recognition text into the text classification model; a third acquisition module, configured to acquire the output of the text classification model; and a fourth acquisition module, configured to obtain the text intention based on the output of the text classification model.

In an optional embodiment, the fourth acquisition unit includes: a labeling module, configured to perform context labeling on each text sequence in the training data set to obtain context information of the misrecognized word; and a fifth acquisition module, configured to acquire the keyword for replacing the misrecognized word according to the text intention and the context information of the misrecognized word.

In an optional embodiment, the fifth acquisition module includes: a second determining submodule, configured to determine the similarity between the original speech recognition text and each text in the training data set; a second acquisition submodule, configured to select from the training data set the target text sequence with the greatest similarity to the original speech recognition text; and a prediction submodule, configured to predict the keyword based on the text intention and the position information of the misrecognized word's context information in the target text sequence.

Example 3

According to another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided, which includes a stored computer program; when the computer program is executed by a processor, the device on which the storage medium is located is controlled to perform any one of the above methods for correcting a speech recognition text.

Example 4

According to another aspect of the embodiments of the present invention, a processor is further provided, configured to execute a computer program, where the computer program, when executed, performs any one of the above methods for correcting a speech recognition text.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units may be merely a logical functional division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims

1. A method for correcting a speech recognition text, comprising:

acquiring an original voice recognition text to be corrected, wherein the original voice recognition text is obtained by performing voice recognition on a voice to be recognized;

acquiring position information of recognized error words in the original voice recognition text;

processing the original voice recognition text by using a text classification model to obtain a text intention of an actual text corresponding to the voice to be recognized;

acquiring a keyword for replacing the recognized error word according to the text intention;

and replacing the recognized error words with the keywords based on the position information of the recognized error words to correct the original voice recognition text to obtain an actual text corresponding to the voice to be recognized.

2. The method of claim 1, wherein prior to said obtaining the original speech recognition text to be corrected, the method further comprises:

performing voice recognition on the voice to be recognized to obtain the original voice recognition text;

and determining that the original speech recognition text exhibits abnormal recognition.

3. The method of claim 1, wherein the obtaining of the position information of the recognized error word in the original speech recognition text comprises:

acquiring a feedback file in which the original speech recognition text is labeled, wherein the feedback file comprises: the actual text labeled with position information I of the correct keyword, and the original speech recognition text labeled with position information II of the recognized error word;

and acquiring the position information of the recognized error words in the original voice recognition text based on the feedback file.

4. The method of claim 3, wherein after the obtaining a feedback file that labels the original speech recognition text, the method further comprises:

augmenting the original speech recognition text to obtain a training data set;

wherein augmenting the original speech recognition text to obtain the training data set comprises:

determining a keyword in the actual text;

performing a preset editing operation on the characters within the position range corresponding to the keyword, wherein the preset editing operation comprises at least one of the following operations: replacement, addition, and deletion;

and taking all text sequences obtained by executing the preset editing operation as elements of the training data set to obtain the training data set.

5. The method of claim 4, wherein before the processing the original speech recognition text by using the text classification model to obtain the text intention of the actual text corresponding to the speech to be recognized, the method further comprises:

adding a predetermined field to each text sequence in the training data set to obtain a plurality of text sequences with the predetermined fields added, wherein the predetermined fields represent text intentions of each text sequence in the training data set;

and training the plurality of text sequences to obtain the text classification model.

6. The method of claim 4, wherein the obtaining keywords for replacing the recognized error words according to the text intent comprises:

performing context labeling on each text sequence in the training data set to obtain context information of the recognized error words;

and acquiring a keyword for replacing the recognized error word according to the text intention and the context information of the recognized error word.

7. The method according to claim 6, wherein the obtaining the keyword for replacing the recognized error word according to the text intention and the context information of the recognized error word comprises:

determining a similarity between the original speech recognition text and each text in the training data set;

selecting a target text sequence with the maximum similarity to the original speech recognition text from the training data set;

and predicting the keywords based on the text intention and the position information of the context information of the recognized error words in the target text sequence.

8. A correction apparatus for a speech recognition text, comprising:

a first acquisition unit, configured to acquire an original speech recognition text to be corrected, wherein the original speech recognition text is obtained by performing speech recognition on a speech to be recognized;

a second acquisition unit, configured to acquire position information of a recognized error word in the original speech recognition text;

a third acquisition unit, configured to process the original speech recognition text by using a text classification model to obtain a text intention of an actual text corresponding to the speech to be recognized;

a fourth acquisition unit, configured to acquire a keyword for replacing the recognized error word according to the text intention;

and a correction unit, configured to replace the recognized error word with the keyword based on the position information of the recognized error word, so as to correct the original speech recognition text and obtain the actual text corresponding to the speech to be recognized.

9. A computer-readable storage medium, comprising a stored computer program, wherein when the computer program is executed by a processor, a device on which the computer-readable storage medium is located is controlled to execute the method for correcting a speech recognition text according to any one of claims 1 to 7.

10. A processor for executing a computer program, wherein the computer program executes the method for correcting a speech recognition text according to any one of claims 1 to 7.