CN113160822A

CN113160822A - Speech recognition processing method, speech recognition processing device, electronic equipment and storage medium

Info

Publication number: CN113160822A
Application number: CN202110488931.0A
Authority: CN
Inventors: 夏帅; 黄伟琦; 江鹏
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2021-07-23
Anticipated expiration: 2041-04-30
Also published as: CN113160822B

Abstract

The application provides a voice recognition processing method and device, electronic equipment and a storage medium, and relates to artificial intelligence fields such as natural language processing and voice technology. The specific implementation scheme is as follows: determining a usage scenario of the speech recognition; acquiring a corresponding preset hot word set according to the usage scenario; when voice recognition is carried out on voice information and a voice recognition result is obtained, determining a word to be replaced in the voice recognition result and a target hot word corresponding to the word to be replaced according to first pinyin information of each hot word in the preset hot word set and second pinyin information of the voice recognition result; and replacing the word to be replaced in the voice recognition result with the corresponding target hot word. The method and the device reduce the error rate of the voice recognition result and improve the voice recognition effect in a specific scenario without requiring additional labor or time cost.

Description

Speech recognition processing method, speech recognition processing device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, in particular to artificial intelligence fields such as natural language processing and speech technology, and more specifically to a speech recognition processing method and apparatus, an electronic device, and a storage medium.

Background

Speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the vocabulary content of human speech into computer-readable input, such as keystrokes, binary codes or character sequences. With the development of science and technology, speech recognition has entered people's daily lives.

In current speech recognition for conference scenes, the same word may be recognized with different results, and proper nouns from different fields may be recognized incorrectly. Training a dedicated model to address this consumes a large amount of labor and time, and the training process is relatively complex.

Disclosure of Invention

The application provides a method, a device, an electronic device and a storage medium for voice recognition processing.

According to a first aspect of the present application, there is provided a method for speech recognition processing, comprising:

determining a usage scenario of the speech recognition;

acquiring a corresponding preset hot word set according to the use scene;

when voice recognition is carried out on voice information and a voice recognition result is obtained, determining a word to be replaced in the voice recognition result and a target hot word corresponding to the word to be replaced according to first pinyin information of each hot word in the preset hot word set and second pinyin information of the voice recognition result;

and replacing the words to be replaced in the voice recognition result with the corresponding target hot words.

Wherein, the hotword in the preset hotword set comprises:

hot words obtained according to the voice recognition experience of the use scene; and/or,

replacement words used when a correction operation is carried out on the voice recognition result during voice recognition in the use scene.

In an embodiment of the present application, after configuring hotwords to the preset hotword set, the method further includes:

obtaining the pinyin of each character in the hot words;

if the polyphone character does not exist in the hot word, combining the pinyin of each character in the hot word to obtain first pinyin information of the hot word;

if polyphone characters exist in the hot words, the pinyin of the polyphone characters in the hot words and the pinyin of other characters are arranged and combined pairwise from left to right to obtain a plurality of pinyin arrangement and combination results;

and taking the multiple pinyin arrangement and combination results as first pinyin information of the hot word.

In some embodiments of the present application, the determining, according to the first pinyin information of each hotword in the preset hotword set and the second pinyin information of the speech recognition result, a word to be replaced and a target hotword corresponding to the word to be replaced in the speech recognition result includes:

determining first pinyin information of each hot word in the preset hot word set;

preprocessing the voice recognition result; wherein, the preprocessing comprises at least one of filtering punctuation marks, filtering special characters and English characters;

converting the preprocessed voice recognition result into corresponding pinyin to obtain second pinyin information of the voice recognition result;

comparing the first pinyin information of each hot word with the second pinyin information of the voice recognition result, and determining target pinyin with the same syllable composition and structure from the first pinyin information and the second pinyin information;

determining a text corresponding to the target pinyin in the voice recognition result as the word to be replaced according to the second pinyin information;

and determining the hot word corresponding to the target pinyin as the target hot word.

In some embodiments of the present application, the speech recognition processing method further includes:

judging whether punctuation marks and/or special characters exist among the texts of the words to be replaced;

if punctuation marks and/or special characters exist among the texts in the words to be replaced, the step of replacing the words to be replaced in the voice recognition result with the corresponding target hot words is not executed;

and if punctuation marks and/or special characters do not exist in the texts in the words to be replaced, executing the step of replacing the words to be replaced in the voice recognition result with the corresponding target hot words.

Optionally, in some embodiments of the present application, the speech recognition processing method further includes:

acquiring a preset tone word set;

and performing text matching between each tone word in the preset tone word set and the voice recognition result, and performing replacement processing on the text matched in the voice recognition result, which is treated as a tone word.

Wherein, the tone words in the preset tone word set comprise:

tone words obtained according to the speech recognition experience of the use scene; and/or,

tone words obtained according to the speaking habits of the speaker in the use scene; and/or,

words that are replaced with a null character in the voice recognition result when the voice recognition result is corrected during voice recognition in the use scene.

According to a second aspect of the present application, there is provided a speech recognition processing apparatus including:

the first determination module is used for determining the use scene of the voice recognition;

the first acquisition module is used for acquiring a corresponding preset hot word set according to the use scene;

the second determining module is used for determining a word to be replaced in the voice recognition result and a target hot word corresponding to the word to be replaced according to the first pinyin information of each hot word in the preset hot word set and the second pinyin information of the voice recognition result when voice recognition is carried out on voice information and a voice recognition result is obtained;

and the first replacing module is used for replacing the words to be replaced in the voice recognition result with the corresponding target hot words.

The hot words in the preset hot word set acquired by the first acquisition module include:

In this embodiment of the present application, the speech recognition processing apparatus further includes a hotword configuration module, where the hotword configuration module is configured to:

after hot words are configured to the preset hot word set, obtaining the pinyin of each word in the hot words;

In some embodiments of the present application, the second determining module is configured to:

In some embodiments of the present application, the speech recognition processing apparatus further comprises:

the judging module is used for judging whether punctuation marks and/or special characters exist among the texts of the words to be replaced;

if punctuation marks and/or special characters exist among the texts in the words to be replaced, the first replacement module does not execute the step of replacing the words to be replaced in the voice recognition result with the corresponding target hot words;

and if punctuation marks and/or special characters do not exist among the words to be replaced, the first replacement module executes the step of replacing the words to be replaced in the voice recognition result with the corresponding target hot words.

Furthermore, in an embodiment of the present application, the speech recognition processing apparatus further includes:

the second acquisition module is used for acquiring a preset tone word set;

and the second replacement module is used for performing text matching between each tone word in the preset tone word set and the voice recognition result, and performing replacement processing on the text matched in the voice recognition result, which is treated as a tone word.

The tone words in the preset tone word set acquired by the second acquisition module include:

According to a third aspect of the present application, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the speech recognition processing method of the first aspect.

According to a fourth aspect of the present application, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the speech recognition processing method according to the first aspect.

According to a fifth aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the speech recognition processing method according to the first aspect.

According to the technical scheme, the pre-configured hot word set is obtained according to the use scene. When the real-time voice recognition result is subjected to error correction processing, the pinyin information corresponding to the hot words in the hot word set is compared with the pinyin information of the voice recognition result, and the words in the voice recognition result are replaced according to the comparison result, so that the error rate of the voice recognition result is reduced and the voice recognition effect under the specific scene is improved without adding manpower and time cost. Therefore, the method and the device can effectively realize hot word replacement in a specific scene, avoid the time-consuming model training required by the natural language processing (NLP) based replacement techniques of the prior art, and improve the output display efficiency of the voice recognition result.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

FIG. 1 is a flowchart of a speech recognition processing method according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating a process of obtaining first pinyin information according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating a method for determining a word to be replaced and a target hotword corresponding to the word in a speech recognition result according to an embodiment of the present application;

FIG. 4 is a flowchart of another speech recognition processing method according to an embodiment of the present application;

FIG. 5 is a flowchart of another speech recognition processing method according to an embodiment of the present application;

FIG. 6 is a block diagram of a speech recognition processing apparatus according to an embodiment of the present application;

FIG. 7 is a block diagram of another speech recognition processing apparatus according to an embodiment of the present application;

FIG. 8 is a block diagram of another speech recognition processing apparatus according to an embodiment of the present application;

FIG. 9 is a block diagram of an electronic device for implementing a speech recognition processing method according to an embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In combination with the problems of the speech recognition technology in the conference scene, the existing solution is to replace the hotword by adopting a natural language processing technology and artificial text replacement. The natural language processing technology uses a large amount of texts, performs text training based on a natural language processing algorithm to obtain a trained text model, and performs hot word replacement on the input text on the model. However, because the word collocation modes in different fields are different, a model which is general to all fields cannot be trained, so that model training needs to be carried out again in different fields, the required time is long, and the instantaneity cannot be guaranteed. In addition, the manual text replacement method needs to manually mark out the words to be replaced, which consumes a large labor cost.

In order to solve the existing problems, the application provides a voice recognition processing method, a voice recognition processing device, an electronic device and a storage medium.

Fig. 1 is a flowchart of a speech recognition processing method according to an embodiment of the present application, and it should be noted that the speech recognition processing method according to the embodiment of the present application can be applied to a speech recognition processing apparatus according to the embodiment of the present application, and the speech recognition processing apparatus can be configured in an electronic device. As shown in fig. 1, the implementation steps of the method include:

Step 101, determining a use scene of voice recognition.

In the embodiment of the present application, it is first required to determine a usage scenario of speech recognition, and perform error correction processing on a speech recognition result by using a preset hot word set in the usage scenario.

Step 102, acquiring a corresponding preset hot word set according to the use scene.

It is understood that a hotword set is a set of words in a certain field, such as proper nouns or words that are frequently misrecognized in speech recognition. The words to be recognized differ greatly between use scenes, and if the use scene is not considered, the same word may be recognized with different results and proper nouns of different fields may be recognized incorrectly. Therefore, in the embodiment of the present application, to avoid the above problem, corresponding hotword sets are preset for different usage scenarios.

Step 103, when voice recognition is carried out on the voice information and a voice recognition result is obtained, determining a word to be replaced in the voice recognition result and a target hot word corresponding to the word to be replaced according to the first pinyin information of each hot word in the preset hot word set and the second pinyin information of the voice recognition result.

That is to say, when voice recognition is performed, the pinyin information of each hot word in the preset hot word set is compared with the pinyin information of the voice recognition result to find which pinyins in the voice recognition result correspond to words that need to be replaced by hot words in the preset hot word set.

In the embodiment of the application, the word to be replaced in the voice recognition result can be understood as a word whose pinyin information in the voice recognition result is consistent with the pinyin information of a hot word in the preset hot word set. The target hot word corresponding to the word to be replaced is the hot word in the preset hot word set whose pinyin information is consistent with that of the word to be replaced. In this embodiment of the present application, an implementation manner of determining a word to be replaced and the corresponding target hot word may be: performing voice recognition on the voice information according to an existing voice recognition technology to obtain a voice recognition result; searching the pinyin information of each hot word in the preset hot word set according to the pinyin information of the voice recognition result, and finding a word in the voice recognition result whose pinyin information is consistent with the pinyin information of a hot word in the preset hot word set, this word being the word to be replaced; and finding, in the preset hot word set, the hot word whose pinyin information is consistent with that of the word to be replaced, this hot word being the target hot word corresponding to the word to be replaced.

Step 104, replacing the words to be replaced in the voice recognition result with corresponding target hot words.

That is to say, the words in the voice recognition result are replaced by the hot words consistent with the pinyin information in the preset hot word set, so that the situation of wrong word recognition can be avoided, and meanwhile, the output of the real-time voice recognition result can be met.

According to the voice recognition processing method provided by the embodiment of the application, the pre-configured hot word set is obtained according to the use scene. When the real-time voice recognition result is subjected to error correction processing, the pinyin information corresponding to the hot words in the hot word set is compared with the pinyin information of the voice recognition result, and the words in the voice recognition result are replaced according to the comparison result, so that the error rate of the voice recognition result is reduced and the voice recognition effect under the specific scene is improved without adding additional labor and time cost. Therefore, hot word replacement in a specific scene can be effectively realized, the time-consuming model training required by the natural language processing (NLP) based replacement techniques of the prior art can be avoided, and the output display efficiency of the voice recognition result can be improved.

It should be noted that the hotwords in the preset hotword set may include, but are not limited to: hot words obtained according to the experience of speech recognition in the use scene; and/or replacement words used when a correction operation is performed on the speech recognition result during speech recognition in the use scene. For example, in a conference scene, if a word is recognized inaccurately, the correct word can be added to the hot word set in real time, so that the preset hot word set is updated, and during subsequent speech recognition the hot words in the updated hot word set can be directly acquired to correct the speech recognition result.
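The per-scene hot word sets and their real-time update can be pictured with a small in-memory registry. This is only a minimal sketch: the class name `HotwordRegistry`, its methods, and the scene labels are hypothetical and are not taken from the application, which does not specify how the sets are stored.

```python
from collections import defaultdict


class HotwordRegistry:
    """Hypothetical in-memory store mapping a use scene to its hot word set."""

    def __init__(self):
        # scene name -> set of hot words configured for that scene
        self._sets = defaultdict(set)

    def add(self, scene, hotword):
        """Configure a hot word for a scene (also usable for real-time updates)."""
        self._sets[scene].add(hotword)

    def get(self, scene):
        """Return the preset hot word set for a scene (empty if none is configured)."""
        return frozenset(self._sets.get(scene, ()))


registry = HotwordRegistry()
registry.add("conference", "百度")
# A correction made during a meeting can be added to the same set immediately.
registry.add("conference", "飞桨")
```

Later recognition passes in the same scene would then read the updated set through `registry.get("conference")`.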

In order to further improve the output display efficiency of the voice recognition result, the hot word is immediately subjected to pinyin conversion processing after being configured to a preset hot word set so as to obtain first pinyin information of the hot word. Fig. 2 is a flowchart of acquiring first pinyin information, and as shown in fig. 2, the first pinyin information is acquired in the following manner:

Step 201, obtaining pinyin of each character in the hotword.

Step 202, determining whether polyphone characters exist in the hotword, if not, executing step 203, otherwise, executing step 204.

It can be understood that if polyphone characters exist in the hotword, a plurality of corresponding pinyins exist, and special processing needs to be performed for the situation that polyphone characters exist in the hotword in order to enable the voice recognition result to be effectively matched with the hotword in the preset hotword set.

Step 203, combining the pinyin of each character in the hot word to obtain first pinyin information of the hot word.

Step 204, performing pairwise arrangement and combination on the pinyin of the polyphone in the hot word and the pinyins of other characters from left to right to obtain a plurality of pinyin arrangement and combination results.

That is, the pinyin of the polyphone in the hot word and the pinyins of other characters are arranged and combined, so that any one pinyin of the polyphone and the pinyins of other characters in the hot word are combined aiming at the hot word.

As an example, after the hot words are configured, each newly configured hot word is traversed to obtain the pinyin of each of its characters. If a hot word contains polyphonic characters, its pinyins are arranged and combined from left to right as follows: take all the pinyins of the first character (assume there are a of them) and all the pinyins of the second character (assume there are b of them), and combine them pairwise to obtain a result u of length a × b; then combine u pairwise with the pinyins of the third character (assume there are c of them) to obtain a result v of length a × b × c; and so on, until the final pinyin result list is obtained.

Step 205, using the multiple pinyin permutation and combination results as the first pinyin information of the hotword.

According to the voice recognition processing method provided by the embodiment of the application, the first pinyin information of a hot word is obtained as soon as the hot word is configured to the preset hot word set. This provides a foundation for error correction of real-time voice recognition and improves the use efficiency of the preset hot word set: when error correction is performed on a real-time voice recognition result, the precomputed pinyin of the hot words is compared directly with the pinyin of the voice recognition result, which further improves the output display efficiency of the voice recognition result. In addition, when polyphone characters exist in a hot word, the pinyins of its characters are arranged and combined from left to right, and the resulting pinyin combinations are all used as the first pinyin information of the hot word. This avoids the situation in which the voice recognition result cannot be correctly matched with the hot words in the preset hot word set because of polyphone characters, and so helps improve the accuracy of the voice recognition result. Because the combinations are computed immediately after the hot word is configured, all possible pinyins of the hot word are available for direct comparison during real-time error correction, which further improves both the output display efficiency and the voice recognition effect.
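Steps 201 to 205 can be sketched as follows. The lookup table `CHAR_PINYIN` is a hypothetical toy dictionary introduced only for illustration; a real system would need a full character-to-pinyin dictionary. `itertools.product` is used here as a stand-in for the iterative left-to-right pairwise combination described above, and yields the same a × b × c list.

```python
from itertools import product

# Hypothetical toy table: character -> all of its pinyin readings (toneless).
# These entries are illustrative only, not a complete dictionary.
CHAR_PINYIN = {
    "银": ["yin"],
    "行": ["hang", "xing"],   # polyphone character
    "重": ["zhong", "chong"],  # polyphone character
    "庆": ["qing"],
}


def first_pinyin_info(hotword, table=CHAR_PINYIN):
    """Return every pinyin sequence the hot word can take, as a list of tuples."""
    # Step 201: collect the readings of each character, in order.
    readings = [table[ch] for ch in hotword]
    # Steps 203/204: product() combines the readings left to right. Without a
    # polyphone it yields a single combination; otherwise a * b * c ... results.
    return [tuple(combo) for combo in product(*readings)]


print(first_pinyin_info("银行"))
# [('yin', 'hang'), ('yin', 'xing')]
```

Computing this list once, when the hot word is configured, keeps the per-utterance work at recognition time down to a comparison.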

To describe the speech recognition processing method in any of the above embodiments in detail, fig. 3 is a flowchart for determining a word to be replaced and a target hotword corresponding to the word in the speech recognition result, as shown in fig. 3, the implementation steps are as follows:

Step 301, determining first pinyin information of each hotword in a preset hotword set.

Step 302, preprocessing the voice recognition result, wherein the preprocessing includes at least one of filtering punctuation marks, filtering special characters and English characters.

Step 303, converting the preprocessed voice recognition result into a corresponding pinyin to obtain second pinyin information of the voice recognition result.

It should be noted that, when the preprocessed voice recognition result is converted into the corresponding pinyin, if the recognition result includes a word that is a polyphone, the pinyin corresponding to any pronunciation in the polyphone pronunciations may be selected.

Step 304, comparing the first pinyin information of each hotword with the second pinyin information of the voice recognition result, and determining the target pinyin with the same syllable composition and structure from the first pinyin information and the second pinyin information.

In the embodiment of the application, obtaining the target pinyin can be realized by the following steps: traversing the whole hot word set to obtain each hot word, and traversing the set formed by the first pinyin information corresponding to each hot word to compare it with the second pinyin information of the voice recognition result; and finding the pinyin with the same syllable composition and structure in the first pinyin information and the second pinyin information, and taking the pinyin as the target pinyin. It should be noted that, in one speech recognition processing process, the target pinyin is all the pinyins in the second pinyin information of the speech recognition result that have the same syllable composition and structure as the first pinyin information, that is, there may be multiple target pinyins.

Step 305, determining a text corresponding to the target pinyin in the voice recognition result as a word to be replaced according to the second pinyin information.

It can be understood that finding the target pinyin means finding the pinyin of a word in the voice recognition result that matches a hot word in the preset hot word set, and the text corresponding to the target pinyin is the text that will be replaced by the hot word. Therefore, in the embodiment of the present application, the text corresponding to the target pinyin in the voice recognition result is determined as the word to be replaced.

Step 306, determining the hot word corresponding to the target pinyin as the target hot word.
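Steps 301 to 306 can be sketched as a sliding-window comparison over the pinyin of the recognition result. Everything named here is an assumption made for illustration: the toy table `CHAR_PINYIN`, the helper `find_matches`, the choice to keep only characters in the basic CJK range during preprocessing, and the choice of the first listed reading for a polyphone in the recognition result (the text above allows any reading).

```python
# Hypothetical toy table: character -> pinyin readings (illustrative only).
CHAR_PINYIN = {
    "白": ["bai"], "百": ["bai"], "度": ["du"], "渡": ["du"],
    "欢": ["huan"], "迎": ["ying"], "使": ["shi"], "用": ["yong"],
}


def is_chinese(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def find_matches(result, hotword_pinyin, table=CHAR_PINYIN):
    """Find spans of `result` whose pinyin equals that of a hot word.

    hotword_pinyin maps each hot word to its list of pinyin tuples
    (the first pinyin information). Returns (start, end, hotword) spans
    indexed into the ORIGINAL, unfiltered recognition result.
    """
    # Step 302: drop punctuation, special and English characters, but
    # remember where each kept character sat in the original text.
    kept = [(i, ch) for i, ch in enumerate(result) if is_chinese(ch)]
    # Step 303: second pinyin information, one reading per character.
    second = [table.get(ch, [ch])[0] for _, ch in kept]
    matches = []
    for hotword, variants in hotword_pinyin.items():       # step 304
        for variant in variants:
            n = len(variant)
            for k in range(len(second) - n + 1):
                if tuple(second[k:k + n]) == variant:
                    start = kept[k][0]
                    end = kept[k + n - 1][0] + 1
                    matches.append((start, end, hotword))   # steps 305-306
    return matches


print(find_matches("欢迎使用白渡", {"百度": [("bai", "du")]}))
# [(4, 6, '百度')]
```

Returning spans into the original text, rather than into the filtered text, makes it possible to inspect afterwards what actually lies between the matched characters.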

In the speech recognition process, punctuations or special symbols may exist in the words to be replaced, and if the replacement is performed only according to the determined words to be replaced and the target hot words, a situation of a replacement error may occur. In order to further improve the accuracy of the replacement, the application proposes another speech recognition processing method. Fig. 4 is a flowchart of another speech recognition processing method according to an embodiment of the present application, and as shown in fig. 4, the speech recognition processing method further includes the following steps:

Step 404, determining whether punctuation marks and/or special characters exist among the texts of the word to be replaced. If no punctuation marks and/or special characters exist among the texts of the word to be replaced, step 405 is executed; otherwise, step 405 is not executed.

It can be understood that if punctuation marks and/or special characters exist among the texts of the word to be replaced, then the word to be replaced and the corresponding hot word have the same pinyin composition and structure only after the punctuation marks and/or special characters are removed; their actual meanings differ, and directly executing the replacement would cause a replacement error. To avoid this type of replacement error, the embodiment of the application adds a judgment of whether punctuation marks and/or special characters exist among the texts of the word to be replaced. The word to be replaced is replaced with the target hot word only when no punctuation marks and/or special characters exist among its texts; otherwise, the replacement is not executed.

Step 405, replacing the words to be replaced in the voice recognition result with corresponding target hot words.

It should be noted that steps 401 to 403 in fig. 4 are the same as the implementation manners of steps 101 to 103 in fig. 1, and are not described herein again.

According to the speech recognition processing method provided by the embodiment of the application, the step of replacing the word to be replaced in the speech recognition result with the corresponding target hot word is executed only when no punctuation marks and/or special characters exist among the texts of the word to be replaced; otherwise, the replacement is not executed. This avoids replacement errors to a certain extent, further improves the accuracy of replacement, and improves the speech recognition effect.
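The check in step 404 and the replacement in step 405 can be sketched as below. The function name `apply_replacements` and the span format `(start, end, hotword)` are hypothetical; the sketch also assumes that spans do not overlap and that any character outside the basic CJK range inside a span counts as punctuation or a special character.

```python
def apply_replacements(result, matches):
    """Replace matched spans with their target hot words, skipping unsafe spans.

    matches: (start, end, hotword) spans indexed into `result`, assumed to be
    non-overlapping and produced by an earlier pinyin comparison.
    """
    out = result
    # Work from right to left so earlier indices stay valid after an edit.
    for start, end, hotword in sorted(matches, reverse=True):
        span = result[start:end]
        # Step 404: a span containing anything other than Chinese characters
        # (for example a comma between two clauses) is left untouched.
        if all("\u4e00" <= ch <= "\u9fff" for ch in span):
            out = out[:start] + hotword + out[end:]   # step 405
    return out


print(apply_replacements("欢迎使用白渡", [(4, 6, "百度")]))
# 欢迎使用百度
print(apply_replacements("你好白,渡口", [(2, 5, "百度")]))
# 你好白,渡口
```

In the second call the span "白,渡" has the same pinyin as the hot word once the comma is filtered out, but the comma shows that the two characters belong to different clauses, so the text is returned unchanged.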

In daily communication, people often insert tone words between sentences because of thinking pauses, emotional expression, or speaking habits, and these tone words are unrelated to semantic expression. In view of this problem, the present application provides another speech recognition processing method. Fig. 5 is a flowchart of the steps added to the speech recognition processing method in any of the above embodiments. As shown in fig. 5, the speech recognition processing method adds the following steps:

Step 506, a preset tone word set is obtained.

In the embodiment of the application, the preset tone word set is a set in which the tone words used by speakers in the use scene are configured in advance, so that the configured set can be obtained directly for tone word matching during voice recognition; the set is configured once and can be used many times. In addition, tone words that do not yet exist in the preset tone word set and are found through correction operations on the voice recognition result can be added to the preset tone word set in real time during voice recognition, so that the preset tone word set is continuously and iteratively optimized.

The tone words in the preset tone word set may include, but are not limited to: tone words obtained according to the speech recognition experience of the use scene; and/or tone words obtained according to the speaking habits of the speaker in the use scene; and/or words that are replaced with an empty character when the voice recognition result is corrected during voice recognition in the use scene.

Step 507, performing text matching between each tone word in the preset tone word set and the voice recognition result, taking the matched text in the voice recognition result as a tone word, and performing replacement processing on it.

That is, if the speech recognition result matches a tone word in the preset tone word set, a tone word is present in the speech recognition result, and the matched text needs to be treated as a tone word and replaced.

In the embodiment of the application, each tone word in the preset tone word set is subjected to text matching with the voice recognition result, and a text similarity algorithm may be used for the matching. If the voice recognition result contains text that matches a tone word in the preset tone word set, the matched text is taken as a tone word and replaced with an empty character, which completes the replacement processing of the tone word.
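As a minimal sketch of steps 506 to 507, the following Python fragment removes tone words from a speech recognition result. It is a simplification: exact substring matching is used in place of the text similarity algorithm mentioned above, and the function name `remove_tone_words` and the sample tone words are hypothetical.

```python
from typing import Iterable


def remove_tone_words(result: str, tone_words: Iterable[str]) -> str:
    """Replace every exact occurrence of a configured tone word with an empty string."""
    # Longer tone words are handled first so that a multi-character tone word
    # is removed as a whole before any shorter tone word contained in it.
    for word in sorted(tone_words, key=len, reverse=True):
        result = result.replace(word, "")
    return result


# Hypothetical preset tone word set for one use scene.
preset_tone_words = {"嗯", "呃"}
cleaned = remove_tone_words("嗯我们今天呃讨论一下方案", preset_tone_words)
print(cleaned)
```

Exact substring matching may also delete characters that form part of a meaningful word, so a practical configuration would presumably restrict the set to words that rarely carry meaning in the use scene.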

It should be noted that, in the embodiment of the present application, steps 502 to 505 and steps 506 to 507 may be performed simultaneously, where steps 501 to 505 in fig. 5 are consistent with steps 401 to 405 in fig. 4, and are not described herein again.

According to the voice recognition processing method provided by the embodiment of the application, the preset tone word set is obtained, the voice recognition result is matched with each tone word in the preset tone word set, and the text matched in the voice recognition result is used as a tone word to be replaced, so that the tone word in the voice recognition result is replaced, the accuracy of the voice recognition result is further improved, and the voice recognition effect is further improved.

In order to implement the above embodiments, the present application proposes a speech recognition processing apparatus.

Fig. 6 is a block diagram of a speech recognition processing apparatus according to an embodiment of the present application, and as shown in fig. 6, the apparatus includes:

a first determining module 610, configured to determine a usage scenario of speech recognition;

a first obtaining module 620, configured to obtain a corresponding preset hotword set according to a usage scenario;

a second determining module 630, configured to determine a word to be replaced and a target hotword corresponding to the word to be replaced in the voice recognition result according to the first pinyin information of each hotword in the preset hotword set and the second pinyin information of the voice recognition result when performing voice recognition on the voice information and obtaining a voice recognition result;

and a first replacing module 640, configured to replace the word to be replaced in the voice recognition result with the corresponding target hot word.

The hotword in the preset hotword set acquired by the first acquiring module 620 includes:

hot words obtained according to the speech recognition experience of the use scene; and/or,

replacement words used when a correction operation is performed on the speech recognition result during speech recognition in the use scene.

In the embodiment of the present application, the speech recognition processing apparatus further includes a hotword configuration module 650, where the hotword configuration module 650 is configured to:

after the hot words are configured to a preset hot word set, obtaining the pinyin of each character in the hot words;

if the polyphone exists in the hot word, the pinyin of the polyphone in the hot word and the pinyin of other characters are arranged and combined pairwise from left to right to obtain a plurality of pinyin arrangement and combination results;

and taking the multiple pinyin permutation and combination results as first pinyin information of the hotword.
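The permutation and combination of polyphone readings described above can be sketched as a Cartesian product over the candidate readings of each character, taken from left to right. This is a minimal sketch under stated assumptions: the pronunciation table `CHAR_PINYIN` is a small hand-written, toneless table used only for illustration, and the function name `first_pinyin_info` is hypothetical. A real system would presumably obtain the readings from a pinyin dictionary or library.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical toneless pronunciation table; polyphones list several readings.
CHAR_PINYIN: Dict[str, List[str]] = {
    "银": ["yin"],
    "行": ["hang", "xing"],
    "重": ["zhong", "chong"],
    "庆": ["qing"],
}


def first_pinyin_info(hotword: str) -> List[Tuple[str, ...]]:
    """Return every left-to-right combination of the readings of each character."""
    per_char = [CHAR_PINYIN[ch] for ch in hotword]
    return list(product(*per_char))


print(first_pinyin_info("银行"))
print(first_pinyin_info("重庆"))
```

A hot word without polyphones yields a single combination, while a hot word with several polyphones yields one combination per choice of readings, which is why the result is stored as a list.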

In some embodiments of the present application, the second determining module 630 is configured to:

determining first pinyin information of each hot word in a preset hot word set;

preprocessing a voice recognition result; wherein, the preprocessing comprises at least one of filtering punctuation marks, filtering special characters and English characters;

determining a text corresponding to the target pinyin in the voice recognition result as a word to be replaced according to the second pinyin information;

According to the speech recognition processing device provided by the embodiment of the application, the pre-configured hot word set is obtained according to the use scene; when error correction processing is performed on the real-time speech recognition result, the pinyin information corresponding to the hot words in the hot word set is compared with the pinyin information of the speech recognition result, and words in the speech recognition result are replaced according to the comparison result. In this way, the error rate of the speech recognition result is reduced and the speech recognition effect in the specific scene is improved without additional manpower or time cost. Therefore, the hot word replacement function in a specific scene can be effectively realized, the time-consuming problem of performing replacement through model training techniques in natural language processing (NLP) in the prior art can be solved, and the efficiency of outputting and displaying the speech recognition result can be improved.
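The comparison of pinyin information described above can be sketched as a sliding-window match of each hot word's first pinyin information against the second pinyin information of the speech recognition result. The following Python fragment is only an illustration under stated assumptions: the table `CHAR_PINYIN` is hand-written and gives one toneless reading per character, the names `to_pinyin` and `correct` are hypothetical, the input is assumed to be already preprocessed, and each hot word is assumed to have one pinyin syllable per character so that a replacement never changes the text length.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical single-reading pronunciation table for illustration only.
CHAR_PINYIN: Dict[str, str] = {
    "用": "yong", "白": "bai", "百": "bai",
    "度": "du", "渡": "du", "搜": "sou", "索": "suo",
}


def to_pinyin(text: str) -> List[str]:
    """Second pinyin information: one syllable per character (unknown characters kept)."""
    return [CHAR_PINYIN.get(ch, ch) for ch in text]


def correct(result: str, hotword_pinyin: Dict[str, Sequence[Tuple[str, ...]]]) -> str:
    """Replace every span whose pinyin equals one pinyin variant of a hot word."""
    second = to_pinyin(result)
    chars = list(result)
    for hotword, variants in hotword_pinyin.items():
        for variant in variants:
            n = len(variant)
            for i in range(len(second) - n + 1):
                if tuple(second[i:i + n]) == tuple(variant):
                    # Same length on both sides, so later offsets stay valid.
                    chars[i:i + n] = list(hotword)
    return "".join(chars)


# Hypothetical hot word set for one use scene, with its first pinyin information.
hotwords = {"百度": [("bai", "du")]}
print(correct("用白渡搜索", hotwords))
```

Overlapping matches are not arbitrated in this sketch; when two hot words match overlapping spans, the one processed later overwrites the earlier one.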

In order to improve the accuracy of replacing the speech recognition result, an embodiment of the present application provides another speech recognition processing apparatus, and fig. 7 is a block diagram of the speech recognition processing apparatus provided in the embodiment of the present application, and as shown in fig. 7, the apparatus further includes:

the judging module 760 is configured to judge whether punctuation marks and/or special characters exist between texts of the words to be replaced;

if punctuation marks and/or special characters exist among the texts in the words to be replaced, the first replacement module 740 does not perform the step of replacing the words to be replaced in the voice recognition result with the corresponding target hotwords;

if punctuation marks and/or special characters do not exist between texts in the words to be replaced, the first replacement module 740 performs a step of replacing the words to be replaced in the voice recognition result with corresponding target hotwords.

It should be noted that 710 to 750 in fig. 7 have the same functions and structures as 610 to 650 in fig. 6, and are not described herein again.

According to the speech recognition processing device provided by the embodiment of the application, a judgment of whether punctuation marks and/or special characters exist in the text of the word to be replaced is added, and the step of replacing the word to be replaced in the speech recognition result with the corresponding target hot word is executed only when no punctuation marks and/or special characters exist in the text of the word to be replaced; otherwise, the replacement step is not executed. In this way, replacement errors are avoided to a certain extent, the accuracy of replacement is further improved, and the speech recognition effect is improved.

To address the problem of tone words in communication, an embodiment of the present application provides another speech recognition processing apparatus. Fig. 8 is a block diagram of the structure of the speech recognition processing apparatus, and as shown in fig. 8, the apparatus further includes:

a second obtaining module 870, configured to obtain a preset tone word set;

a second replacing module 880, configured to perform text matching between each tone word in the preset tone word set and the speech recognition result, take the matched text in the speech recognition result as a tone word, and perform replacement processing on it.

The tone words in the preset tone word set acquired by the second obtaining module 870 include:

tone words obtained according to the speech recognition experience of the use scene; and/or,

words that are replaced with an empty character when the voice recognition result is corrected during voice recognition in the use scene.
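The way correction operations feed both preset sets can be sketched as follows. This is a minimal sketch, assuming that each correction is logged as a pair of the original text and its replacement; the record format and the function name `learn_from_corrections` are hypothetical. A word corrected to an empty string is taken as a tone word, and any other replacement word is taken as a hot word.

```python
from typing import Iterable, Set, Tuple


def learn_from_corrections(
    corrections: Iterable[Tuple[str, str]],
    hotwords: Set[str],
    tone_words: Set[str],
) -> None:
    """Update the preset sets in place from (original, replacement) pairs."""
    for original, replacement in corrections:
        if replacement == "":
            # The user deleted the word, so it is treated as a tone word.
            tone_words.add(original)
        else:
            # The replacement word used in the correction becomes a hot word.
            hotwords.add(replacement)


hotwords: Set[str] = set()
tone_words: Set[str] = set()
learn_from_corrections([("白渡", "百度"), ("呃", "")], hotwords, tone_words)
print(hotwords, tone_words)
```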

It should be noted that 810 to 860 in fig. 8 have the same functions and structures as 710 to 760 in fig. 7, and are not described herein again.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

According to the voice recognition processing device provided by the embodiment of the application, the preset tone word set is obtained, the voice recognition result is matched with each tone word in the preset tone word set, and the text matched in the voice recognition result is used as the tone word to be replaced, so that the tone word in the voice recognition result is replaced, and the voice recognition effect is further improved.

There is also provided, in accordance with an embodiment of the present application, an electronic device, a non-transitory computer-readable storage medium having stored thereon computer instructions, and a computer program product.

Fig. 9 is a block diagram of an electronic device for the speech recognition processing method according to the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 9, the electronic device includes: one or more processors 901, memory 902, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 9 takes one processor 901 as an example.

Memory 902 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the speech recognition processing method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the speech recognition processing method provided by the present application. The computer program product of the present application includes a computer program, and when the computer program is executed by the processor 901, the speech recognition processing method provided by the present application is implemented.

The memory 902, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the speech recognition processing method in the embodiments of the present application (e.g., the first determining module 610, the first obtaining module 620, the second determining module 630, the first replacing module 640, and the hotword configuring module 650 shown in fig. 6). The processor 901 executes various functional applications of the server and data processing, i.e., implements the voice recognition processing method in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 902.

The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the electronic device for speech recognition processing, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory remotely located from the processor 901, and such remote memory may be connected to the electronic device of the speech recognition processing method through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device of the speech recognition processing method may further include: an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903 and the output device 904 may be connected by a bus or other means, and fig. 9 illustrates the connection by a bus as an example.

The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the voice recognition processing method. Examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output devices 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions of the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A speech recognition processing method, comprising:

determining a usage scenario of the speech recognition;

acquiring a corresponding preset hot word set according to the use scene;

2. The speech recognition processing method of claim 1, wherein the hotwords in the preset hotword set comprise:

3. The speech recognition processing method of claim 2, after configuring hotwords to the preset hotword set, the method further comprising:

obtaining the pinyin of each character in the hot words;

4. The speech recognition processing method according to any one of claims 1 to 3, wherein the determining, according to the first pinyin information of each hotword in the preset hotword set and the second pinyin information of the speech recognition result, a word to be replaced and a target hotword corresponding to the word to be replaced in the speech recognition result includes:

5. The speech recognition processing method of claim 1, further comprising:

6. The speech recognition processing method of claim 1, further comprising:

acquiring a preset tone word set;

7. The speech recognition processing method of claim 6, wherein the tone words in the preset tone word set comprise:

8. A speech recognition processing apparatus comprising:

9. The speech recognition processing apparatus according to claim 8, wherein the hotword in the preset hotword set acquired by the first acquiring module includes:

10. The speech recognition processing device of claim 9, further comprising a hotword configuration module to:

11. The speech recognition processing apparatus of any of claims 8 to 10, the second determination module to:

12. The speech recognition processing device of claim 8, further comprising:

13. The speech recognition processing device of claim 8, further comprising:

the second acquisition module is used for acquiring a preset tone word set;

14. The speech recognition processing apparatus according to claim 13, wherein the tone words in the preset tone word set acquired by the second acquiring module include:

15. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.

17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.