CN113160822B

CN113160822B - Speech recognition processing method, device, electronic equipment and storage medium

Info

Publication number: CN113160822B
Application number: CN202110488931.0A
Authority: CN
Inventors: 夏帅; 黄伟琦; 江鹏
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2023-05-30
Anticipated expiration: 2041-04-30
Also published as: CN113160822A

Abstract

The application provides a voice recognition processing method, a voice recognition processing device, electronic equipment and a storage medium, and relates to artificial intelligence fields such as natural language processing and speech technology. The specific implementation scheme is as follows: determining a usage scenario of the speech recognition; acquiring a corresponding preset hotword set according to the usage scenario; when voice recognition is carried out on voice information and a voice recognition result is obtained, determining a word to be replaced in the voice recognition result and a target hot word corresponding to the word to be replaced according to first pinyin information of each hot word in the preset hot word set and second pinyin information of the voice recognition result; and replacing the word to be replaced in the voice recognition result with the corresponding target hotword. In this way, the error rate of the voice recognition result is reduced and the voice recognition effect in a specific scenario is improved without additional manpower or time cost.

Description

Speech recognition processing method, device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, in particular to artificial intelligence fields such as natural language processing and speech technology, and more specifically to a speech recognition processing method and apparatus, an electronic device, and a storage medium.

Background

Speech recognition technology, also known as automatic speech recognition (ASR), aims to convert the lexical content of human speech into computer-readable input, such as key presses, binary codes or character sequences. With the development of science and technology, speech recognition technology has been applied in people's daily life.

In current speech recognition for conference scenes, the same word may be recognized as different results in different fields, and proper nouns may be recognized incorrectly. Dedicated training for each field consumes a great deal of manpower and time, and the training process is relatively complex.

Disclosure of Invention

The application provides a method, a device, electronic equipment and a storage medium for voice recognition processing.

According to a first aspect of the present application, there is provided a method for speech recognition processing, comprising:

determining a usage scenario of the speech recognition;

acquiring a corresponding preset hotword set according to the use scene;

when voice recognition is carried out on voice information and a voice recognition result is obtained, determining a word to be replaced in the voice recognition result and a target hot word corresponding to the word to be replaced according to first pinyin information of each hot word in the preset hot word set and second pinyin information of the voice recognition result;

and replacing the word to be replaced in the voice recognition result with the corresponding target hotword.

Wherein, the hotword in the preset hotword set comprises:

hot words obtained from speech recognition experience in the usage scenario; and/or,

replacement words used in correction operations on the voice recognition result during voice recognition in the usage scenario.

In an embodiment of the present application, after configuring the hotword to the preset hotword set, the method further includes:

acquiring the pinyin of each character in the hot word;

if no polyphonic character exists in the hot word, combining the pinyin of each character in the hot word to obtain first pinyin information of the hot word;

if a polyphonic character exists in the hot word, performing pairwise permutation and combination on the pinyin of the polyphonic character and the pinyin of the other characters in the hot word from left to right to obtain a plurality of pinyin permutation and combination results;

and taking the plurality of pinyin permutation and combination results as the first pinyin information of the hot word.

In some embodiments of the present application, the determining, according to the first pinyin information of each hot word in the preset hot word set and the second pinyin information of the voice recognition result, a to-be-replaced word in the voice recognition result and a target hot word corresponding to the to-be-replaced word includes:

determining first pinyin information of each hotword in the preset hotword set;

preprocessing the voice recognition result, wherein the preprocessing comprises filtering out punctuation marks, special characters and English characters;

converting the preprocessed voice recognition result into corresponding pinyin so as to obtain second pinyin information of the voice recognition result;

comparing the first pinyin information of each hot word with the second pinyin information of the voice recognition result, and determining target pinyin with the same syllable composition and structure from the first pinyin information and the second pinyin information;

determining a text corresponding to the target pinyin in the voice recognition result as the word to be replaced according to the second pinyin information;

and determining the hot word corresponding to the target pinyin as the target hot word.

In some embodiments of the present application, the voice recognition processing method further includes:

judging whether punctuation marks and/or special characters exist among texts in the words to be replaced;

if punctuation marks and/or special characters exist among the texts in the words to be replaced, the step of replacing the words to be replaced in the voice recognition result with the corresponding target hotwords is not executed;

and if punctuation marks and/or special characters do not exist among the texts in the words to be replaced, executing the step of replacing the words to be replaced in the voice recognition result with the corresponding target hotwords.

Optionally, in some embodiments of the present application, the voice recognition processing method further includes:

acquiring a preset mood word set;

and performing text matching between each mood word in the preset mood word set and the voice recognition result, and performing replacement processing on the matched text in the voice recognition result as a mood word.

Wherein, the mood words in the preset mood word set include:

mood words obtained from speech recognition experience in the usage scenario; and/or,

mood words obtained from the speaking habits of the speaker in the usage scenario; and/or,

words in the voice recognition result that were replaced with an empty character when the voice recognition result was corrected during voice recognition in the usage scenario.

According to a second aspect of the present application, there is provided a speech recognition processing apparatus comprising:

a first determining module, configured to determine a usage scenario of the speech recognition;

a first acquisition module, configured to acquire a corresponding preset hotword set according to the usage scenario;

the second determining module is used for determining a word to be replaced in the voice recognition result and a target hot word corresponding to the word to be replaced according to the first pinyin information of each hot word in the preset hot word set and the second pinyin information of the voice recognition result when voice recognition is carried out on the voice information and a voice recognition result is obtained;

and the first replacing module is used for replacing the word to be replaced in the voice recognition result with the corresponding target hotword.

The hot words in the preset hot word set acquired by the first acquisition module include:

In this embodiment of the present application, the speech recognition processing device further includes a hotword configuration module, where the hotword configuration module is configured to:

after the hot words are configured to the preset hot word set, pinyin of each character in the hot words is obtained;

In some embodiments of the present application, the second determining module is configured to:

determining first pinyin information of each hotword in the preset hotword set;

In some embodiments of the present application, the voice recognition processing device further includes:

the judging module is used for judging whether punctuation marks and/or special characters exist among the texts in the words to be replaced;

if punctuation marks and/or special characters exist among texts in the words to be replaced, the first replacing module does not execute the step of replacing the words to be replaced in the voice recognition result with the corresponding target hotwords;

and if punctuation marks and/or special characters do not exist among the texts in the words to be replaced, the first replacing module executes the step of replacing the words to be replaced in the voice recognition result with the corresponding target hotwords.

In addition, in an embodiment of the present application, the voice recognition processing device further includes:

the second acquisition module is used for acquiring a preset mood word set;

and the second replacing module is used for performing text matching between each mood word in the preset mood word set and the voice recognition result, and performing replacement processing on the matched text in the voice recognition result as a mood word.

The mood words in the preset mood word set acquired by the second acquisition module include:

According to a third aspect of the present application, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the speech recognition processing method of the first aspect.

According to a fourth aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the speech recognition processing method according to the first aspect.

According to a fifth aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the speech recognition processing method as described in the first aspect.

According to the technical scheme of the application, a preset hot word set is obtained according to the usage scenario. When error correction is performed on a real-time voice recognition result, the pinyin information of the hot words in the hot word set is compared with the pinyin information of the voice recognition result, and words in the voice recognition result are replaced according to the comparison result, so that the error rate of the voice recognition result is reduced and the voice recognition effect in a specific scenario is improved without additional manpower or time cost. The application can therefore effectively realize hot word replacement in a specific scenario, solve the time-consuming problem in the prior art caused by performing replacement with natural language processing (NLP) model training, and improve the output and display efficiency of the voice recognition result.

It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.

Drawings

The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:

FIG. 1 is a flow chart of a speech recognition processing method according to an embodiment of the present application;

FIG. 2 is a flowchart of obtaining first pinyin information according to an embodiment of the present application;

FIG. 3 is a flowchart for determining a word to be replaced and a corresponding target hotword in a speech recognition result according to an embodiment of the present application;

FIG. 4 is a flowchart of another speech recognition processing method according to an embodiment of the present application;

FIG. 5 is a flow chart of yet another speech recognition processing method according to an embodiment of the present application;

FIG. 6 is a block diagram of a speech recognition processing device according to an embodiment of the present application;

FIG. 7 is a block diagram of another speech recognition processing device according to an embodiment of the present application;

FIG. 8 is a block diagram of a further speech recognition processing device according to an embodiment of the present application;

FIG. 9 is a block diagram of an electronic device for implementing a speech recognition processing method according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In view of the above problems of voice recognition technology in conference scenes, existing solutions replace hot words by means of natural language processing technology or manual text replacement. The natural language processing approach trains on a large amount of text with a natural language processing algorithm to obtain a trained text model, and the model then performs hot word replacement on the input text. However, because word collocations differ between fields, a model that is universal across all fields cannot be trained, so the model must be retrained for each field; this takes a long time, and real-time performance cannot be guaranteed. In addition, manual text replacement requires the words to be replaced to be marked manually, which consumes considerable labor cost.

Aiming at the existing problems, the application provides a voice recognition processing method, a voice recognition processing device, electronic equipment and a storage medium.

Fig. 1 is a flowchart of a voice recognition processing method according to an embodiment of the present application, and it should be noted that the voice recognition processing method according to the embodiment of the present application may be applied to a voice recognition processing device according to an embodiment of the present application, and the voice recognition processing device may be configured in an electronic apparatus. As shown in fig. 1, the implementation steps of the method include:

Step 101, determining a usage scenario of speech recognition.

Because the words used and the way they are collocated differ between scenes, in the embodiment of the application the usage scenario of voice recognition is determined first, and error correction is then performed on the voice recognition result using the hot word set preset for that usage scenario.

Step 102, acquiring a corresponding preset hotword set according to the use scene.

It can be understood that a hot word set is a set of words in a certain field, including proper nouns and words for which recognition errors frequently occur in speech recognition. The words to be recognized differ considerably between usage scenarios, so if the usage scenario is not considered, the same word may be recognized as different results in different fields and proper nouns may be recognized incorrectly. Therefore, in the embodiment of the present application, to avoid the above problems, a corresponding hot word set is preset for each usage scenario.

Step 103, when the voice information is subjected to voice recognition and a voice recognition result is obtained, determining a word to be replaced in the voice recognition result and a target hot word corresponding to the word to be replaced according to the first pinyin information of each hot word in the preset hot word set and the second pinyin information of the voice recognition result.

That is, when voice recognition is performed, the pinyin information of each hot word in the preset hot word set is compared with the pinyin information of the voice recognition result, to find which pinyin in the voice recognition result corresponds to words that need to be replaced with hot words from the preset hot word set.

In the embodiment of the present application, the word to be replaced in the voice recognition result can be understood as a word in the voice recognition result whose pinyin information is consistent with the pinyin information of a hot word in the preset hot word set. The target hot word corresponding to the word to be replaced is the hot word in the preset hot word set whose pinyin information is consistent with that of the word to be replaced. In the embodiment of the present application, the word to be replaced and its corresponding hot word may be determined as follows: performing voice recognition on the voice information with an existing voice recognition technology to obtain a voice recognition result; searching the pinyin information of each hot word in the preset hot word set against the pinyin information of the voice recognition result, and finding the word in the voice recognition result whose pinyin information is consistent with that of a hot word in the preset hot word set, this word being the word to be replaced; and finding the hot word in the preset hot word set whose pinyin information is consistent with that of the word to be replaced, this hot word being the target hot word corresponding to the word to be replaced.

Step 104, replacing the word to be replaced in the voice recognition result with the corresponding target hotword.

That is, words in the voice recognition result are replaced with the hot words in the preset hot word set that have consistent pinyin information, which avoids word recognition errors while still allowing the voice recognition result to be output in real time.

According to the voice recognition processing method provided by the embodiment of the application, a preset hot word set is obtained according to the usage scenario. When error correction is performed on a real-time voice recognition result, the pinyin information of the hot words in the hot word set is compared with the pinyin information of the voice recognition result, and words in the voice recognition result are replaced according to the comparison result, so that the error rate of the voice recognition result is reduced and the voice recognition effect in a specific scenario is improved without additional manpower or time cost. The method can therefore effectively realize hot word replacement in a specific scenario, solve the time-consuming problem in the prior art caused by performing replacement with natural language processing (NLP) model training, and improve the output and display efficiency of the voice recognition result.
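As a purely illustrative aid, steps 101 to 104 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the application: the tiny toneless pinyin table `PINYIN`, the scenario key `"conference"`, the mapping `HOTWORDS_BY_SCENARIO` and the function names are all hypothetical, and polyphonic characters and punctuation are ignored here for brevity.

```python
# Hypothetical toneless pinyin table; a real system would use a full dictionary.
PINYIN = {"摆": "bai", "渡": "du", "百": "bai", "度": "du",
          "用": "yong", "搜": "sou", "索": "suo"}

# Hypothetical preset hot word sets keyed by usage scenario.
HOTWORDS_BY_SCENARIO = {"conference": ["百度"]}


def to_pinyin(text: str) -> list[str]:
    # Unknown characters map to themselves, so they never match a syllable.
    return [PINYIN.get(ch, ch) for ch in text]


def correct(result: str, scenario: str) -> str:
    hotwords = HOTWORDS_BY_SCENARIO.get(scenario, [])  # steps 101 and 102
    second = to_pinyin(result)                          # second pinyin information
    for hotword in hotwords:
        first = to_pinyin(hotword)                      # first pinyin information
        n = len(first)
        i = 0
        while i + n <= len(second):                     # step 103: compare pinyin
            if second[i:i + n] == first:
                # Step 104: the matched span has the same length as the hot
                # word, so indices into `second` stay aligned afterwards.
                result = result[:i] + hotword + result[i + n:]
                i += n
            else:
                i += 1
    return result
```

With these assumed tables, `correct("用摆渡搜索", "conference")` returns `"用百度搜索"`, while the same text under a scenario that has no hot word set is returned unchanged.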

It should be noted that the hotwords in the preset hotword set may include, but are not limited to: hotwords obtained from speech recognition experience in the usage scenario; and/or replacement words used in correction operations on the voice recognition result during voice recognition in the usage scenario. For example, in a conference scene, if a word is recognized inaccurately, the word can be added to the hot word set in real time so that the preset hot word set is updated; in subsequent voice recognition, the hot words in the updated hot word set can then be acquired directly to perform error correction on the voice recognition result.

In order to further improve the output and display efficiency of the voice recognition result, the hot word is immediately subjected to pinyin conversion processing after being configured to a preset hot word set so as to obtain first pinyin information of the hot word. Fig. 2 is a flowchart of acquiring the first pinyin information, and as shown in fig. 2, the implementation manner of acquiring the first pinyin information is as follows:

Step 201, acquiring the pinyin of each character in the hot word.

Step 202, judging whether a polyphonic character exists in the hot word; if not, executing step 203, otherwise executing step 204.

It can be understood that if a polyphonic character exists in the hot word, the hot word has more than one possible pinyin. To match the voice recognition result against the hot words in the preset hot word set effectively, the case where a hot word contains polyphonic characters needs special handling.

Step 203, combining the pinyin of each character in the hot word to obtain the first pinyin information of the hot word.

Step 204, performing pairwise permutation and combination on the pinyin of the polyphonic character and the pinyin of the other characters in the hot word from left to right to obtain a plurality of pinyin permutation and combination results.

That is, each reading of a polyphonic character in the hot word is combined with the pinyin of the other characters, so that every possible reading of the hot word is covered.

As an example, after the hot words are configured, each newly configured hot word is traversed and the pinyin of each character in it is obtained. If the hot word contains a polyphonic character, its pinyin is permuted and combined from left to right as follows: take all the pinyin of the first character (say there are a of them) and all the pinyin of the second character (say b of them) and combine them pairwise to obtain a result u of length a × b; then combine u pairwise with all the pinyin of the third character (say c of them) to obtain a result v of length a × b × c; and so on, until a final list of pinyin results is obtained.

Step 205, taking the plurality of pinyin permutation and combination results as the first pinyin information of the hot word.
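The left-to-right combination of steps 201 to 205 amounts to a Cartesian product over the readings of each character. The following Python sketch illustrates this under assumptions: the table `READINGS` and the function name `first_pinyin_info` are hypothetical, and toneless pinyin is assumed because the text does not say whether tones are kept.

```python
# Hypothetical toneless readings per character (illustrative only).
READINGS = {
    "重": ["zhong", "chong"],
    "庆": ["qing"],
    "银": ["yin"],
    "行": ["xing", "hang"],
}


def first_pinyin_info(hotword: str) -> list[tuple[str, ...]]:
    """Enumerate every pinyin sequence of a hot word, left to right."""
    results: list[tuple[str, ...]] = [()]
    for ch in hotword:
        readings = READINGS[ch]  # step 201: all pinyin of this character
        # Pairwise combination of the partial results with this character's
        # readings; the list grows by a factor of len(readings) each time.
        results = [prefix + (p,) for prefix in results for p in readings]
    return results
```

For a hot word without polyphonic characters the list holds a single sequence, which corresponds to step 203. For `"重庆"` the sketch yields `("zhong", "qing")` and `("chong", "qing")`, and for a word whose two characters each have two readings it yields 2 × 2 = 4 sequences, matching the a × b length described above.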

According to the voice recognition processing method provided by the embodiment of the application, after a hot word is configured to the preset hot word set, the hot word is processed to obtain its first pinyin information, which provides a basis for error correction of real-time voice recognition and improves the efficiency of using the preset hot word set. Because the pinyin of each hot word is obtained as soon as the hot word is configured, error correction of a real-time voice recognition result can compare the pinyin of the hot words directly with the pinyin of the voice recognition result, which further improves the output and display efficiency of the voice recognition result. In addition, when a hot word contains polyphonic characters, its pinyin is permuted and combined from left to right, and the resulting plurality of pinyin permutation and combination results is used as the first pinyin information of the hot word. This avoids matching errors between the voice recognition result and the hot words in the preset hot word set that are caused by polyphonic characters, and so improves the accuracy of the voice recognition result. Because all readings of the polyphonic characters are permuted and combined as soon as the hot word is configured, every possible pinyin of the hot word is available for direct comparison during error correction, which further improves both the output and display efficiency of the voice recognition result and the voice recognition effect.

In order to describe the speech recognition processing method in any of the above embodiments in detail, fig. 3 is a flowchart for determining a word to be replaced and a corresponding target hotword in a speech recognition result, and as shown in fig. 3, the implementation steps are as follows:

Step 301, determining first pinyin information of each hotword in a preset hotword set.

Step 302, preprocessing the voice recognition result, wherein the preprocessing includes filtering out punctuation marks, special characters and English characters.

Step 303, converting the preprocessed voice recognition result into corresponding pinyin so as to obtain second pinyin information of the voice recognition result.

When the preprocessed voice recognition result is converted into the corresponding pinyin, if the recognition result contains a polyphonic character, the pinyin corresponding to any one of the readings of that character is selected.

Step 304, comparing the first pinyin information of each hot word with the second pinyin information of the voice recognition result, and determining the target pinyin with the same syllable composition and structure from the first pinyin information and the second pinyin information.

In the embodiment of the application, the target pinyin can be obtained as follows: traversing the whole hot word set to obtain each hot word, and comparing the set formed by the first pinyin information of the traversed hot words with the second pinyin information of the voice recognition result; and finding the pinyin with the same syllable composition and structure in the first pinyin information and the second pinyin information, and taking it as the target pinyin. It should be noted that, in one voice recognition process, the target pinyin comprises all pinyin in the second pinyin information of the voice recognition result that have the same syllable composition and structure as the first pinyin information; that is, there may be more than one target pinyin.

Step 305, determining, according to the second pinyin information, the text corresponding to the target pinyin in the voice recognition result as the word to be replaced.

It can be understood that finding the target pinyin means finding the pinyin of a word in the voice recognition result that matches a hot word in the preset hot word set, and the text corresponding to the target pinyin in the voice recognition result is then to be replaced with that hot word. Therefore, in the embodiment of the application, the text corresponding to the target pinyin in the voice recognition result is determined to be the word to be replaced.

Step 306, determining the hotword corresponding to the target pinyin as the target hotword.
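Steps 301 to 306 can be illustrated with the following Python sketch. It is a simplified reading under assumptions, not the implementation of the application: `PINYIN` and `HOTWORD_PINYIN` are hypothetical tables, "same syllable composition and structure" is interpreted as equality of consecutive toneless syllables, and the preprocessing of step 302 is approximated by keeping only characters in the CJK Unified Ideographs range (which also drops digits).

```python
# Hypothetical toneless pinyin table (illustrative only).
PINYIN = {"摆": "bai", "渡": "du", "百": "bai", "度": "du",
          "用": "yong", "搜": "sou", "索": "suo"}

# Step 301: first pinyin information, precomputed per hot word.
HOTWORD_PINYIN = {"百度": [("bai", "du")]}


def find_replacements(result: str) -> list[tuple[str, str]]:
    """Return (word_to_replace, target_hotword) pairs for one result."""
    # Step 302: keep CJK characters only, remembering their original index.
    kept = [(i, ch) for i, ch in enumerate(result) if "\u4e00" <= ch <= "\u9fff"]
    # Step 303: second pinyin information (one syllable per kept character).
    syllables = [PINYIN.get(ch, ch) for _, ch in kept]
    pairs = []
    for hotword, variants in HOTWORD_PINYIN.items():
        for variant in variants:
            n = len(variant)
            # Step 304: slide over the syllables looking for the target pinyin.
            for s in range(len(syllables) - n + 1):
                if tuple(syllables[s:s + n]) == variant:
                    start, end = kept[s][0], kept[s + n - 1][0] + 1
                    # Steps 305 and 306: text span and its target hot word.
                    pairs.append((result[start:end], hotword))
    return pairs
```

Under these assumptions, `find_replacements("用摆渡搜索。")` returns `[("摆渡", "百度")]`. The returned span is cut from the original, unfiltered text, so any punctuation that sat between the matched characters remains visible to a later check.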

Since punctuation or special symbols and the like may exist in the word to be replaced in the voice recognition process, if the replacement is performed only according to the determined word to be replaced and the target hot word, a replacement error may occur. In order to further improve the accuracy of the replacement, another speech recognition processing method is proposed. Fig. 4 is a flowchart of another voice recognition processing method according to an embodiment of the present application, as shown in fig. 4, where the voice recognition processing method further includes the following steps:

Step 404, determining whether punctuation marks and/or special characters exist between the texts in the word to be replaced. If no punctuation marks and/or special characters exist between the texts in the word to be replaced, step 405 is executed; otherwise, step 405 is not executed.

It can be understood that if punctuation marks and/or special characters exist between the texts in the word to be replaced, the word to be replaced has the same pinyin composition and structure as the corresponding hot word only after the punctuation marks and/or special characters are removed, while its actual meaning is different, so executing the replacement directly would cause a replacement error. In the embodiment of the application, to avoid this type of replacement error, a judgment is added as to whether punctuation marks and/or special characters exist between the texts in the word to be replaced: a word to be replaced with no punctuation marks and/or special characters between its texts is replaced with the target hot word; otherwise, no replacement is performed.

Step 405, replacing the word to be replaced in the voice recognition result with the corresponding target hotword.

It should be noted that, steps 401 to 403 in fig. 4 are consistent with the implementation manner of steps 101 to 103 in fig. 1, and are not described herein again.

According to the voice recognition processing method provided by the embodiment of the application, a judgment is added as to whether punctuation marks and/or special characters exist between the texts in the word to be replaced. The step of replacing the word to be replaced in the voice recognition result with the corresponding target hot word is executed only when no punctuation marks and/or special characters exist between the texts in the word to be replaced; otherwise, the replacement is not executed. This avoids replacement errors to a certain extent, further improves the accuracy of replacement, and improves the voice recognition effect.
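The check of steps 404 and 405 can be sketched in Python as follows. The function names and the index-based interface are hypothetical, and "punctuation marks and/or special characters" is approximated here as any character outside the CJK Unified Ideographs range, which is an assumption made for illustration.

```python
def is_contiguous_text(span: str) -> bool:
    """True when every character of the span is a CJK character, i.e. no
    punctuation or special character sits between the characters."""
    return all("\u4e00" <= ch <= "\u9fff" for ch in span)


def replace_if_safe(result: str, start: int, end: int, hotword: str) -> str:
    """Replace result[start:end] with the hot word only if the span is clean."""
    span = result[start:end]
    if not is_contiguous_text(span):  # step 404: punctuation found, skip
        return result
    return result[:start] + hotword + result[end:]  # step 405
```

For example, `replace_if_safe("用摆渡搜索", 1, 3, "百度")` returns `"用百度搜索"`, whereas `replace_if_safe("我先摆，渡口见", 2, 5, "百度")` leaves the text unchanged because a comma separates the two matched characters.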

In daily communication, people often insert modal particles (filler words such as interjections) between sentences because of thinking, emotional expression, or speaking habits. These modal particles are irrelevant to the semantic content, but recognizing them during speech recognition may interfere with the speech recognition result and thus degrade the speech recognition effect. In view of this problem, the embodiments of the present application propose another speech recognition processing method. Fig. 5 is a flowchart of the steps added to the speech recognition processing method of any of the above embodiments. As shown in fig. 5, the speech recognition processing method adds the following steps:

Step 506: obtaining a preset modal particle set.

In the embodiment of the application, the preset modal particle set is a set in which the modal particles used by speakers in the usage scenario are configured in advance, so that during speech recognition the configured set is obtained directly for modal particle matching, achieving the purpose of configuring once and using many times. In addition, during speech recognition, modal particles that are found through correction operations on the speech recognition result and are not yet present in the preset modal particle set can be added to the set in real time, thereby realizing continuous iterative optimization of the preset modal particle set.

The modal particles in the preset modal particle set may include, but are not limited to: modal particles obtained according to speech recognition experience in the usage scenario; and/or modal particles obtained according to the speaking habits of the speaker in the usage scenario; and/or words that are replaced with an empty character in the speech recognition result when the speech recognition result is corrected during speech recognition in the usage scenario.

Step 507: performing text matching between each modal particle in the preset modal particle set and the speech recognition result, and performing replacement processing on the matched text in the speech recognition result as a modal particle.

That is, if text in the speech recognition result matches a modal particle in the preset modal particle set, a modal particle exists in the speech recognition result, and the matched text needs to be treated as a modal particle and replaced.

In the embodiment of the application, the text matching between each modal particle in the preset modal particle set and the speech recognition result may be performed using a text similarity algorithm. If the speech recognition result contains text that matches a modal particle in the preset modal particle set, the matched text is treated as a modal particle and replaced with an empty character, which completes the replacement processing of the modal particle.

It should be noted that, in the embodiment of the present application, steps 502 to 505 and steps 506 to 507 may be performed simultaneously. Steps 501 to 505 in fig. 5 are implemented in the same manner as steps 401 to 405 in fig. 4, and are not described again here.

According to the speech recognition processing method provided by the embodiment of the application, a preset modal particle set is obtained, the speech recognition result is matched against each modal particle in the set, and the matched text in the speech recognition result is treated as a modal particle and replaced. The modal particles in the speech recognition result are thereby removed, which further improves the accuracy of the speech recognition result and the speech recognition effect.
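Steps 506 and 507 can be sketched as follows. The text above allows a text similarity algorithm for the matching; this sketch substitutes plain exact substring matching for simplicity, and the function name `strip_modal_particles` is hypothetical.

```python
from typing import Iterable


def strip_modal_particles(result: str, particles: Iterable[str]) -> str:
    """Replace every occurrence of a configured modal particle with an empty string."""
    # Longer particles are handled first so that a multi-character particle
    # is removed as a whole before any shorter particle contained in it.
    for particle in sorted(particles, key=len, reverse=True):
        result = result.replace(particle, "")
    return result
```

Exact substring matching can also remove a character that happens to be part of an ordinary word, so a practical system would likely need word segmentation or a similarity threshold before deleting text.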

To implement the above embodiments, the present application further proposes a speech recognition processing device.

Fig. 6 is a block diagram of a voice recognition processing device according to an embodiment of the present application, as shown in fig. 6, where the device includes:

a first determining module 610, configured to determine a usage scenario of speech recognition;

a first obtaining module 620, configured to obtain a corresponding preset hotword set according to a usage scenario;

the second determining module 630 is configured to determine, when performing speech recognition on the speech information and obtaining a speech recognition result, a word to be replaced in the speech recognition result and a target hot word corresponding to the word to be replaced according to the first pinyin information of each hot word in the preset hot word set and the second pinyin information of the speech recognition result;

The first replacing module 640 is configured to replace a word to be replaced in the speech recognition result with a corresponding target hotword.

The hotwords in the preset hotword set acquired by the first acquiring module 620 include:

hotwords obtained according to the speech recognition experience of the usage scenario; and/or,

in the speech recognition process in the usage scenario, a replacement word used when a correction operation is performed on the speech recognition result.
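One way to picture a scenario-specific hotword set that grows from correction operations is the small registry below. It is only a sketch under stated assumptions: the class `HotwordRegistry`, its method names, and the scenario key "meeting" are all hypothetical and do not come from the application.

```python
from collections import defaultdict


class HotwordRegistry:
    """Keeps one preset hotword set per usage scenario (hypothetical helper)."""

    def __init__(self) -> None:
        self._sets = defaultdict(set)

    def configure(self, scenario: str, hotwords) -> None:
        # Hotwords chosen in advance from recognition experience in the scenario.
        self._sets[scenario].update(hotwords)

    def record_correction(self, scenario: str, replacement_word: str) -> None:
        # A replacement word used during a manual correction becomes a hotword.
        self._sets[scenario].add(replacement_word)

    def get(self, scenario: str) -> set:
        # Return a copy so callers cannot mutate the stored set by accident.
        return set(self._sets.get(scenario, set()))


registry = HotwordRegistry()
registry.configure("meeting", ["百度"])
registry.record_correction("meeting", "飞桨")
```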

In the embodiment of the present application, the speech recognition processing device further includes a hotword configuration module 650, where the hotword configuration module 650 is configured to:

after the hot words are configured to a preset hot word set, pinyin of each word in the hot words is obtained;

if a polyphonic character exists in the hotword, the pinyin readings of the polyphonic character and the pinyin of the other characters in the hotword are combined pairwise from left to right to obtain a plurality of pinyin permutation and combination results;

and taking the plurality of pinyin permutation and combination results as the first pinyin information of the hotword.
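The left-to-right pairwise combination performed by the hotword configuration module can be sketched as a fold over the characters of the hotword. The readings table `READINGS` below is a tiny hand-written assumption used only for illustration (a real system would rely on a full pinyin dictionary), and `pinyin_combinations` is a hypothetical name.

```python
from itertools import product

# Hypothetical toy table mapping each character to all of its pinyin readings.
READINGS = {
    "重": ["zhong", "chong"],
    "庆": ["qing"],
    "银": ["yin"],
    "行": ["hang", "xing"],
}


def pinyin_combinations(hotword: str) -> list:
    """Return every reading of the hotword as a tuple of syllables."""
    results = [()]  # start with one empty prefix
    for ch in hotword:
        readings = READINGS.get(ch, [ch])  # unknown characters stand for themselves
        # Combine every prefix built so far with every reading of this character.
        results = [prefix + (py,) for prefix, py in product(results, readings)]
    return results
```

With this table, `pinyin_combinations("银行")` yields the two candidates `("yin", "hang")` and `("yin", "xing")`, and both would be stored as first pinyin information of the hotword. Keeping syllables in a tuple rather than concatenating them avoids ambiguity between syllable boundaries.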

In some embodiments of the present application, the second determining module 630 is configured to:

determining first pinyin information of each hot word in a preset hot word set;

Preprocessing a voice recognition result; the preprocessing comprises filtering punctuation marks, filtering special characters and English characters;

according to the second pinyin information, determining a text corresponding to the target pinyin in the voice recognition result as a word to be replaced;

and determining the hotword corresponding to the target pinyin as the target hotword.

According to the speech recognition processing device provided by the embodiment of the application, the preset hotword set is obtained according to the usage scenario. When error correction is performed on the real-time speech recognition result, the pinyin information of the hotwords in the hotword set is compared with the pinyin information of the speech recognition result, and words in the speech recognition result are replaced according to the comparison result, so that the error rate of the speech recognition result is reduced and the speech recognition effect in a specific scenario is improved without additional manpower or time cost. The device can therefore effectively realize hotword replacement in a specific scenario, can avoid the time cost incurred in the prior art by performing replacement through natural language processing (NLP) model training, and can improve the efficiency with which the speech recognition result is output and displayed.
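The comparison of hotword pinyin against the pinyin of the recognition result can be sketched as a sliding-window match. This is a simplified illustration, not the device's actual logic: the toy table `READINGS` assumes one reading per character, the filter keeps only characters in the basic CJK range, and `replace_hotwords` is a hypothetical name. Spans that are not contiguous in the original text are skipped, which mirrors the punctuation judgment described for fig. 4.

```python
# Hypothetical toy table with a single pinyin reading per character.
READINGS = {"今": "jin", "天": "tian", "去": "qu", "白": "bai", "百": "bai", "度": "du"}


def to_pinyin(ch: str) -> str:
    return READINGS.get(ch, ch)  # unknown characters stand for themselves


def replace_hotwords(result: str, hotwords: list) -> str:
    # Preprocessing: keep Chinese characters only, remembering original indices.
    kept = [(i, ch) for i, ch in enumerate(result) if "\u4e00" <= ch <= "\u9fff"]
    second = [to_pinyin(ch) for _, ch in kept]  # second pinyin information
    out = list(result)
    for hotword in hotwords:
        target = [to_pinyin(ch) for ch in hotword]  # first pinyin information
        n = len(target)
        if n == 0:
            continue
        for k in range(len(second) - n + 1):
            if second[k:k + n] != target:
                continue
            start, end = kept[k][0], kept[k + n - 1][0] + 1
            if end - start == n:  # contiguous, so nothing was filtered out inside
                out[start:end] = list(hotword)
    return "".join(out)
```

Because each replacement has the same length as the span it overwrites, the recorded indices stay valid while several hotwords are applied in turn.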

In order to improve accuracy of speech recognition result replacement, another speech recognition processing device is provided in the embodiment of the present application, and fig. 7 is a block diagram of a speech recognition processing device provided in the embodiment of the present application, as shown in fig. 7, where the device further includes:

the judging module 760 is configured to judge whether punctuation marks and/or special characters exist between the texts in the words to be replaced;

if punctuation marks and/or special characters exist between the characters of the word to be replaced, the first replacing module 740 does not perform the step of replacing the word to be replaced in the speech recognition result with the corresponding target hotword;

if no punctuation marks and/or special characters exist between the characters of the word to be replaced, the first replacing module 740 performs the step of replacing the word to be replaced in the speech recognition result with the corresponding target hotword.

It should be noted that 710 to 750 in fig. 7 have the same functions and structures as 610 to 650 in fig. 6, and are not described here again.

According to the speech recognition processing device provided by the embodiment of the application, a judgment is added as to whether punctuation marks and/or special characters exist between the characters of the word to be replaced. The step of replacing the word to be replaced in the speech recognition result with the corresponding target hotword is performed only when no punctuation marks and/or special characters exist between those characters; otherwise, the replacement is not performed. This avoids replacement errors to a certain extent, further improves the accuracy of replacement, and improves the speech recognition effect.

In order to solve the problem of modal particles (filler words) in communication, another speech recognition processing device is provided in the embodiment of the present application. Fig. 8 is a block diagram of this speech recognition processing device. As shown in fig. 8, the device further includes:

a second obtaining module 870, configured to obtain a preset modal particle set;

and a second replacing module 880, configured to perform text matching between each modal particle in the preset modal particle set and the speech recognition result, and to perform replacement processing on the matched text in the speech recognition result as a modal particle.

The modal particles in the preset modal particle set obtained by the second obtaining module 870 include:

modal particles obtained according to the speech recognition experience of the usage scenario; and/or,

words that are replaced with an empty character in the speech recognition result when the speech recognition result is corrected during speech recognition in the usage scenario.

It should be noted that 810 to 860 in fig. 8 have the same functions and structures as 710 to 760 in fig. 7, and are not described here again.

The specific manner in which the modules of the devices in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and is not described in detail here.

According to the speech recognition processing device provided by the embodiment of the application, a preset modal particle set is obtained, the speech recognition result is matched against each modal particle in the set, and the matched text in the speech recognition result is treated as a modal particle and replaced. The modal particles in the speech recognition result are thereby removed, which further improves the speech recognition effect.

According to embodiments of the present application, there is also provided an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product.

Fig. 9 is a block diagram of an electronic device for the speech recognition processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in fig. 9, the electronic device includes: one or more processors 901, memory 902, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). In fig. 9, one processor 901 is taken as an example.

Memory 902 is a non-transitory computer-readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the speech recognition processing methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the voice recognition processing method provided by the present application. The computer program product of the present application comprises a computer program which, when executed by the processor 901, implements the speech recognition processing method proposed in the present application.

The memory 902 is used as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the first determining module 610, the first obtaining module 620, the second determining module 630, the first replacing module 640, and the hotword configuring module 650 shown in fig. 6) corresponding to the speech recognition processing method in the embodiments of the present application. The processor 901 executes various functional applications of the server and data processing, i.e., implements the voice recognition processing method in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 902.

The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the electronic device for speech recognition processing, and the like. In addition, the memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 902 optionally includes memory remotely located relative to processor 901, which may be connected to the electronic device of the speech recognition processing method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device of the speech recognition processing method may further include: an input device 903 and an output device 904. The processor 901, memory 902, input device 903, and output device 904 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 9.

The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the speech recognition processing method; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present application are achieved, and no limitation is imposed herein.

The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims

1. A speech recognition processing method, comprising:

determining a usage scenario of the speech recognition;

acquiring a corresponding preset hotword set according to the use scene;

when voice information is subjected to voice recognition and a voice recognition result is obtained, determining a word to be replaced in the voice recognition result and a target hot word corresponding to the word to be replaced according to first pinyin information of each hot word in the preset hot word set and second pinyin information of the voice recognition result, wherein the word to be replaced is a word with the same pinyin information as the hot word in the preset hot word set;

replacing the word to be replaced in the voice recognition result with the corresponding target hotword;

the determining the word to be replaced in the voice recognition result and the target hot word corresponding to the word to be replaced according to the first pinyin information of each hot word in the preset hot word set and the second pinyin information of the voice recognition result comprises the following steps:

Determining first pinyin information of each hotword in the preset hotword set;

determining a hot word corresponding to the target pinyin as the target hot word;

wherein, the hotword in the preset hotword set comprises:

in the voice recognition process in the use scene, the replacement words used in the correction operation of the voice recognition result are used;

after configuring the hotword to the preset hotword set, the method further comprises:

Acquiring the pinyin of each character in the hot word;

taking the combination result of the plurality of pinyin arrangement as first pinyin information of the hot word;

if a polyphonic character exists in the hotword, the pinyin readings of the polyphonic character and the pinyin of the other characters in the hotword are combined pairwise from left to right to obtain a plurality of pinyin permutation and combination results, which comprises:

after the hotwords are configured, traversing each newly configured hotword to obtain the pinyin of each character in the hotword; if the hotword contains a polyphonic character, taking all the pinyin readings of the first character and all the pinyin readings of the second character and combining them pairwise to obtain a result u, combining u pairwise with all the pinyin readings of the third character to obtain a result v, and so on, finally obtaining a result list of the pinyin.

2. The speech recognition processing method of claim 1 further comprising:

3. The speech recognition processing method of claim 1 further comprising:

acquiring a preset modal particle set;

4. The speech recognition processing method of claim 3 wherein the modal particles in the preset modal particle set comprise:

5. A speech recognition processing device, comprising:

the second determining module is used for determining, when speech recognition is performed on speech information and a speech recognition result is obtained, a word to be replaced in the speech recognition result and a target hotword corresponding to the word to be replaced according to the first pinyin information of each hotword in the preset hotword set and the second pinyin information of the speech recognition result, wherein the word to be replaced is a word in the speech recognition result whose pinyin information is consistent with the pinyin information of a hotword in the preset hotword set;

the first replacing module is used for replacing the word to be replaced in the voice recognition result with the corresponding target hotword;

the second determining module is configured to:

determining first pinyin information of each hotword in the preset hotword set;

the hotwords in the preset hotword set acquired by the first acquisition module comprise:

the voice recognition processing device further comprises a hotword configuration module, wherein the hotword configuration module is used for:

6. The speech recognition processing device of claim 5, further comprising:

7. The speech recognition processing device of claim 5, further comprising:

the second acquisition module is used for acquiring a preset modal particle set;

8. The apparatus of claim 7, wherein the speech recognition processing unit further comprises:

9. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.

10. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-4.