CN111739514A - Voice recognition method, device, equipment and medium - Google Patents
- Publication number
- CN111739514A (application number CN201910710043.1A)
- Authority
- CN
- China
- Prior art keywords
- pinyin
- data
- sequence
- standard
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G — PHYSICS; G10 — MUSICAL INSTRUMENTS; ACOUSTICS; G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING; G10L15/00 — Speech recognition
- G10L15/02 — Feature extraction for speech recognition; Selection of recognition unit
- G10L15/10 — Speech classification or search using distance or distortion measures between unknown speech and reference templates
- G10L15/26 — Speech to text systems
Abstract
The embodiments of the present invention disclose a speech recognition method, apparatus, device, and medium. The method comprises: acquiring speech data to be recognized and determining the original pinyin data corresponding to the speech data to be recognized; correcting the original pinyin data to obtain pinyin data to be matched; and matching the pinyin data to be matched against a pre-established standard pinyin sequence, then determining the text data corresponding to the speech data to be recognized according to the matching result. By correcting the original pinyin data and performing recognition based on the corrected pinyin data, the method improves the accuracy of speech recognition and, in turn, the response accuracy of intelligent voice customer service.
Description
Technical Field
Embodiments of the present invention relate to the field of information processing, and in particular, to a method, an apparatus, a device, and a medium for speech recognition.
Background
With the continuous development of network technology, speech recognition is applied ever more widely. For example, in the answering scenario of intelligent voice customer service, a voice robot can resolve a user's problem through spoken question-and-answer interaction.
Implementing a response by intelligent voice customer service involves the following steps: converting the user's speech into text, identifying the user's intention from the speech-to-text result, obtaining a response text based on that intention, and then converting the response text into speech for broadcast. The current mainstream approach to speech-to-text conversion is: collect speech samples, label features in the samples, train a model with a deep learning algorithm (such as a recurrent neural network or a convolutional neural network) to obtain a trained speech recognition model, and then perform real-time speech recognition with the trained model to convert speech into text.
In the course of implementing the invention, the inventors found at least the following technical problems in the prior art: a model trained on a general-purpose speech corpus produces relatively fixed recognition results. Because of users' accents and Chinese expression habits, and changes in background noise or dictation volume, near-homophone words are misrecognized and words are dropped, so the speech-to-text result is wrong; the user intention identified from that result then diverges from the user's actual intention, and the response is inaccurate. Moreover, user expression is highly varied, so it is difficult to train a single model that suits all users.
Disclosure of Invention
The embodiments of the present invention provide a speech recognition method, apparatus, device, and medium, so as to improve the accuracy of speech recognition and thereby the response accuracy of intelligent voice customer service.
In a first aspect, an embodiment of the present invention provides a speech recognition method, including:
acquiring voice data to be recognized, and determining original pinyin data corresponding to the voice data to be recognized;
correcting the original pinyin data to obtain pinyin data to be matched;
and matching the pinyin data to be matched with a pre-established standard pinyin sequence, and determining text data corresponding to the voice data to be recognized according to a matching result.
In a second aspect, an embodiment of the present invention further provides a speech recognition apparatus, including:
the pinyin data acquisition module is used for acquiring the voice data to be recognized and determining original pinyin data corresponding to the voice data to be recognized;
the pinyin data calibration module is used for correcting the original pinyin data to obtain pinyin data to be matched;
and the text data determining module is used for matching the pinyin data to be matched with a pre-established standard pinyin sequence and determining text data corresponding to the voice data to be recognized according to a matching result.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the speech recognition method provided by any embodiment of the present invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the speech recognition method provided in any embodiment of the present invention.
In the embodiments of the present invention, speech data to be recognized is acquired and its original pinyin data is determined; the original pinyin data is corrected to obtain pinyin data to be matched; and the pinyin data to be matched is matched against a pre-established standard pinyin sequence, with the text data corresponding to the speech data to be recognized determined from the matching result. Correcting the original pinyin data and recognizing based on the corrected pinyin data improves the accuracy of speech recognition and, in turn, the response accuracy of intelligent voice customer service.
Drawings
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a speech recognition method according to a second embodiment of the present invention;
fig. 3a is a flowchart of a speech recognition method according to a third embodiment of the present invention;
fig. 3b is a schematic structural diagram of an intelligent customer service system according to a third embodiment of the present invention;
fig. 3c is a schematic flow chart of an intelligent customer service response method according to a third embodiment of the present invention;
fig. 3d is a schematic diagram of an undirected search graph in a speech recognition method according to a third embodiment of the present invention;
fig. 3e is a schematic diagram of a bidirectional matching method in a speech recognition method according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a speech recognition apparatus according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention. The embodiment can be applied to the situation when voice data is recognized, and is particularly suitable for the situation when voice intelligent customer service performs voice response. The method may be performed by a speech recognition apparatus, which may be implemented in software and/or hardware, for example, the speech recognition apparatus may be configured in a computer device. As shown in fig. 1, the method includes:
s110, voice data to be recognized are obtained, and original pinyin data corresponding to the voice data to be recognized are determined.
In this embodiment, the voice data to be recognized may be question information input by a user through voice. In order to make the voice recognition result more accurate, in this embodiment, the initial recognition result is corrected according to the pinyin data, and the final recognition result is determined based on the corrected pinyin data.
Optionally, after the question information in speech form (the speech data to be recognized) is acquired, the speech data to be recognized may be fed into an existing speech recognition model that converts speech data into text, the text data output by that model may be obtained, and a pinyin conversion tool may then convert the text data into the original pinyin data corresponding to the speech data to be recognized. Alternatively, a pinyin data recognition model that converts the speech data to be recognized directly into pinyin data can be trained; after the question information in speech form is obtained, the speech data to be recognized is fed into the trained pinyin data recognition model to obtain the corresponding original pinyin data.
In one embodiment, when the speech data to be recognized is first converted into text-form data and the text-form data is then converted into pinyin-form data, the text-form data can be generalized before the pinyin conversion: entity words in the text are replaced with generalized tokens, and the generalized text is then converted into pinyin form to obtain the original pinyin data corresponding to the speech data to be recognized. For example, if the text corresponding to the speech data to be recognized is "when will the mobile phone I bought arrive", the entity word "mobile phone" is generalized to "PRODSORT", yielding the generalized text "when will the PRODSORT I bought arrive"; converting the generalized text into pinyin form gives the original pinyin data "wo mai de PRODSORT shen me shi hou dao".
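The generalization and pinyin-conversion step above can be sketched in Python as follows. The entity table and the character-to-pinyin map are toy assumptions for illustration; a real system would use an entity dictionary and a pinyin library (such as pypinyin) rather than the hand-rolled map here.

```python
ENTITY_TABLE = {"手机": "PRODSORT"}   # entity word -> generalized token (illustrative)
CHAR_TO_PINYIN = {                    # toy character->pinyin map (illustrative)
    "我": "wo", "买": "mai", "的": "de",
    "什": "shen", "么": "me", "时": "shi", "候": "hou", "到": "dao",
}

def generalize(text: str) -> str:
    """Replace entity words in the text-form data with generalized tokens."""
    for word, token in ENTITY_TABLE.items():
        text = text.replace(word, token)
    return text

def to_pinyin(text: str) -> str:
    """Convert generalized text to a space-separated pinyin string,
    passing ASCII generalized tokens through unchanged."""
    out, i = [], 0
    while i < len(text):
        if text[i].isascii():         # start of a generalized token
            j = i
            while j < len(text) and text[j].isascii():
                j += 1
            out.append(text[i:j])
            i = j
        else:                         # a Chinese character
            out.append(CHAR_TO_PINYIN.get(text[i], "?"))
            i += 1
    return " ".join(out)

raw = "我买的手机什么时候到"
print(to_pinyin(generalize(raw)))  # wo mai de PRODSORT shen me shi hou dao
```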
In one embodiment, if the speech data to be recognized is converted into original pinyin data by a trained pinyin data recognition model, sample speech data and the original pinyin data corresponding to it can be obtained in advance, pinyin data recognition sample pairs can be formed from the two, and a pre-established recognition model can be trained with these sample pairs to obtain the trained pinyin data recognition model.
And S120, correcting the original pinyin data to obtain pinyin data to be matched.
Considering that the same pinyin data may represent different text data, this embodiment corrects the initial recognition result at the pinyin level in order to simplify the correction process. Optionally, correcting the original pinyin data may consist of correcting wrong pinyin in the original pinyin data into standard pinyin. High-frequency, error-prone near-homophone pinyin can be collated manually in advance to build a mapping from wrong pinyin to standard pinyin; the collated mapping serves as a pinyin near-sound table, and the original pinyin data is corrected against this preset table.
In an embodiment of the present invention, the correcting the original pinyin data to obtain pinyin data to be matched includes: determining wrong pinyin contained in the original pinyin data as pinyin to be corrected according to a preset pinyin near-sound table; wherein, the pinyin near sound table stores the corresponding relation between at least one wrong pinyin and a standard pinyin; and correcting the pinyin to be corrected contained in the original pinyin data into the standard pinyin corresponding to the pinyin to be corrected, so as to obtain the pinyin data to be matched.
Optionally, the original pinyin data is traversed, each pinyin that matches a wrong pinyin in the preset near-sound table is marked as a pinyin to be corrected, the standard pinyin corresponding to it is looked up in the table, and the pinyin to be corrected is replaced with that standard pinyin. For example, if the original pinyin data is "wo de PRODSORT dao la le" and a lookup in the preset near-sound table shows that it contains the wrong pinyin "la", whose standard pinyin is "na", then "la" is taken as the pinyin to be corrected and replaced with "na", yielding the pinyin data to be matched "wo de PRODSORT dao na le".
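The correction step can be sketched as a single pass over the pinyin tokens. The table contents below are illustrative assumptions, not data from the patent:

```python
NEAR_SOUND_TABLE = {"la": "na"}   # wrong pinyin -> standard pinyin (illustrative)

def correct(raw_pinyin: str) -> str:
    """Traverse the original pinyin data and replace each wrong pinyin
    with its standard pinyin from the preset near-sound table."""
    return " ".join(NEAR_SOUND_TABLE.get(p, p) for p in raw_pinyin.split())

print(correct("wo de PRODSORT dao la le"))   # wo de PRODSORT dao na le
```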
S130, matching the pinyin data to be matched with a pre-established standard pinyin sequence, and determining text data corresponding to the voice data to be recognized according to a matching result.
In this embodiment, the standard pinyin sequence is used to determine whether the pinyin data has been corrected accurately. Optionally, high-frequency error-prone sentences containing high-frequency error-prone near-homophone pinyin can be collated manually in advance; the standard descriptions of these sentences are generalized and converted into pinyin format to obtain their standard pinyin data, and standard pinyin sequences composed of pinyin nodes are constructed from that data.
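The construction of the node-based standard pinyin sequences can be sketched as an adjacency graph over pinyin nodes, in the spirit of the search graph of Fig. 3d. The sentence data and the choice of a shared adjacency map are illustrative assumptions:

```python
from collections import defaultdict

def build_graph(standard_pinyin_sentences):
    """Build an adjacency map: node -> set of standard pinyin nodes that
    may follow it, plus the set of nodes that can start a sequence."""
    adjacency, roots = defaultdict(set), set()
    for sentence in standard_pinyin_sentences:
        nodes = sentence.split()
        roots.add(nodes[0])
        for prev, nxt in zip(nodes, nodes[1:]):
            adjacency[prev].add(nxt)
    return adjacency, roots

adjacency, roots = build_graph([
    "wo mai de PRODSORT dao na le",
    "wo mai de PRODSORT shen me shi hou dao",
])
print(sorted(adjacency["PRODSORT"]))  # ['dao', 'shen']
```

Sharing one adjacency map merges common prefixes across sentences, which keeps the sketch small; a production graph might keep per-sequence node identity instead.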
In one embodiment, the pinyin data to be matched is matched against the pre-established standard pinyin sequences. If a target standard pinyin sequence matching the pinyin data to be matched is found, the correction of the original pinyin data is deemed accurate, and the text data corresponding to the target standard pinyin sequence is used as the text data corresponding to the speech data to be recognized. If no matching target standard pinyin sequence is found, the correction is deemed inaccurate, and the text data corresponding to the original pinyin data is used as the text data corresponding to the speech data to be recognized.
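This fall-back decision can be sketched in a few lines; the sequence-to-text table is an illustrative assumption:

```python
SEQUENCE_TEXT = {   # standard pinyin sequence -> text data (illustrative)
    "wo mai de PRODSORT dao na le": "where did the PRODSORT I bought go",
}

def resolve_text(target_sequence, original_text):
    """Use the matched standard sequence's text when a match exists;
    otherwise fall back to the text of the original pinyin data."""
    if target_sequence in SEQUENCE_TEXT:
        return SEQUENCE_TEXT[target_sequence]
    return original_text

print(resolve_text("wo mai de PRODSORT dao na le", "original text"))
print(resolve_text(None, "original text"))  # no match -> original text
```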
In the embodiment of the present invention, speech data to be recognized is acquired and its original pinyin data is determined; the original pinyin data is corrected to obtain pinyin data to be matched; and the pinyin data to be matched is matched against a pre-established standard pinyin sequence, with the text data corresponding to the speech data to be recognized determined from the matching result. Correcting the original pinyin data and recognizing based on the corrected pinyin data improves the accuracy of speech recognition and, in turn, the response accuracy of intelligent voice customer service.
Example two
Fig. 2 is a flowchart of a speech recognition method according to a second embodiment of the present invention. The present embodiment is optimized based on the above embodiments. As shown in fig. 2, the method includes:
s210, voice data to be recognized are obtained, and original pinyin data corresponding to the voice data to be recognized are determined.
S220, correcting the original pinyin data to obtain pinyin data to be matched.
And S230, determining a matching node in the pinyin data to be matched, and matching the matching node with a standard pinyin node in a standard pinyin sequence to determine a target standard pinyin sequence matched with the pinyin data to be matched.
In this embodiment, the standard pinyin sequence is a sequence of standard pinyin nodes. To match the pinyin data to be matched against a standard pinyin sequence, the matching nodes in the pinyin data need to be determined and matched, node by node, against the standard pinyin nodes to obtain the target standard pinyin sequence. Optionally, a first standard pinyin node matching the first matching node in the pinyin data to be matched is determined; the second matching node after the first is then matched against the standard pinyin nodes connected to the first standard pinyin node to obtain a second standard pinyin node; continuing in this way yields a standard pinyin node matching each matching node in the pinyin data to be matched, and the sequence formed by these standard pinyin nodes is taken as the target standard pinyin sequence matching the pinyin data to be matched.
Illustratively, the target standard pinyin sequence matching the pinyin data to be matched can be obtained by complete matching. Suppose the pinyin data to be matched is "wo mai de PRODSORT dao na le". The matching nodes "wo", "mai", "de", "PRODSORT", "dao", "na" and "le" are determined and matched in turn, each against the standard pinyin nodes connected to the standard pinyin node matched by the previous matching node: the standard pinyin node "wo" matching the matching node "wo"; the node "mai" matching "mai" and connected to "wo"; the node "de" matching "de" and connected to "mai"; the node "PRODSORT" matching "PRODSORT" and connected to "de"; the node "dao" matching "dao" and connected to "PRODSORT"; the node "na" matching "na" and connected to "dao"; and the node "le" matching "le" and connected to "na". The sequence "wo mai de PRODSORT dao na le" formed by the standard pinyin nodes "wo", "mai", "de", "PRODSORT", "dao", "na" and "le" is taken as the target standard pinyin sequence.
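The complete-matching walk above can be sketched as follows: each matching node must equal a standard pinyin node connected to the node matched at the previous position. The graph data is an illustrative assumption:

```python
GRAPH = {   # standard pinyin node -> connected successor nodes (illustrative)
    "wo": {"mai"}, "mai": {"de"}, "de": {"PRODSORT"},
    "PRODSORT": {"dao"}, "dao": {"na"}, "na": {"le"}, "le": set(),
}
ROOTS = {"wo"}   # nodes that may start a standard pinyin sequence

def complete_match(pinyin_to_match, graph, roots):
    """Walk the matching nodes through the standard pinyin node graph;
    succeed only if every step follows an existing connection."""
    nodes = pinyin_to_match.split()
    if nodes[0] not in roots:
        return None
    for prev, cur in zip(nodes, nodes[1:]):
        if cur not in graph.get(prev, set()):
            return None   # no connected standard pinyin node matches
    return " ".join(nodes)   # the target standard pinyin sequence

print(complete_match("wo mai de PRODSORT dao na le", GRAPH, ROOTS))
# wo mai de PRODSORT dao na le
```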
Considering that, owing to users' expression habits or losses during speech input, the pinyin data to be matched may be missing pinyin relative to the standard pinyin sequence, this embodiment may adopt a bidirectional matching algorithm and determine the target standard pinyin sequence by complementary matching. In an embodiment of the present invention, determining a matching node in the pinyin data to be matched and determining the target standard pinyin sequence by matching the matching node with the standard pinyin nodes in the standard pinyin sequence includes: taking each pinyin in the pinyin data to be matched as a matching node; and matching the matching nodes with the standard pinyin nodes in the standard pinyin sequence using a bidirectional matching algorithm, and obtaining the target standard pinyin sequence from the matching result.
Optionally, the bidirectional matching algorithm may match nodes in the forward and/or reverse direction; in general, forward matching determines the standard pinyin nodes that match the matching nodes in the pinyin data to be matched. Suppose the matching nodes are a first, a second, and a third matching node, and forward matching finds a first standard pinyin node matching the first matching node and a second standard pinyin node matching the second matching node and connected to the first, but finds no standard pinyin node that matches the third matching node and is connected to the second standard pinyin node. In that case, all first candidate pinyin nodes connected to the second standard pinyin node in the standard pinyin sequence are collected, and a third standard pinyin node matching the third matching node is searched for among the standard pinyin nodes. It is then determined whether any first candidate pinyin node is a fourth standard pinyin node connected to the third standard pinyin node. If so, the sequence consisting of the first, second, fourth, and third standard pinyin nodes in that order is taken as the target standard pinyin sequence matching the pinyin data to be matched. If not, the second candidate pinyin nodes connected to the third standard pinyin node in the standard pinyin sequence are collected, and it is determined whether any first candidate pinyin node is connected to any second candidate pinyin node; if a fifth standard pinyin node among the first candidates is connected to a sixth standard pinyin node among the second candidates, the sequence consisting of the first, second, fifth, sixth, and third standard pinyin nodes in that order is taken as the target standard pinyin sequence matching the pinyin data to be matched.
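The complementary step can be sketched as a small breadth-first bridge search: when a matching node has no directly connected standard node, intermediate standard pinyin nodes are inserted to close the gap. The patent's text describes one or two intermediate candidate layers; generalizing this to a `max_hops`-bounded search, like the graph data below, is an illustrative assumption:

```python
GRAPH = {   # standard pinyin node -> connected successor nodes (illustrative)
    "wo": {"mai"}, "mai": {"de"}, "de": {"PRODSORT"},
    "PRODSORT": {"shen"}, "shen": {"me"}, "me": {"shi"},
    "shi": {"hou"}, "hou": {"dao"}, "dao": set(),
}

def bridge(prev, cur, graph, max_hops=4):
    """Breadth-first search for intermediate standard pinyin nodes
    linking prev -> ... -> cur; returns the intermediate path or None."""
    frontier = [(n, [n]) for n in graph.get(prev, ())]
    for _ in range(max_hops):
        nxt = []
        for node, path in frontier:
            if cur in graph.get(node, ()):
                return path
            nxt.extend((m, path + [m]) for m in graph.get(node, ()))
        frontier = nxt
    return None

def complementary_match(pinyin_to_match, graph):
    """Match node by node, bridging gaps with intermediate standard nodes."""
    nodes = pinyin_to_match.split()
    result = [nodes[0]]
    for prev, cur in zip(nodes, nodes[1:]):
        if cur in graph.get(prev, ()):
            result.append(cur)
        else:
            inter = bridge(prev, cur, graph)
            if inter is None:
                return None     # gap cannot be bridged
            result.extend(inter)
            result.append(cur)
    return " ".join(result)

# the user dropped "shen me shi hou"; complementary matching restores it
print(complementary_match("wo mai de PRODSORT dao", GRAPH))
# wo mai de PRODSORT shen me shi hou dao
```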
In an embodiment of the present invention, matching the matching nodes with the standard pinyin nodes in the standard pinyin sequence using a bidirectional matching algorithm and obtaining the target standard pinyin sequence from the matching result includes: matching the matching nodes with the standard pinyin nodes using the bidirectional matching algorithm to obtain at least one candidate standard pinyin sequence; for each candidate standard pinyin sequence, determining its weight from the sequence heat value of the candidate sequence and the pinyin heat value of each pinyin in it, where the sequence heat value represents how frequently a standard pinyin sequence is used and the pinyin heat value represents how frequently a pinyin is used; and taking the candidate standard pinyin sequence with the largest weight as the target standard pinyin sequence.
In this embodiment, there may be a plurality of obtained standard pinyin sequences matching the pinyin data to be matched, and the standard pinyin sequence with the largest weight may be used as the target standard pinyin sequence matching the pinyin data to be matched by calculating the weight of each standard pinyin sequence. For example, assuming that the standard pinyin sequences matched with the pinyin data to be matched include a candidate standard pinyin sequence 1, a candidate standard pinyin sequence 2 and a candidate standard pinyin sequence 3, and the weight of the candidate standard pinyin sequence 1 is 0.89, the weight of the candidate standard pinyin sequence 2 is 0.65, and the weight of the candidate standard pinyin sequence 3 is 0.78, the candidate standard pinyin sequence 1 with the largest weight is taken as the target standard pinyin sequence.
Optionally, after obtaining a plurality of candidate standard pinyin sequences matched with the pinyin data to be matched, calculating the weight of each candidate standard pinyin sequence according to the sequence heat value of the candidate standard pinyin sequence and the pinyin heat value of each pinyin in the candidate standard pinyin sequence aiming at each candidate standard pinyin sequence. The sequence heat value of the candidate standard pinyin sequence may be the number of times that the candidate standard pinyin sequence is taken as the target standard pinyin sequence, and the pinyin heat value of the pinyin may be the number of times that the pinyin exists in the target standard pinyin sequence. Because the pinyin heat value can represent the use frequency of pinyin, and the sequence heat value can represent the use frequency of a standard pinyin sequence, the weight of a candidate standard pinyin sequence calculated based on the pinyin heat value and the sequence heat value can accurately screen a target standard pinyin sequence matched with pinyin data to be matched.
In this embodiment, the weight of each pinyin in the candidate standard pinyin sequence may be calculated first, and the weight of the candidate standard pinyin sequence may then be calculated from the pinyin weights. Illustratively, the weight of each pinyin in the candidate standard pinyin sequence may be calculated as F(i) = ((HW + 1) / (H(i) + 1)) × (1 + log10(H(i) + 1)), and the weight of the candidate standard pinyin sequence as W = F(1) × F(2) × … × F(n). Here, F(i) is the weight of the i-th pinyin in the candidate standard pinyin sequence, HW is the sequence heat value of the candidate standard pinyin sequence, H(i) is the pinyin heat value of the i-th pinyin in the candidate standard pinyin sequence, W is the weight of the candidate standard pinyin sequence, and n is the total number of pinyins in the candidate standard pinyin sequence.
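A numeric sketch of this weight calculation follows. Note that the grouping of the formula is reconstructed from the garbled source text, and the heat values used below are illustrative:

```python
import math

def pinyin_weight(hw: int, h_i: int) -> float:
    """F(i) = ((HW + 1) / (H(i) + 1)) * (1 + log10(H(i) + 1))."""
    return ((hw + 1) / (h_i + 1)) * (1 + math.log10(h_i + 1))

def sequence_weight(hw: int, heats) -> float:
    """W = F(1) * F(2) * ... * F(n) over the pinyin heat values."""
    w = 1.0
    for h_i in heats:
        w *= pinyin_weight(hw, h_i)
    return w

# a sequence used 9 times whose pinyins have heat values 9, 19, 9
print(round(sequence_weight(9, [9, 19, 9]), 3))  # 4.602
```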
In this embodiment, after the target standard pinyin sequence is determined, the method further includes: updating the pinyin heat value of each pinyin in the target standard pinyin sequence and the sequence heat value of the target standard pinyin sequence. These updates keep the weight calculation for candidate standard pinyin sequences accurate. Specifically, the pinyin heat value of each pinyin in the target standard pinyin sequence is incremented by 1, and the sequence heat value of the target standard pinyin sequence is incremented by 1.
S240, taking the text data corresponding to the target standard pinyin sequence as the text data corresponding to the voice data to be recognized.
After the target standard pinyin sequence is determined, the text data corresponding to it is used as the text data corresponding to the speech data to be recognized. Illustratively, if the target standard pinyin sequence is "wo mai de PRODSORT dao na le", the corresponding text data "where did the PRODSORT I bought go" is taken as the text data corresponding to the speech data to be recognized.
In one embodiment of the present invention, the method further comprises: and if the target standard pinyin sequence matched with the pinyin data to be matched does not exist in the standard pinyin sequence, using the text data corresponding to the original pinyin data as the text data corresponding to the voice data to be recognized.
If a target standard pinyin sequence matching the pinyin data to be matched cannot be obtained by either complete matching or complementary matching, that is, if no target standard pinyin sequence matching the pinyin data to be matched exists among the standard pinyin sequences, the correction of the original pinyin data is deemed erroneous, and the text data corresponding to the original pinyin data is used as the text data corresponding to the voice data to be recognized.
In the technical scheme of this embodiment of the invention, the pinyin data to be matched is matched against pre-established standard pinyin sequences, and the text data corresponding to the voice data to be recognized is determined according to the matching result. Matching nodes are determined in the pinyin data to be matched, and the target standard pinyin sequence matching the pinyin data to be matched is determined by matching these nodes against the standard pinyin nodes in the standard pinyin sequences; the text data corresponding to the target standard pinyin sequence is then used as the text data corresponding to the voice data to be recognized. This makes the matching result more accurate and improves the accuracy of the voice recognition result.
On the basis of the above scheme, after obtaining at least one candidate standard pinyin sequence and before determining a weight of the candidate standard pinyin sequence for each candidate standard pinyin sequence, the method further includes:
aiming at each candidate standard pinyin sequence, comparing the candidate standard pinyin sequence with the pinyin data to be matched, and determining a difference value between the candidate standard pinyin sequence and the pinyin data to be matched; and if the difference value between the candidate standard pinyin sequence and the pinyin data to be matched is greater than a preset difference threshold value, deleting the candidate standard pinyin sequence.
Optionally, a candidate standard pinyin sequence obtained through bidirectional matching may differ considerably from the pinyin data to be matched, so the candidate standard pinyin sequences may be screened after they are obtained. Specifically, a difference threshold may be preset; after the candidate standard pinyin sequences are obtained, the difference value between each candidate standard pinyin sequence and the pinyin sequence to be matched is calculated, and any candidate standard pinyin sequence whose difference value exceeds the difference threshold is deleted. The difference threshold can be set according to actual requirements; optionally, it may be 0.5.
Illustratively, suppose the candidate standard pinyin sequences include candidate standard pinyin sequence 1, candidate standard pinyin sequence 2 and candidate standard pinyin sequence 3, the difference threshold is 0.5, and the difference values between the pinyin sequence to be matched and candidate standard pinyin sequences 1, 2 and 3 are 0.4, 0.5 and 0.7, respectively. Then candidate standard pinyin sequence 3, whose difference value of 0.7 exceeds the difference threshold 0.5, is deleted, and candidate standard pinyin sequences 1 and 2 are retained.
Optionally, the difference value between a candidate standard pinyin sequence and the pinyin sequence to be matched may be calculated by C = m/n, where C is the difference value, m is the number of pinyins that differ between the candidate standard pinyin sequence and the pinyin sequence to be matched, and n is the total number of pinyins in the pinyin sequence to be matched.
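The C = m/n screening can be sketched as below, assuming m counts candidate pinyins that have no counterpart in the sequence to be matched (consistent with the 0/6, 1/6 and 4/6 figures that appear later in the text) and that candidates with C greater than the threshold are dropped; the function names are illustrative:

```python
def difference_value(candidate, to_match):
    # C = m / n: m = pinyins in the candidate with no counterpart in the
    # sequence to be matched, n = total pinyins in the sequence to be matched
    remaining = list(to_match)
    m = 0
    for p in candidate:
        if p in remaining:
            remaining.remove(p)
        else:
            m += 1
    return m / len(to_match)

def screen_candidates(candidates, to_match, threshold=0.5):
    # delete candidates whose difference value exceeds the threshold
    return [c for c in candidates
            if difference_value(c, to_match) <= threshold]
```

For instance, "wo mai de PRODSORT dao na le" measured against "wo de PRODSORT dao na le" yields C = 1/6 and is kept under the 0.5 threshold.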
EXAMPLE III
Fig. 3a is a flowchart of a speech recognition method according to a third embodiment of the present invention. On the basis of the above embodiments, this embodiment provides a preferred implementation, taking voice intelligent customer service as an example. In this embodiment, an undirected search graph (of standard pinyin sequences) is generated from a manually compiled near-sound pinyin dictionary and error-prone sentences, so that the speech recognition result is dynamically corrected and the user's intention is correctly recognized. The voice recognition method provided by this embodiment of the invention may be executed by an intelligent customer service system. Fig. 3b is a schematic structural diagram of an intelligent customer service system according to the third embodiment of the present invention. As shown in fig. 3b, the intelligent customer service system includes a speech recognition module (Automatic Speech Recognition, ASR) 310, a recognition correction module 320, a natural language processing module (Natural Language Processing, NLP) 340, and a speech synthesis module (Text To Speech, TTS), where the speech synthesis module is not shown in the figure. The voice intelligent customer service mainly works as follows: the user's voice is converted into text by the automatic speech recognition technology of the speech recognition module 310; the text is passed into the recognition correction module 320 to obtain a corrected text recognition result; the result is passed into the natural language processing module for processing and response; and finally, the response text is converted into voice by the speech synthesis module and broadcast to the user.
The recognition correction module 320 comprises a near-sound pinyin dictionary 331, an undirected graph matching module 332, an error-prone sentence undirected graph 333, and a hotlist 334 of sentences and words. The near-sound pinyin dictionary 331 stores the mapping between common wrong pinyins and correct near-sound pinyins, the error-prone sentence undirected graph 333 stores the undirected search graph of error-prone sentences, the hotlist 334 stores the heat values of sentences and words, and the undirected graph matching module 332 matches the corrected pinyin against the error-prone sentence undirected graph to obtain a matching result.
Fig. 3c is a schematic flow chart of an intelligent customer service response method according to the third embodiment of the present invention. As shown in fig. 3c, high-frequency error-prone sentences are configured and generalized, the sentences are converted into pinyin, the undirected graph and the hotlists are initialized, the hotlists of sentences and words are generated, and the undirected search graph is generated. When the user's spoken voice information is received, it is converted into text through ASR, the text is generalized, the sentence is converted into the correct pinyin based on the pre-constructed near-sound pinyin dictionary, and matching results for the correct pinyin are then searched in the undirected graph. Specifically, the undirected graph is searched to check whether all pinyins contained in the correct pinyin exist in it. If not all of them exist, the original sentence is returned, the NLU module performs intention identification and generates a text response from the original sentence, and the text response is converted into a voice response through TTS and fed back to the user.
If all the pinyins contained in the correct pinyin exist in the undirected graph, a bidirectional matching algorithm is used to obtain all matching sequences containing the correct pinyin. The difference value between each matching sequence and the correct pinyin is calculated, the matching sequences whose difference values are greater than a set threshold are deleted, and the remaining matching sequences are taken as matching results. The weight of each matching result is then calculated, and the matching result with the highest weight is output to the NLU module, which performs intention identification and generates a text response from it; the text response is converted into a voice response through TTS and fed back to the user.
The speech recognition method provided by the present embodiment will be described in detail below. As shown in fig. 3a, the speech recognition method provided in this embodiment includes:
s310, establishing a near-phonetic alphabet dictionary.
High-frequency error-prone near-sound pinyins are manually sorted to obtain the mapping between common wrong-word pinyins and the correct near-sound pinyins, and the mapping is stored in a near-sound pinyin dictionary kept in a database. Table 1 shows an example of the mapping contained in the near-sound pinyin dictionary. As shown in Table 1, the common wrong-word pinyins include "La" and "Wang"; the correct near-sound pinyin corresponding to "La" is "Na", and the correct near-sound pinyin corresponding to "Wang" is "Huang".
TABLE 1
Common wrong word pinyin | Correct near sound phonetic alphabet |
La | Na |
Wang | Huang |
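The dictionary lookup can be sketched as a plain mapping. The entries below are the two rows of Table 1 (lower-cased for illustration); the names are not from the patent:

```python
# Hypothetical near-sound dictionary mirroring the two rows of Table 1
NEAR_SOUND = {"la": "na", "wang": "huang"}

def correct_pinyin(pinyins):
    # replace each common wrong-word pinyin with its correct near-sound
    # pinyin; pinyins without an entry pass through unchanged
    return [NEAR_SOUND.get(p, p) for p in pinyins]
```

For example, the recognized sequence "wo de dao la" becomes "wo de dao na" after correction.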
And S320, establishing an undirected retrieval map of the error-prone sentence.
The standard descriptions of high-frequency error-prone sentences are manually compiled; after generalization, all characters are converted into pinyin and stored in an undirected graph. For example, the TinyPinyin tool may be used to convert the text into pinyin. Table 2 shows an example of the correspondence between the original sentence, the generalized text, and the pinyin.
After the original sentence is converted into pinyin, the pinyin of each single character is used as a node, and the concatenated pinyin phrases are used to construct an undirected search graph in forward order. Fig. 3d is a schematic diagram of an undirected search graph in a speech recognition method according to the third embodiment of the present invention. As shown in fig. 3d, adjacent pinyin nodes in each pinyin phrase are connected to form an undirected search graph containing the connection relations.
TABLE 2
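The graph construction step (single-character pinyins as nodes, adjacent pinyins connected) can be sketched as an adjacency map; the phrase in the usage note is an assumed example and the function name is illustrative:

```python
from collections import defaultdict

def build_undirected_graph(pinyin_phrases):
    # each phrase is a list of single-character pinyins; adjacent
    # pinyins in a phrase become connected nodes of the undirected graph
    graph = defaultdict(set)
    for phrase in pinyin_phrases:
        for a, b in zip(phrase, phrase[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph
```

Feeding it the phrase "wo de dao na le" connects wo-de, de-dao, dao-na and na-le in both directions.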
S330, initializing a hotlist of sentences and words.
The heat values of the error-prone sentences and of the single characters in them are initialized. Optionally, the initial sentence heat value of each sentence and the initial word heat value of each word may be set to 0. Table 3 shows an example of the heat values of the words in a sentence and of the sentence itself. As shown in Table 3, the sentence "when will the PRODSORT I bought arrive" has a sentence heat value of 1; within it, "wo" (I) has a word heat value of 3, "mai" (bought) has 2, "de" has 3, "PRODSORT" has 3, and each of the remaining characters has a word heat value of 2.
TABLE 3
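The hotlist initialization of S330 can be sketched as below; the key representation (space-joined phrase for sentences, single pinyin for words) is an assumption for illustration:

```python
def init_hotlists(pinyin_phrases):
    # S330: sentence heat and word heat both start at 0; keys are an
    # assumed representation (space-joined phrase, single pinyin)
    sentence_heat = {" ".join(p): 0 for p in pinyin_phrases}
    word_heat = {}
    for phrase in pinyin_phrases:
        for p in phrase:
            word_heat.setdefault(p, 0)
    return sentence_heat, word_heat
```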
And S340, converting the user question into the correct pinyin.
After generalization, the user question is converted into pinyin, the near-sound word list is matched, and the correct near-sound pinyin is obtained. For example, the TinyPinyin tool converts the characters into pinyin, and the near-sound pinyin list is matched to obtain the correct pinyin. Table 4 shows an example of the correspondence between the original sentence, the generalized sentence, the pinyin, and the correct pinyin.
TABLE 4
And S350, searching the undirected search graph by adopting a bidirectional matching algorithm to obtain at least one matching result.
First, it is ensured that all pinyins exist in the undirected search graph; if any pinyin does not exist in the graph, the matching fails and the original character string is returned directly. If they all exist, the undirected search graph is traversed according to the bidirectional matching algorithm, all matching results are returned through complete matching and complementary matching, and the matched pinyins are replaced with the character sequences in the undirected search graph.
In one embodiment, suppose the correct pinyin to be matched is "wo de PRODSORT dao na le". First, it is checked whether all the pinyins in the correct pinyin to be matched exist in the undirected search graph; here they all do. Then, matching starts from the head "wo" and the tail "le", and both the forward and the backward direction completely match the undirected graph pinyin sequence "wo-de-PRODSORT-dao-na-le". Next, all combinations containing the current sequence are checked and the other possible whole sentences are listed, such as "wo de PRODSORT dao na le" and "wo mai de PRODSORT dao na le", with the corresponding text sequences "where did my PRODSORT go" and "where did the PRODSORT I bought go" returned. Finally, matching results changed by more than 50% are filtered out; the changes of the two results (0/6 and 1/6) do not exceed 50%, so all results are returned.
In one embodiment, suppose the correct pinyin to be matched is "wo de PRODSORT shen me fa". First, it is checked whether all the pinyins in the correct pinyin to be matched exist in the undirected search graph; here they all do. Then, matching starts from the head "wo" and the tail "fa" to find a connected sequence. Fig. 3e is a schematic diagram of the bidirectional matching method in a speech recognition method according to the third embodiment of the present invention, where the solid one-way arrows represent the forward matching process, the dashed one-way arrows represent the reverse matching process, and the solid two-way arrows represent positions where forward and reverse matching both succeed. As shown in fig. 3e, the forward matching ends at "wo-de-PRODSORT-shen-me", and the reverse matching ends at "fa". At this point, the forward matching sequence selects the next node from the graph and tries "wo-de-PRODSORT-shen-me-shi", while the reverse matching sequence selects the next node and tries "fa-hou"; since a path connects "shi" and "hou", a connected undirected graph sequence is completely matched: "wo-de-PRODSORT-shen-me-shi-hou-fa". If no connected sequence existed, the original source character string would be returned directly. It is then judged whether any combinations containing the current sequence exist, yielding the other whole sentences "wo mai de PRODSORT shen me shi hou fa", "wo mai de PRODSORT shen me shi hou fa huo", "wo de PRODSORT shen me shi hou fa", and "wo de PRODSORT shen me shi hou fa huo", and the text corresponding to each pinyin sequence, such as "when does the PRODSORT I bought ship" and "when does my PRODSORT ship", is returned.
Finally, matching results changed by more than 50% are filtered out. The pinyin sequence "wo mai de PRODSORT shen me shi hou fa huo" is changed by 4/6, which exceeds 50%, so that sequence is deleted and all the other sequences are returned.
When searching for a connected sequence through forward and reverse matching, the tried nodes extend up to two levels in each of the forward and reverse directions, and each level may contain several nodes, so all combinations must be exhausted and matched repeatedly to find every connected sequence. If no combination matches completely after all attempts, the matching is abandoned and the original text is returned. It should be noted that a threshold is set on the number of levels tried, at most two, because trying too many levels would hurt matching performance. For example, if there were no connected path between "shi" and "hou" in the example above, the next-level nodes of "shi" and "hou" would not be tried further.
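The step that joins the forward-matched segment to the reverse-matched segment can be sketched as a bounded search for intermediate nodes. This is a simplified illustration, not a full bidirectional matcher: it inserts at most two graph nodes (mirroring the two-level cap described above), and the graph is built from an assumed phrase:

```python
from collections import defaultdict

def build_graph(pinyin_phrases):
    # adjacent pinyins in each phrase become connected undirected nodes
    graph = defaultdict(set)
    for phrase in pinyin_phrases:
        for a, b in zip(phrase, phrase[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connect_with_gap(graph, left, right):
    # Try to join the tail of the forward match (left) to the head of the
    # reverse match (right) by inserting at most two graph nodes.
    # Returns the inserted nodes, or None if no bounded connection exists.
    if right in graph[left]:
        return []                      # already adjacent, nothing to insert
    for mid in graph[left]:            # one intermediate node
        if right in graph[mid]:
            return [mid]
    for m1 in graph[left]:             # two intermediate nodes
        for m2 in graph[m1]:
            if right in graph[m2]:
                return [m1, m2]
    return None
```

For the example above, with the phrase "shen me shi hou fa" in the graph, joining "me" to "fa" inserts ["shi", "hou"], reproducing the connected sequence wo-de-PRODSORT-shen-me-shi-hou-fa.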
And S360, calculating the whole sentence weight of the matching result, and taking the matching result with the maximum weight as an output sentence.
The weight of each pinyin in the candidate standard pinyin sequence may be calculated by F(i) = ((HW + 1)/(H(i) + 1)) × (1 + log10(H(i) + 1)), and the weight of the candidate standard pinyin sequence by W = F(1) × F(2) × … × F(n). Here, F(i) represents the weight of the i-th pinyin in the candidate standard pinyin sequence, HW represents the sequence heat value of the candidate standard pinyin sequence, H(i) represents the pinyin heat value of the i-th pinyin in the candidate standard pinyin sequence, W represents the weight of the candidate standard pinyin sequence, and n is the total number of pinyins in the candidate standard pinyin sequence.
TABLE 5
Table 5 schematically shows the correspondence between the original sentence, the matched sentences, the whole-sentence weight calculation, and the output sentence. As shown in Table 5, the matched sentences corresponding to the original sentence include "where did my PRODSORT go" and "where did the PRODSORT I bought go", with weights of 0.848 and 0.006, respectively; "where did my PRODSORT go" is therefore taken as the output sentence, and the answer is determined based on the output sentence.
S370, updating the word heat and the sentence heat.
The word heat value of every word in the output sentence is incremented by one, and the heat value of the output sentence is incremented by one. Table 6 schematically shows the updated word heat values and sentence heat values.
TABLE 6
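The S370 update (increment the heat of every word in the output sentence, and the heat of the output sentence itself, by one) can be sketched as below; the dictionary-based storage is an assumed representation:

```python
def update_heat(word_heat, sentence_heat, sentence, words):
    # S370: the output sentence's heat and each of its word heats gain 1;
    # unseen entries are treated as starting from the initial value 0
    sentence_heat[sentence] = sentence_heat.get(sentence, 0) + 1
    for w in words:
        word_heat[w] = word_heat.get(w, 0) + 1
```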
Near-sound word recognition errors and missing-word recognition errors in speech-to-text conversion can cause the user's intention to be misidentified. To address this, this embodiment adds a manually configurable undirected search graph of high-frequency error-prone sentences, and resolves high-frequency speech recognition errors through bidirectional matching on the undirected graph and weight-based ranking of the matching results. Whereas speech recognition models in the prior art require long training periods, this embodiment adapts dynamically to changing after-sale scenario requirements simply by adjusting the configuration of high-frequency error-prone sentences. It thus corrects high-frequency error-prone sentences, improves the accuracy and recall of user intention recognition, and improves the user experience.
Example four
Fig. 4 is a schematic structural diagram of a speech recognition apparatus according to a fourth embodiment of the present invention. The speech recognition means may be implemented in software and/or hardware, for example, the speech recognition means may be configured in a computer device. As shown in fig. 4, the apparatus includes a pinyin data acquisition module 410, a pinyin data calibration module 420, and a text data determination module 430, wherein:
a pinyin data obtaining module 410, configured to obtain voice data to be recognized, and determine original pinyin data corresponding to the voice data to be recognized;
a pinyin data calibration module 420, configured to correct the original pinyin data to obtain pinyin data to be matched;
and the text data determining module 430 is configured to match the pinyin data to be matched with a pre-established standard pinyin sequence, and determine text data corresponding to the voice data to be recognized according to a matching result.
In this embodiment of the invention, the pinyin data acquisition module obtains the voice data to be recognized and determines the original pinyin data corresponding to it; the pinyin data calibration module corrects the original pinyin data to obtain the pinyin data to be matched; and the text data determination module matches the pinyin data to be matched against pre-established standard pinyin sequences and determines the text data corresponding to the voice data to be recognized according to the matching result. Because the original pinyin data is corrected and recognition is based on the corrected data, the accuracy of voice recognition is improved, and the response accuracy of the voice intelligent customer service is improved in turn.
On the basis of the above scheme, the pinyin data calibration module 420 is specifically configured to:
determining wrong pinyin contained in the original pinyin data as pinyin to be corrected according to a preset pinyin near-sound table; wherein, the pinyin near sound table stores the corresponding relation between at least one wrong pinyin and a standard pinyin;
and correcting the pinyin to be corrected contained in the original pinyin data into the standard pinyin corresponding to the pinyin to be corrected, so as to obtain the pinyin data to be matched.
On the basis of the above scheme, the text data determining module 430 includes:
the target sequence determining unit is used for determining a matching node in the pinyin data to be matched and determining a target standard pinyin sequence matched with the pinyin data to be matched by matching the matching node with a standard pinyin node in the standard pinyin sequence;
and the text data determining unit is used for taking the text data corresponding to the target standard pinyin sequence as the text data corresponding to the voice data to be recognized.
On the basis of the above scheme, the target sequence determining unit includes:
a matching node determining subunit, configured to use each pinyin in the pinyin data to be matched as a matching node;
and the bidirectional matching subunit is used for matching the matching node with the standard pinyin node in the standard pinyin sequence by using a bidirectional matching algorithm and obtaining the target standard pinyin sequence according to a matching result.
On the basis of the above scheme, the bidirectional matching subunit is specifically configured to:
matching the matching node with the standard pinyin node by using the bidirectional matching algorithm to obtain at least one candidate standard pinyin sequence;
aiming at each candidate standard pinyin sequence, determining a weight of the candidate standard pinyin sequence according to a sequence heat value of the candidate standard pinyin sequence and a pinyin heat value of each pinyin in the candidate standard pinyin sequence, wherein the sequence heat value is used for representing the use frequency of the standard pinyin sequence, and the pinyin heat value is used for representing the use frequency of the pinyin;
and taking the candidate standard pinyin sequence with the maximum weight as the target standard pinyin sequence.
On the basis of the above scheme, the bidirectional matching subunit is further configured to:
after obtaining at least one candidate standard pinyin sequence and before determining the weight of the candidate standard pinyin sequence for each candidate standard pinyin sequence, comparing the candidate standard pinyin sequence with the pinyin data to be matched for each candidate standard pinyin sequence, and determining the difference value between the candidate standard pinyin sequence and the pinyin data to be matched;
and if the difference value between the candidate standard pinyin sequence and the pinyin data to be matched is greater than a preset difference threshold value, deleting the candidate standard pinyin sequence.
On the basis of the above scheme, the text data determining module 430 is further configured to:
and if the target standard pinyin sequence matched with the pinyin data to be matched does not exist in the standard pinyin sequence, using the text data corresponding to the original pinyin data as the text data corresponding to the voice data to be recognized.
On the basis of the above scheme, the apparatus further comprises:
and the heat value updating module is used for updating the pinyin heat value of each pinyin in the target standard pinyin sequence and the sequence heat value of the target standard pinyin sequence.
The voice recognition device provided by the embodiment of the invention can execute the voice recognition method provided by any embodiment, and has the corresponding functional modules and beneficial effects of the execution method.
EXAMPLE five
Fig. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary computer device 512 suitable for use in implementing embodiments of the present invention. The computer device 512 shown in FIG. 5 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 5, computer device 512 is in the form of a general purpose computing device. Components of computer device 512 may include, but are not limited to: one or more processors 516, a system memory 528, and a bus 518 that couples the various system components including the system memory 528 and the processors 516.
The system memory 528 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)530 and/or cache memory 532. The computer device 512 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage 534 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD ROM, DVD ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 518 through one or more data media interfaces. Memory 528 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 540 having a set (at least one) of program modules 542, including but not limited to an operating system, one or more application programs, other program modules, and program data, may be stored in, for example, the memory 528, each of which examples or some combination may include an implementation of a network environment. The program modules 542 generally perform the functions and/or methods of the described embodiments of the invention.
The computer device 512 may also communicate with one or more external devices 514 (e.g., keyboard, pointing device, display 524, etc.), with one or more devices that enable a user to interact with the computer device 512, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 512 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 522. Also, computer device 512 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 520. As shown, the network adapter 520 communicates with the other modules of the computer device 512 via the bus 518. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the computer device 512, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 516 executes various functional applications and data processing by executing programs stored in the system memory 528, for example, implementing a voice recognition method provided by an embodiment of the present invention, the method includes:
acquiring voice data to be recognized, and determining original pinyin data corresponding to the voice data to be recognized;
correcting the original pinyin data to obtain pinyin data to be matched;
and matching the pinyin data to be matched with a pre-established standard pinyin sequence, and determining text data corresponding to the voice data to be recognized according to a matching result.
Of course, those skilled in the art will understand that the processor may also implement the technical solution of the speech recognition method provided by any embodiment of the present invention.
EXAMPLE six
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a speech recognition method provided in an embodiment of the present invention, where the method includes:
acquiring voice data to be recognized, and determining original pinyin data corresponding to the voice data to be recognized;
correcting the original pinyin data to obtain pinyin data to be matched;
and matching the pinyin data to be matched with a pre-established standard pinyin sequence, and determining text data corresponding to the voice data to be recognized according to a matching result.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiment of the present invention is not limited to the method operations described above, and may also perform related operations in the speech recognition method provided by any embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CDROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its spirit; the scope of the present invention is determined by the appended claims.
Claims (11)
1. A speech recognition method, comprising:
acquiring voice data to be recognized, and determining original pinyin data corresponding to the voice data to be recognized;
correcting the original pinyin data to obtain pinyin data to be matched;
and matching the pinyin data to be matched with a pre-established standard pinyin sequence, and determining text data corresponding to the voice data to be recognized according to a matching result.
2. The method of claim 1, wherein the correcting the original pinyin data to obtain pinyin data to be matched comprises:
determining wrong pinyin contained in the original pinyin data as pinyin to be corrected according to a preset pinyin near-sound table; wherein the pinyin near-sound table stores a correspondence between at least one wrong pinyin and a standard pinyin;
and correcting the pinyin to be corrected contained in the original pinyin data into the standard pinyin corresponding to the pinyin to be corrected, so as to obtain the pinyin data to be matched.
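The correction step of claim 2 can be sketched as a plain table lookup: each pinyin token found in the near-sound table is replaced by its standard counterpart, and unknown tokens pass through unchanged. The table entries and pinyin values below are illustrative stand-ins, not pairs disclosed by the patent.

```python
# Hypothetical near-sound table: maps wrong pinyin to standard pinyin.
# The entries model common confusions (e.g. missing retroflex, l/n mix-ups);
# they are assumptions for illustration, not data from the patent.
NEAR_SOUND_TABLE = {
    "zang": "zhang",  # zh- initial dropped to z-
    "si": "shi",      # sh- initial dropped to s-
    "lan": "nan",     # l/n confusion
}

def correct_pinyin(original):
    """Replace each wrong pinyin with its standard pinyin, if listed."""
    return [NEAR_SOUND_TABLE.get(p, p) for p in original]

print(correct_pinyin(["zang", "hai", "si"]))  # ['zhang', 'hai', 'shi']
```

In practice such a table would be built from observed recognition errors; the lookup itself stays O(1) per token.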
3. The method as claimed in claim 1, wherein the matching the pinyin data to be matched with a pre-established standard pinyin sequence, and determining text data corresponding to the voice data to be recognized according to the matching result comprises:
determining a matching node in the pinyin data to be matched, and determining a target standard pinyin sequence matched with the pinyin data to be matched by matching the matching node with a standard pinyin node in the standard pinyin sequence;
and taking the text data corresponding to the target standard pinyin sequence as the text data corresponding to the voice data to be recognized.
4. The method as claimed in claim 3, wherein the determining a matching node in the pinyin data to be matched, and determining a target standard pinyin sequence matched with the pinyin data to be matched by matching the matching node with a standard pinyin node in the standard pinyin sequence, comprises:
taking each pinyin in the pinyin data to be matched as a matching node;
and matching the matching node with a standard pinyin node in the standard pinyin sequence by using a bidirectional matching algorithm, and obtaining the target standard pinyin sequence according to a matching result.
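A bidirectional matching algorithm of the kind named in claim 4 can be sketched as forward and backward greedy longest-first segmentation of the pinyin nodes against a lexicon of standard pinyin sequences, keeping the better of the two passes. The lexicon contents and the tie-breaking heuristic are assumptions; the patent does not disclose these details.

```python
# Toy lexicon of standard pinyin sequences (tuples of pinyin nodes); illustrative only.
LEXICON = {("bei", "jing"), ("da", "xue"), ("jing", "da")}
MAX_LEN = 3  # longest standard sequence, counted in pinyin nodes

def directional_match(nodes, lexicon, backward=False):
    """Greedy longest-first segmentation, scanning left-to-right or right-to-left."""
    segments = []
    if not backward:
        i = 0
        while i < len(nodes):
            for l in range(min(MAX_LEN, len(nodes) - i), 0, -1):
                cand = tuple(nodes[i:i + l])
                if l == 1 or cand in lexicon:  # fall back to a single node
                    segments.append(cand)
                    i += l
                    break
    else:
        j = len(nodes)
        while j > 0:
            for l in range(min(MAX_LEN, j), 0, -1):
                cand = tuple(nodes[j - l:j])
                if l == 1 or cand in lexicon:
                    segments.insert(0, cand)
                    j -= l
                    break
    return segments

def bidirectional_match(nodes, lexicon):
    fwd = directional_match(nodes, lexicon)
    bwd = directional_match(nodes, lexicon, backward=True)
    # Assumed heuristic: prefer the segmentation with fewer, longer segments.
    return fwd if len(fwd) <= len(bwd) else bwd

print(bidirectional_match(["bei", "jing", "da", "xue"], LEXICON))
# [('bei', 'jing'), ('da', 'xue')]
```

Scanning from both ends guards against the greedy forward pass committing to a wrong prefix (here, the distractor entry `("jing", "da")`).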
5. The method as claimed in claim 4, wherein the using a bidirectional matching algorithm to match the matching node with a standard pinyin node in the standard pinyin sequence and obtaining the target standard pinyin sequence according to the matching result comprises:
matching the matching node with the standard pinyin node by using the bidirectional matching algorithm to obtain at least one candidate standard pinyin sequence;
for each candidate standard pinyin sequence, determining a weight of the candidate standard pinyin sequence according to a sequence heat value of the candidate standard pinyin sequence and a pinyin heat value of each pinyin in the candidate standard pinyin sequence, wherein the sequence heat value is used for representing the use frequency of the standard pinyin sequence, and the pinyin heat value is used for representing the use frequency of the pinyin;
and taking the candidate standard pinyin sequence with the maximum weight as the target standard pinyin sequence.
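The weighting in claim 5 combines a sequence-level heat value with per-pinyin heat values. The patent does not disclose the combination formula, so the sketch below assumes a plain sum; all heat values are made-up illustrative counts.

```python
# Assumed heat tables: usage frequencies for whole sequences and single pinyin.
SEQ_HEAT = {("bei", "jing"): 50, ("bei", "jin"): 5}   # sequence heat values
PINYIN_HEAT = {"bei": 10, "jing": 30, "jin": 8}       # per-pinyin heat values

def weight(seq):
    """Assumed weighting: sequence heat plus the heat of each pinyin in it."""
    return SEQ_HEAT.get(seq, 0) + sum(PINYIN_HEAT.get(p, 0) for p in seq)

candidates = [("bei", "jing"), ("bei", "jin")]
target = max(candidates, key=weight)  # candidate with the maximum weight wins
print(target)  # ('bei', 'jing')
```

Any monotone combination (weighted sum, product, log-linear) would fit the claim language; the sum is just the simplest choice.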
6. The method of claim 5, wherein after obtaining at least one candidate standard pinyin sequence and before determining a weight for the candidate standard pinyin sequence for each of the candidate standard pinyin sequences, further comprising:
for each candidate standard pinyin sequence, comparing the candidate standard pinyin sequence with the pinyin data to be matched, and determining a difference value between the candidate standard pinyin sequence and the pinyin data to be matched;
and if the difference value between the candidate standard pinyin sequence and the pinyin data to be matched is greater than a preset difference threshold value, deleting the candidate standard pinyin sequence.
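Claim 6 leaves the "difference value" metric open; one natural reading is token-level Levenshtein edit distance between the candidate sequence and the pinyin data to be matched, with candidates beyond a preset threshold discarded. The sketch below assumes that metric and an illustrative threshold.

```python
def edit_distance(a, b):
    """Levenshtein distance between two pinyin sequences, counted in tokens."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return dp[len(a)][len(b)]

DIFF_THRESHOLD = 1  # assumed preset difference threshold
to_match = ["bei", "jing", "da", "xue"]
candidates = [["bei", "jing", "da", "xue"], ["shang", "hai", "da", "xue"]]
# Keep only candidates whose difference value stays within the threshold.
kept = [c for c in candidates if edit_distance(c, to_match) <= DIFF_THRESHOLD]
print(kept)  # [['bei', 'jing', 'da', 'xue']]
```

Pruning before the weighting of claim 5 keeps the candidate set small, so the heat-value comparison only ranks plausible sequences.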
7. The method of claim 3, further comprising:
and if the target standard pinyin sequence matched with the pinyin data to be matched does not exist in the standard pinyin sequence, using the text data corresponding to the original pinyin data as the text data corresponding to the voice data to be recognized.
8. The method of claim 5, further comprising:
and updating the pinyin heat value of each pinyin in the target standard pinyin sequence and the sequence heat value of the target standard pinyin sequence.
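The update in claim 8 can be sketched as incrementing usage counters after each successful match, so frequently matched sequences rank higher in later queries. Counter-based increments are an assumption; the patent does not specify the update rule.

```python
from collections import Counter

seq_heat = Counter()     # sequence heat values, keyed by pinyin tuple
pinyin_heat = Counter()  # per-pinyin heat values

def update_heat(target_seq):
    """Bump the heat of the matched sequence and of every pinyin in it."""
    seq_heat[tuple(target_seq)] += 1
    for p in target_seq:
        pinyin_heat[p] += 1

update_heat(["bei", "jing"])
update_heat(["bei", "jing"])
print(seq_heat[("bei", "jing")], pinyin_heat["jing"])  # 2 2
```

This closes the loop with claim 5: the heat values consumed by the weighting step are the ones maintained here.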
9. A speech recognition apparatus, comprising:
the pinyin data acquisition module is used for acquiring the voice data to be recognized and determining original pinyin data corresponding to the voice data to be recognized;
the pinyin data calibration module is used for correcting the original pinyin data to obtain pinyin data to be matched;
and the text data determining module is used for matching the pinyin data to be matched with a pre-established standard pinyin sequence and determining text data corresponding to the voice data to be recognized according to a matching result.
10. A computer device, the device comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the speech recognition method according to any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the speech recognition method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910710043.1A CN111739514B (en) | 2019-07-31 | 2019-07-31 | Voice recognition method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111739514A (en) | 2020-10-02 |
CN111739514B (en) | 2023-11-14 |
Family
ID=72645844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910710043.1A Active CN111739514B (en) | 2019-07-31 | 2019-07-31 | Voice recognition method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111739514B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006026908A1 (en) * | 2004-08-25 | 2006-03-16 | Dong Li | A chinese characters inputting method which uses continuous phonetic letters in a portable terminal |
CN101067780A (en) * | 2007-06-21 | 2007-11-07 | 腾讯科技(深圳)有限公司 | Character inputting system and method for intelligent equipment |
CN103377652A (en) * | 2012-04-25 | 2013-10-30 | 上海智臻网络科技有限公司 | Method, device and equipment for carrying out voice recognition |
CN105489220A (en) * | 2015-11-26 | 2016-04-13 | 小米科技有限责任公司 | Method and device for recognizing speech |
CN109597983A (en) * | 2017-09-30 | 2019-04-09 | 北京国双科技有限公司 | A kind of spelling error correction method and device |
CN109992765A (en) * | 2017-12-29 | 2019-07-09 | 北京京东尚科信息技术有限公司 | Text error correction method and device, storage medium and electronic equipment |
CN110019650A (en) * | 2018-09-04 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Method, apparatus, storage medium and the electronic equipment of search associational word are provided |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112417102A (en) * | 2020-11-26 | 2021-02-26 | 中国科学院自动化研究所 | Voice query method, device, server and readable storage medium |
CN112417102B (en) * | 2020-11-26 | 2024-03-22 | 中国科学院自动化研究所 | Voice query method, device, server and readable storage medium |
CN112509566A (en) * | 2020-12-22 | 2021-03-16 | 北京百度网讯科技有限公司 | Voice recognition method, device, equipment, storage medium and program product |
CN112509566B (en) * | 2020-12-22 | 2024-03-19 | 阿波罗智联(北京)科技有限公司 | Speech recognition method, device, equipment, storage medium and program product |
CN112651854A (en) * | 2020-12-23 | 2021-04-13 | 讯飞智元信息科技有限公司 | Voice scheduling method and device, electronic equipment and storage medium |
CN112767923A (en) * | 2021-01-05 | 2021-05-07 | 上海微盟企业发展有限公司 | Voice recognition method and device |
CN113129894A (en) * | 2021-04-12 | 2021-07-16 | 阿波罗智联(北京)科技有限公司 | Speech recognition method, speech recognition device, electronic device and storage medium |
CN113158649A (en) * | 2021-05-27 | 2021-07-23 | 广州广电运通智能科技有限公司 | Error correction method, equipment, medium and product for subway station name recognition |
CN113744722A (en) * | 2021-09-13 | 2021-12-03 | 上海交通大学宁波人工智能研究院 | Off-line speech recognition matching device and method for limited sentence library |
CN117116267A (en) * | 2023-10-24 | 2023-11-24 | 科大讯飞股份有限公司 | Speech recognition method and device, electronic equipment and storage medium |
CN117116267B (en) * | 2023-10-24 | 2024-02-13 | 科大讯飞股份有限公司 | Speech recognition method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111739514B (en) | Voice recognition method, device, equipment and medium | |
US11322153B2 (en) | Conversation interaction method, apparatus and computer readable storage medium | |
CN110210029B (en) | Method, system, device and medium for correcting error of voice text based on vertical field | |
CN108170749B (en) | Dialog method, device and computer readable medium based on artificial intelligence | |
US10504010B2 (en) | Systems and methods for fast novel visual concept learning from sentence descriptions of images | |
CN107301865B (en) | Method and device for determining interactive text in voice input | |
CN109036391B (en) | Voice recognition method, device and system | |
CN1542736B (en) | Rules-based grammar for slots and statistical model for preterminals in natural language understanding system | |
TWI508057B (en) | Speech recognition system and method | |
CN109325091B (en) | Method, device, equipment and medium for updating attribute information of interest points | |
CN112036162B (en) | Text error correction adaptation method and device, electronic equipment and storage medium | |
CN112100354B (en) | Man-machine conversation method, device, equipment and storage medium | |
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
CN110415679B (en) | Voice error correction method, device, equipment and storage medium | |
CN111444329A (en) | Intelligent conversation method and device and electronic equipment | |
CN109299471B (en) | Text matching method, device and terminal | |
CN111382260A (en) | Method, device and storage medium for correcting retrieved text | |
CN110674255A (en) | Text content auditing method and device | |
CN112861521B (en) | Speech recognition result error correction method, electronic device and storage medium | |
WO2014036827A1 (en) | Text correcting method and user equipment | |
CN108595412B (en) | Error correction processing method and device, computer equipment and readable medium | |
TWI752406B (en) | Speech recognition method, speech recognition device, electronic equipment, computer-readable storage medium and computer program product | |
CN112214595A (en) | Category determination method, device, equipment and medium | |
CN112765985A (en) | Named entity identification method for specific field patent embodiment | |
CN113268452B (en) | Entity extraction method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||