CN109903766B - Far-field voice instruction recognition method and device - Google Patents


Info

Publication number
CN109903766B
Authority
CN
China
Prior art keywords
matching
result
pinyin
fuzzy
recognition
Legal status
Active
Application number
CN201910237263.7A
Other languages
Chinese (zh)
Other versions
CN109903766A (en)
Inventor
邱建
王兴
佟彤
Current Assignee
Beijing Aotewei Technology Co ltd
Original Assignee
Beijing Aotewei Technology Co ltd
Priority date
2019-03-27
Filing date
2019-03-27
Publication date
2021-06-04
Application filed by Beijing Aotewei Technology Co ltd (2019-03-27)
Priority to CN201910237263.7A (2019-03-27)
Publication of CN109903766A (2019-06-18)
Application granted
Publication of CN109903766B (2021-06-04)
Legal status: Active
Anticipated expiration

Abstract

The invention relates to a far-field voice instruction recognition method comprising the following steps: step one: acquiring a far-field voice instruction signal to be recognized; step two: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result; step three: performing instruction matching on the decoding result of step two based on Chinese pinyin and a context model to obtain a final recognition result. Through the corresponding algorithm, the method converts an instruction that the far-field voice control system failed to recognize correctly into a control instruction that is recognized accurately. This raises the instruction recognition rate and improves the user experience, and the method can be applied to voice interaction scenarios such as smart spaces.

Description

Far-field voice instruction recognition method and device
Technical Field
The invention belongs to the technical field of far-field speech recognition and in particular relates to a far-field voice instruction recognition method and device.
Background
As a popular mode of human-machine interaction, voice technology has in recent years been widely applied across the field of intelligent devices, and voice control technology has advanced steadily along with it. Compared with conventional control methods, voice control is convenient because the operator does not need to operate anything by hand, so it is used ever more widely. Voice control presupposes speech recognition as its foundation, and those skilled in the art have therefore placed growing emphasis on developing speech recognition technology. Depending on the distance between the position where speech is produced and the receiving device, speech recognition is generally divided into far-field and near-field speech recognition. Because far-field speech recognition can recognize voice instructions spoken at a distance, it attracts particular attention from technical experts.
At present, when a user performs far-field voice control, existing far-field speech recognition methods use relatively fixed wake-up words and control instructions with little variation, yet their error rate is high, especially in voice-instruction control scenarios, which degrades the user experience. How to provide a speech recognition correction method and apparatus that can accurately correct voice control instructions is therefore a problem to be solved in the art.
Disclosure of Invention
The invention aims to provide a speech recognition correction method and device that solve the problem of low accuracy in recognizing far-field voice control instructions.
The invention provides a far-field voice instruction recognition method, which comprises the following steps:
step one: acquiring a far-field voice instruction signal to be recognized;
step two: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
step three: performing instruction matching on the decoding result of step two based on Chinese pinyin and a context model to obtain a final recognition result.
Further, step three includes:
converting the decoding result of step two into Chinese pinyin;
converting all target instructions in the target instruction set into Chinese pinyin to obtain a Chinese pinyin library;
performing first-stage matching of the decoding result's pinyin against the Chinese pinyin library; if the match succeeds, returning the matching result directly and ending the matching process.
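The first-stage lookup can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the tiny character-to-pinyin table `PINYIN` and all function names are hypothetical, tones are dropped (an assumption, since the text does not say whether tones are used), and a practical system would need a complete pronunciation dictionary.

```python
from __future__ import annotations

# Hypothetical miniature character-to-pinyin table (tones dropped). A real
# system needs a full pronunciation dictionary, e.g. the third-party pypinyin package.
PINYIN = {
    "打": "da", "开": "kai", "关": "guan", "闭": "bi", "灯": "deng",
    "空": "kong", "调": "tiao", "条": "tiao", "窗": "chuang", "帘": "lian",
}


def to_pinyin(text: str) -> str:
    """Convert Chinese text to space-separated, toneless pinyin syllables."""
    # Characters missing from the table are kept verbatim.
    return " ".join(PINYIN.get(ch, ch) for ch in text)


def build_pinyin_library(instructions: list[str]) -> dict[str, str]:
    """Pinyin library: pinyin of each target instruction -> the instruction."""
    return {to_pinyin(cmd): cmd for cmd in instructions}


def first_stage_match(decoded: str, library: dict[str, str]) -> str | None:
    """Exact pinyin lookup; succeeds when the decoding result is a homophone."""
    return library.get(to_pinyin(decoded))


library = build_pinyin_library(["打开空调", "关闭窗帘"])
print(first_stage_match("打开空条", library))  # prints 打开空调
```

Here the engine's output 空条 is a homophone of 空调 (air conditioner), so an exact comparison of pinyin already corrects the instruction even though the characters differ.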
Further, step three also includes:
if the first-stage matching fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the Chinese pinyin library into a fuzzy pinyin library, and performing second-stage matching;
if the second-stage matching succeeds, returning the matching result directly and ending the matching process;
if the second-stage matching fails, segmenting the decoding result according to the number of Chinese characters in the target instructions, using a sliding window of that many characters that advances one character at a time, and then performing third-stage matching.
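Second-stage matching can be sketched as below. The patent does not list its fuzzy-pinyin rules; the rules here (zh/z, ch/c, sh/s, n/l, ang/an, eng/en, ing/in) are an assumption modeled on the fuzzy-pinyin options common in Chinese input methods, and all names are hypothetical. The functions take space-separated toneless pinyin, such as the first stage produces.

```python
from __future__ import annotations

# Hypothetical fuzzy-pinyin rules (not specified in the patent).
INITIAL_RULES = (("zh", "z"), ("ch", "c"), ("sh", "s"), ("n", "l"))
FINAL_RULES = (("ang", "an"), ("eng", "en"), ("ing", "in"))


def fuzzy_syllable(syllable: str) -> str:
    """Collapse easily confused initials and finals into one canonical form."""
    for src, dst in INITIAL_RULES:
        if syllable.startswith(src):
            syllable = dst + syllable[len(src):]
            break
    for src, dst in FINAL_RULES:
        if syllable.endswith(src):
            syllable = syllable[: -len(src)] + dst
            break
    return syllable


def to_fuzzy(pinyin: str) -> str:
    """Apply the fuzzy rules to every syllable of a pinyin string."""
    return " ".join(fuzzy_syllable(s) for s in pinyin.split())


def build_fuzzy_library(pinyin_library: dict[str, str]) -> dict[str, str]:
    """Fuzzy pinyin library: fuzzy form of each instruction -> the instruction."""
    return {to_fuzzy(p): cmd for p, cmd in pinyin_library.items()}


def second_stage_match(decoded_pinyin: str, pinyin_library: dict[str, str]) -> str | None:
    """Exact lookup after both sides are reduced to fuzzy pinyin."""
    return build_fuzzy_library(pinyin_library).get(to_fuzzy(decoded_pinyin))


pinyin_library = {"da kai chuang lian": "打开窗帘", "guan bi kong tiao": "关闭空调"}
print(second_stage_match("da kai chuan nian", pinyin_library))  # prints 打开窗帘
```

The misrecognized syllables chuan nian and the instruction's chuang lian both reduce to cuan lian. Because several instructions may collapse to the same fuzzy form, a real instruction set would need a collision check; in this sketch the later instruction silently wins.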
Further, the third-stage matching includes:
converting each segment into fuzzy pinyin and matching its similarity against the fuzzy pinyin library, each comparison yielding a score C; the target instruction of the highest-scoring match is the recognition result.
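One way to compute the score C is sketched below. The text does not define the similarity measure; this sketch assumes Python's `difflib.SequenceMatcher.ratio()` applied to lists of pinyin syllables, which gives 2M/T for M matched syllables out of T syllables in both sequences. Function names are hypothetical, and the inputs are assumed to be already converted to fuzzy pinyin.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def score(segment: str, target: str) -> float:
    """Score C in [0, 1]: difflib ratio over pinyin syllables (2 * matches / total)."""
    return SequenceMatcher(None, segment.split(), target.split()).ratio()


def third_stage_match(segments: list[str],
                      fuzzy_library: dict[str, str]) -> tuple[str | None, float, int]:
    """Score every segment against every instruction and keep the highest C."""
    best_cmd, best_c, best_idx = None, 0.0, -1
    for idx, seg in enumerate(segments):
        for target, cmd in fuzzy_library.items():
            c = score(seg, target)
            if c > best_c:  # ties keep the earlier candidate
                best_cmd, best_c, best_idx = cmd, c, idx
    return best_cmd, best_c, best_idx


fuzzy_library = {"da kai kong tiao": "打开空调", "guan bi cuan lian": "关闭窗帘"}
print(third_stage_match(["ba kai kong tiao", "kai kong tiao le"], fuzzy_library))
# prints ('打开空调', 0.75, 0)
```

Comparing whole syllables rather than letters keeps a wrong syllable (ba for da) from earning partial credit for shared letters; whether the patented method compares syllables or letters is not stated.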
Further, the third-stage matching also includes:
if the score of the recognition result is greater than a threshold H, returning the result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the previously similarity-matched fuzzy pinyin to form a context, matching this context against the fuzzy pinyin library, and, if the new score is greater than H, returning the result directly and ending subsequent matching.
The invention also provides a far-field voice instruction recognition device, comprising:
a voice acquisition module for acquiring a far-field voice instruction signal;
a speech recognition module for decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
a matching module for performing instruction matching on the decoding result based on Chinese pinyin and a context model to obtain a final recognition result.
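The three modules can be pictured as interchangeable components wired together by the device. The sketch below is a hypothetical Python structure: the class and method names (`acquire`, `decode`, `match`) are not from the patent, and the fake microphone, engine, and lookup matcher exist only to make the example runnable; a real matching module would implement the staged pinyin matching described in this section.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Protocol


class VoiceAcquisitionModule(Protocol):
    """Delivers one far-field voice instruction signal (e.g. from a microphone array)."""
    def acquire(self) -> bytes: ...


class SpeechRecognitionModule(Protocol):
    """Wraps a speech recognition engine and returns its decoding result as text."""
    def decode(self, signal: bytes) -> str: ...


class MatchingModule(Protocol):
    """Maps a possibly wrong decoding result to a target instruction, or None."""
    def match(self, decoded: str) -> Optional[str]: ...


@dataclass
class FarFieldInstructionRecognizer:
    acquisition: VoiceAcquisitionModule
    recognition: SpeechRecognitionModule
    matching: MatchingModule

    def recognize(self) -> Optional[str]:
        signal = self.acquisition.acquire()        # step one: acquire signal
        decoded = self.recognition.decode(signal)  # step two: decode with engine
        return self.matching.match(decoded)        # step three: instruction matching


# Hypothetical stand-ins, only for demonstration.
class FakeMicrophone:
    def acquire(self) -> bytes:
        return b"\x00\x01"  # placeholder audio bytes


class FakeEngine:
    def __init__(self, text: str) -> None:
        self.text = text

    def decode(self, signal: bytes) -> str:
        return self.text  # pretend the engine produced this transcript


class LookupMatcher:
    def __init__(self, table: dict[str, str]) -> None:
        self.table = table

    def match(self, decoded: str) -> Optional[str]:
        return self.table.get(decoded)


device = FarFieldInstructionRecognizer(
    FakeMicrophone(), FakeEngine("打开空条"), LookupMatcher({"打开空条": "打开空调"})
)
print(device.recognize())  # prints 打开空调
```

Keeping each module behind a small interface lets the recognition engine or the matching strategy be replaced without touching the other modules.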
Further, the matching module performs the following operations:
converting the decoding result into Chinese pinyin;
converting all target instructions in the target instruction set into Chinese pinyin to obtain a Chinese pinyin library; and
performing first-stage matching of the decoding result's pinyin against the Chinese pinyin library; if the match succeeds, returning the matching result directly and ending the matching process.
Further, the matching module also performs the following operations:
if the first-stage matching fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the Chinese pinyin library into a fuzzy pinyin library, and performing second-stage matching;
if the second-stage matching succeeds, returning the matching result directly and ending the matching process;
if the second-stage matching fails, segmenting the decoding result according to the number of Chinese characters in the target instructions, using a sliding window of that many characters that advances one character at a time, and then performing third-stage matching.
Further, the third-stage matching includes:
converting each segment into fuzzy pinyin and matching its similarity against the fuzzy pinyin library, each comparison yielding a score C; the target instruction of the highest-scoring match is the recognition result.
Further, the third-stage matching also includes:
if the score of the recognition result is greater than a threshold H, returning the result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the previously similarity-matched fuzzy pinyin to form a context, matching this context against the fuzzy pinyin library, and, if the new score is greater than H, returning the result directly and ending subsequent matching.
Compared with the prior art, the invention has the following beneficial effects: through the corresponding algorithm, an instruction that the far-field voice control system failed to recognize correctly can be converted into a control instruction that is recognized accurately, which raises the instruction recognition rate and improves the user experience; the method can be applied to voice interaction scenarios such as smart spaces.
Drawings
FIG. 1 is a flow chart of voice control instruction recognition and correction according to the invention;
FIG. 2 is a block diagram of the voice control instruction recognition and correction device according to the invention.
Detailed Description
The invention is described in detail below with reference to the embodiments shown in the drawings. These embodiments do not limit the invention; functional, methodological, or structural equivalents or substitutions made by those skilled in the art on the basis of these embodiments fall within the scope of the invention.
Referring to FIG. 1, this embodiment provides a far-field voice instruction recognition method, including:
Step S1: acquiring a far-field voice instruction signal to be recognized, such as the voice instruction signal output by a microphone in a voice-command control scenario. In general, a sound source is in the far field when its distance from the central reference point of the microphone array is much larger than the signal wavelength; in speech recognition this distance is usually taken to be more than 3 meters.
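As a worked example of the far-field criterion (an illustration, not part of the patent), assume a speed of sound c of about 343 m/s (air at roughly 20 °C) and take 1 kHz as a representative speech frequency f. The wavelength and the ratio of a source distance r = 3 m to it are then:

```latex
\lambda = \frac{c}{f} \approx \frac{343\ \mathrm{m/s}}{1000\ \mathrm{Hz}} = 0.343\ \mathrm{m},
\qquad
\frac{r}{\lambda} \approx \frac{3\ \mathrm{m}}{0.343\ \mathrm{m}} \approx 8.7
```

The ratio depends on frequency: at 100 Hz the same formula gives a wavelength of about 3.43 m.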
Step S2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
Step S3: performing instruction matching on the decoding result of step S2 based on Chinese pinyin and a context model to obtain a final recognition result.
With this far-field voice instruction recognition method, an instruction that the far-field voice control system failed to recognize correctly can be converted by the corresponding algorithm into a control instruction that is recognized accurately. This raises the instruction recognition rate (the recognition rate of far-field voice control instructions can be raised by about 15% to 20%) and improves the user experience, and the method can be applied to voice interaction scenarios such as smart spaces.
In this embodiment, step S3 includes:
converting the decoding result of step S2 into Chinese pinyin;
converting all target instructions in the target instruction set into Chinese pinyin to obtain a Chinese pinyin library;
performing first-stage matching of the decoding result's pinyin against the Chinese pinyin library; if the match succeeds, returning the matching result directly and ending the matching process.
In this embodiment, step S3 further includes:
if the first-stage matching fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the Chinese pinyin library into a fuzzy pinyin library, and performing second-stage matching;
if the second-stage matching succeeds, returning the matching result directly and ending the matching process;
if the second-stage matching fails, segmenting the decoding result according to the number of Chinese characters in the target instructions, using a sliding window of that many characters that advances one character at a time, and then performing third-stage matching. For instructions of several characters, the window spans two or more characters and moves by only one character per step, giving a character-by-character segmentation.
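The character-by-character segmentation can be sketched as follows. Function names are hypothetical, and returning the whole utterance when it is shorter than the window is an assumption the text does not address. One window size is used per distinct instruction length in the target set.

```python
from __future__ import annotations


def sliding_windows(text: str, width: int) -> list[str]:
    """Cut text into overlapping windows of `width` characters, one-character step."""
    if width < 1:
        raise ValueError("width must be positive")
    if len(text) <= width:
        return [text]  # shorter than the window: keep the whole utterance
    return [text[i:i + width] for i in range(len(text) - width + 1)]


def segment_by_instruction_lengths(text: str, instructions: list[str]) -> dict[int, list[str]]:
    """One list of windows per distinct instruction length in the target set."""
    return {w: sliding_windows(text, w) for w in sorted({len(c) for c in instructions})}


print(sliding_windows("请打开空条吧", 4))
# prints ['请打开空', '打开空条', '开空条吧']
```

Each window would then be converted to fuzzy pinyin and scored against the fuzzy pinyin library in the third stage.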
In this embodiment, the third-stage matching includes:
converting each segment into fuzzy pinyin and matching its similarity against the fuzzy pinyin library, each comparison yielding a score C; the target instruction of the highest-scoring match is the recognition result.
In this embodiment, the third-stage matching further includes:
if the score of the recognition result is greater than the threshold H, returning the result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the previously similarity-matched fuzzy pinyin to form a context, matching this context against the fuzzy pinyin library, and, if the new score is greater than H, returning the result directly and ending subsequent matching. If the score is still less than H, the result is not accepted as the final recognition result because its reliability is low.
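The threshold test, context fallback, and final rejection can be sketched as below. Several details are assumptions: the value H = 0.8 is hypothetical (the patent gives no number), the score is a syllable-level `difflib.SequenceMatcher` ratio, and "the previous fuzzy pinyin" is read as the fuzzy pinyin of the previously matched utterance, prepended to the best segment to form the context. The text admits other readings, so treat this as one plausible interpretation.

```python
from __future__ import annotations

from difflib import SequenceMatcher

H = 0.8  # hypothetical threshold; the patent does not give a value


def score(segment: str, target: str) -> float:
    """Similarity C over pinyin syllables (difflib ratio, 2 * matches / total)."""
    return SequenceMatcher(None, segment.split(), target.split()).ratio()


def best_match(segments: list[str],
               fuzzy_library: dict[str, str]) -> tuple[str | None, float, str]:
    """Highest-scoring (instruction, score C, segment) over all segments."""
    best: tuple[str | None, float, str] = (None, 0.0, "")
    for seg in segments:
        for target, cmd in fuzzy_library.items():
            c = score(seg, target)
            if c > best[1]:
                best = (cmd, c, seg)
    return best


def match_with_context(segments: list[str], fuzzy_library: dict[str, str],
                       previous: str = "", h: float = H) -> str | None:
    """Accept above H; otherwise retry with previous fuzzy pinyin as context; else reject."""
    cmd, c, seg = best_match(segments, fuzzy_library)
    if c > h:
        return cmd
    if previous:
        # Context = previously matched fuzzy pinyin followed by the best segment.
        cmd, c, _ = best_match([f"{previous} {seg}"], fuzzy_library)
        if c > h:
            return cmd
    return None  # still not above H: reliability too low, no final result


fuzzy_library = {"da kai kong tiao": "打开空调", "guan bi cuan lian": "关闭窗帘"}
print(match_with_context(["kong tiao"], fuzzy_library, previous="da kai"))  # prints 打开空调
print(match_with_context(["kong tiao"], fuzzy_library))  # prints None
```

On its own, kong tiao scores 2·2/6 ≈ 0.67 against da kai kong tiao, below H; prefixed with the previous utterance da kai it scores 1.0 and is accepted. Without usable context the sketch returns None, mirroring the rule that a low-reliability result is not taken as the final recognition result.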
Referring to FIG. 2, this embodiment further provides a far-field voice instruction recognition device, including:
a voice acquisition module 10 for acquiring a far-field voice instruction signal;
a speech recognition module 20 for decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
a matching module 30 for performing instruction matching on the decoding result based on Chinese pinyin and a context model to obtain a final recognition result.
With this far-field voice instruction recognition device, an instruction that the far-field voice control system failed to recognize correctly can be converted by the corresponding algorithm into a control instruction that is recognized accurately. This raises the instruction recognition rate (the recognition rate of far-field voice control instructions can be raised by about 15% to 20%) and improves the user experience, and the device can be applied to voice interaction scenarios such as smart spaces.
In this embodiment, the matching module performs the following operations:
converting the decoding result into Chinese pinyin;
converting all target instructions in the target instruction set into Chinese pinyin to obtain a Chinese pinyin library; and
performing first-stage matching of the decoding result's pinyin against the Chinese pinyin library; if the match succeeds, returning the matching result directly and ending the matching process.
In this embodiment, the matching module further performs the following operations:
if the first-stage matching fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the Chinese pinyin library into a fuzzy pinyin library, and performing second-stage matching;
if the second-stage matching succeeds, returning the matching result directly and ending the matching process;
if the second-stage matching fails, segmenting the decoding result according to the number of Chinese characters in the target instructions, using a sliding window of that many characters that advances one character at a time, and then performing third-stage matching. For instructions of several characters, the window spans two or more characters and moves by only one character per step, giving a character-by-character segmentation.
In this embodiment, the third-stage matching includes:
converting each segment into fuzzy pinyin and matching its similarity against the fuzzy pinyin library, each comparison yielding a score C; the target instruction of the highest-scoring match is the recognition result.
In this embodiment, the third-stage matching further includes:
if the score of the recognition result is greater than the threshold H, returning the result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the previously similarity-matched fuzzy pinyin to form a context, matching this context against the fuzzy pinyin library, and, if the new score is greater than H, returning the result directly and ending subsequent matching. If the score is still less than H, the result is not accepted as the final recognition result because its reliability is low.
The detailed description above merely describes feasible embodiments of the invention and is not intended to limit its scope; equivalent embodiments or modifications that do not depart from the technical spirit of the invention shall fall within its scope.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (2)

1. A far-field speech command recognition method, comprising:
step one: acquiring a far-field voice instruction signal to be recognized;
step two: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
step three: performing instruction matching on the decoding result of step two based on Chinese pinyin, fuzzy pinyin, and a context model to obtain a final recognition result;
step three comprises:
converting the decoding result of step two into Chinese pinyin;
converting all target instructions in the target instruction set into Chinese pinyin to obtain a Chinese pinyin library;
performing first-stage matching of the decoding result's pinyin against the Chinese pinyin library, and, if the matching succeeds, returning the matching result directly and ending the matching process;
if the first-stage matching fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the Chinese pinyin library into a fuzzy pinyin library, and performing second-stage matching; if the second-stage matching succeeds, returning the matching result directly and ending the matching process;
if the second-stage matching fails, segmenting the decoding result according to the number of Chinese characters in the target instructions, character by character with a sliding window of that many characters, and then performing third-stage matching;
wherein the third-stage matching comprises:
converting the segmentation results into fuzzy pinyin and matching the similarity of each against the fuzzy pinyin library, each comparison yielding a score C, the target instruction of the highest-scoring match being the recognition result;
if the score of the recognition result is greater than a threshold H, returning the result directly and ending subsequent matching;
and if the score of the recognition result is less than the threshold H, combining the recognition result with the previously similarity-matched fuzzy pinyin to form a context, matching the context against the fuzzy pinyin library, and, if the resulting score is greater than the threshold H, returning the result directly and ending subsequent matching.
2. A far-field speech command recognition apparatus, comprising:
a voice acquisition module for acquiring a far-field voice instruction signal;
a speech recognition module for decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
a matching module for performing instruction matching on the decoding result based on Chinese pinyin, fuzzy pinyin, and a context model to obtain a final recognition result;
the matching module performs the following operations:
converting the decoding result into Chinese pinyin;
converting all target instructions in the target instruction set into Chinese pinyin to obtain a Chinese pinyin library; and
performing first-stage matching of the decoding result's pinyin against the Chinese pinyin library, and, if the matching succeeds, returning the matching result directly and ending the matching process;
the matching module further performs the following operations:
if the first-stage matching fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the Chinese pinyin library into a fuzzy pinyin library, and performing second-stage matching;
if the second-stage matching succeeds, returning the matching result directly and ending the matching process;
if the second-stage matching fails, segmenting the decoding result according to the number of Chinese characters in the target instructions, character by character with a sliding window of that many characters, and then performing third-stage matching;
wherein the third-stage matching comprises:
converting the segmentation results into fuzzy pinyin and matching the similarity of each against the fuzzy pinyin library, each comparison yielding a score C, the target instruction of the highest-scoring match being the recognition result;
if the score of the recognition result is greater than a threshold H, returning the result directly and ending subsequent matching;
and if the score of the recognition result is less than the threshold H, combining the recognition result with the previously similarity-matched fuzzy pinyin to form a context, matching the context against the fuzzy pinyin library, and, if the resulting score is greater than the threshold H, returning the result directly and ending subsequent matching.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910237263.7A CN109903766B (en) 2019-03-27 2019-03-27 Far-field voice instruction recognition method and device

Publications (2)

Publication Number Publication Date
CN109903766A CN109903766A (en) 2019-06-18
CN109903766B true CN109903766B (en) 2021-06-04

Family

ID=66953549

Country Status (1)

Country Link
CN (1) CN109903766B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577548A (en) * 2013-10-12 2014-02-12 优视科技有限公司 Method and device for matching characters with close pronunciation
CN103903619A (en) * 2012-12-28 2014-07-02 安徽科大讯飞信息科技股份有限公司 Method and system for improving accuracy of speech recognition
CN106953959A (en) * 2017-04-18 2017-07-14 深圳和家园网络科技有限公司 A kind of dialing method of telephone matched based on phonetic

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9922643B2 (en) * 2014-12-23 2018-03-20 Nice Ltd. User-aided adaptation of a phonetic dictionary

Also Published As

Publication number Publication date
CN109903766A (en) 2019-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant