CN109903766A - Far field voice instruction recognition method and device - Google Patents
Far field voice instruction recognition method and device
- Publication number
- CN109903766A
- Authority
- CN
- China
- Prior art keywords
- result
- matching
- far field
- phonetic alphabet
- chinese phonetic
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The present invention relates to a far-field voice instruction recognition method, comprising: Step 1: obtaining a far-field voice instruction signal to be recognized; Step 2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result; Step 3: performing instruction matching on the decoding result of Step 2 based on Chinese pinyin and a context model to obtain the final recognition result. Using these algorithms, the invention converts instructions that a far-field voice control system fails to recognize correctly into control instructions that can be accurately recognized, which raises the instruction recognition rate and improves the user experience. It can be applied to speech recognition interaction scenarios such as smart spaces.
Description
Technical field
The invention belongs to the technical field of far-field speech recognition, and in particular relates to a far-field voice instruction recognition method and device.
Background art
As one of the most popular modes of human-computer interaction, voice technology has in recent years been widely used in every aspect of the smart-device field, and voice control keeps improving as the technology develops. Compared with earlier control methods, voice control frees the operator from manual operation and is more convenient, so it is used ever more widely. Because voice control presupposes speech recognition, the development of speech recognition technology has also drawn growing attention in the field. Depending on the distance between the speaker and the sound-receiving device, speech recognition is usually divided into far-field and near-field speech recognition. Since far-field speech recognition can recognize voice instructions spoken from a greater distance, it has attracted the attention of more and more technical specialists.
At present, when a user performs far-field voice control, existing far-field speech recognition methods, whose wake words and control instructions are relatively fixed and seldom change, have a high error rate, especially in voice command control scenarios, and the user experience is poor. How to provide a speech recognition correction method and device that can accurately correct voice control commands has therefore become an urgent problem in this field.
Summary of the invention
The purpose of the present invention is to provide a speech recognition correction method and device that solve the problem of low recognition accuracy for far-field voice control commands.
The present invention provides a far-field voice instruction recognition method, comprising:
Step 1: obtaining a far-field voice instruction signal to be recognized;
Step 2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
Step 3: performing instruction matching on the decoding result of Step 2 based on Chinese pinyin and a context model to obtain the final recognition result.
Further, Step 3 comprises:
converting the decoding result of Step 2 into Chinese pinyin;
converting every target instruction in the target instruction set into Chinese pinyin to obtain a pinyin library;
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, directly returning the matching result and ending the matching process.
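The first-level match can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the character-to-pinyin table `PINYIN`, the example instructions, and the helper names (`to_pinyin`, `build_pinyin_library`, `first_level_match`) are hypothetical, and tones are dropped. A real system would use a full pronunciation dictionary.

```python
# Hypothetical toneless character-to-pinyin table (illustration only).
PINYIN = {
    "打": "da", "大": "da", "开": "kai", "空": "kong", "调": "tiao",
    "关": "guan", "闭": "bi", "窗": "chuang", "帘": "lian",
}

def to_pinyin(text):
    """Convert Chinese characters to a tuple of pinyin syllables.
    Unknown characters are kept as-is, so they simply fail to match."""
    return tuple(PINYIN.get(ch, ch) for ch in text)

def build_pinyin_library(instructions):
    """Map the pinyin of every target instruction back to the instruction text."""
    return {to_pinyin(instr): instr for instr in instructions}

def first_level_match(decoded, library):
    """Exact pinyin lookup; returns the matched instruction or None."""
    return library.get(to_pinyin(decoded))

TARGETS = ["打开空调", "关闭空调", "打开窗帘"]
LIBRARY = build_pinyin_library(TARGETS)
print(first_level_match("大开空调", LIBRARY))  # 打开空调
```

Because 大 and 打 share the reading da, the homophone error made by the recognizer disappears at the pinyin level, which is why matching on pinyin rather than on characters helps.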
Further, Step 3 also comprises:
if the first-level match fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level match succeeds, directly returning the matching result and ending the matching process;
if the second-level match also fails, segmenting the decoding result according to the number of Chinese characters in the instructions of the target instruction set, cutting it one character at a time with a sliding window whose size is that character count, and then performing third-level matching.
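The fuzzy pinyin conversion and the second-level match can be sketched as below. The patent does not list its fuzzy rules; the rules used here (initials zh/z, ch/c, sh/s, n/l and finals ang/an, eng/en, ing/in, as commonly offered by pinyin input methods), the small `PINYIN` table, and all function names are assumptions for illustration.

```python
# Hypothetical toneless character-to-pinyin table (illustration only).
PINYIN = {"打": "da", "开": "kai", "窗": "chuang", "帘": "lian", "串": "chuan",
          "空": "kong", "调": "tiao"}

# Assumed fuzzy rules: each pair maps a confusable form onto a canonical one.
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("n", "l")]
FUZZY_FINALS = [("ang", "an"), ("eng", "en"), ("ing", "in")]

def fuzzy_syllable(syl):
    """Collapse easily confused initials and finals into one canonical form."""
    for src, dst in FUZZY_INITIALS:
        if syl.startswith(src):
            syl = dst + syl[len(src):]
            break
    for src, dst in FUZZY_FINALS:
        if syl.endswith(src):
            syl = syl[: -len(src)] + dst
            break
    return syl

def to_fuzzy_pinyin(text):
    """Characters -> pinyin -> fuzzy pinyin; unknown characters pass through."""
    return tuple(fuzzy_syllable(PINYIN.get(ch, ch)) for ch in text)

def second_level_match(decoded, instructions):
    """Exact lookup in the fuzzy pinyin library; returns the instruction or None."""
    fuzzy_library = {to_fuzzy_pinyin(instr): instr for instr in instructions}
    return fuzzy_library.get(to_fuzzy_pinyin(decoded))

print(fuzzy_syllable("chuang"), fuzzy_syllable("chuan"))  # cuan cuan
print(second_level_match("打开串帘", ["打开空调", "打开窗帘"]))  # 打开窗帘
```

Here 串 (chuan) and 窗 (chuang) differ in exact pinyin, so the first level fails, but both collapse to cuan under the assumed rules and the second level recovers 打开窗帘.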
Further, the third-level matching comprises:
converting the segmentation results into fuzzy pinyin and matching each fuzzy pinyin segment for similarity against the fuzzy pinyin library, where each match yields a score C; the target instruction corresponding to the highest-scoring match is the recognition result.
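The scoring step of third-level matching might look like the sketch below. The patent does not define how score C is computed; here `difflib.SequenceMatcher.ratio()` from the Python standard library, applied to space-joined fuzzy pinyin, is a swapped-in similarity measure, and the function names and example data are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Assumed score C in [0, 1] between two fuzzy-pinyin syllable sequences."""
    return SequenceMatcher(None, " ".join(a), " ".join(b)).ratio()

def third_level_match(segments, fuzzy_library):
    """Score every (segment, instruction) pair; return (best instruction, best C)."""
    best_instr, best_score = None, 0.0
    for seg in segments:
        for fuzzy_instr, instr in fuzzy_library.items():
            score = similarity(seg, fuzzy_instr)
            if score > best_score:
                best_instr, best_score = instr, score
    return best_instr, best_score

# Hypothetical fuzzy pinyin library and already-segmented input.
FUZZY_LIBRARY = {
    ("da", "kai", "kong", "tiao"): "打开空调",
    ("guan", "bi", "kong", "tiao"): "关闭空调",
}
segments = [("qin", "da", "kai", "kong"), ("da", "kai", "kong", "diao")]
print(third_level_match(segments, FUZZY_LIBRARY))  # ('打开空调', 0.9375)
```

The segment da kai kong diao differs from da kai kong tiao by a single letter, so it wins with C = 0.9375; whether that counts as a reliable result is decided by the threshold H in the next step.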
Further, the third-level matching also comprises:
if the score of the recognition result is greater than a threshold H, directly returning the result and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy pinyin of the previous similarity match to form a context, performing similarity matching of this context against the fuzzy pinyin library, and, if the score of the resulting recognition result is greater than the threshold H, directly returning that result and ending subsequent matching.
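The threshold test and the context fallback can be sketched as follows. The patent gives no value for H, so `H = 0.8` is a placeholder; the context is assumed to be formed by prepending the previous round's fuzzy pinyin to the current one; and the similarity measure is again `difflib`'s ratio, not a metric specified by the patent. All names are hypothetical.

```python
from difflib import SequenceMatcher

H = 0.8  # placeholder threshold; not specified in the patent

def best_match(query, fuzzy_library):
    """Return (instruction, score C) for the library entry most similar to query."""
    best = (None, 0.0)
    for fuzzy_instr, instr in fuzzy_library.items():
        score = SequenceMatcher(None, " ".join(query), " ".join(fuzzy_instr)).ratio()
        if score > best[1]:
            best = (instr, score)
    return best

def match_with_context(current, previous, fuzzy_library, threshold=H):
    """Accept the current result if C > H; otherwise retry with the previous
    fuzzy pinyin prepended as context; return None if still below H."""
    instr, score = best_match(current, fuzzy_library)
    if score > threshold:
        return instr
    if previous:
        instr, score = best_match(tuple(previous) + tuple(current), fuzzy_library)
        if score > threshold:
            return instr
    return None  # low credibility: not used as the final recognition result

FUZZY_LIBRARY = {("da", "kai", "kong", "tiao"): "打开空调"}
print(match_with_context(("kong", "tiao"), ("da", "kai"), FUZZY_LIBRARY))  # 打开空调
print(match_with_context(("kong", "tiao"), None, FUZZY_LIBRARY))  # None
```

In this example a command split across two utterances (da kai, then kong tiao) scores 0.72 on its own, below H, but matches exactly once the previous utterance is used as context.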
The present invention also provides a far-field voice instruction recognition device, comprising:
a voice acquisition module, configured to obtain a far-field voice instruction signal;
a speech recognition module, configured to decode the voice instruction signal with a speech recognition engine to obtain a decoding result;
a matching module, configured to perform instruction matching on the decoding result based on Chinese pinyin and a context model to obtain the final recognition result.
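The three-module device can be sketched as a simple composition. Since the patent names no specific speech recognition engine, the engine and the matcher are injected as plain callables; the class names and the stand-in engine and matcher in the usage example are hypothetical.

```python
from typing import Callable, Optional, Sequence

class VoiceAcquisitionModule:
    """Voice acquisition: supplies the far-field signal (here simply passed in)."""
    def acquire(self, samples: Sequence[float]) -> Sequence[float]:
        return samples

class SpeechRecognitionModule:
    """Speech recognition: decodes the signal with an injected engine."""
    def __init__(self, engine: Callable[[Sequence[float]], str]) -> None:
        self.engine = engine

    def decode(self, signal: Sequence[float]) -> str:
        return self.engine(signal)

class MatchingModule:
    """Matching: maps the decoding result to a target instruction, or None."""
    def __init__(self, matcher: Callable[[str], Optional[str]]) -> None:
        self.matcher = matcher

    def match(self, decoded: str) -> Optional[str]:
        return self.matcher(decoded)

class FarFieldInstructionRecognizer:
    """Chains the three modules: acquire -> decode -> match."""
    def __init__(self, acquisition: VoiceAcquisitionModule,
                 recognition: SpeechRecognitionModule,
                 matching: MatchingModule) -> None:
        self.acquisition = acquisition
        self.recognition = recognition
        self.matching = matching

    def recognize(self, samples: Sequence[float]) -> Optional[str]:
        signal = self.acquisition.acquire(samples)
        decoded = self.recognition.decode(signal)
        return self.matching.match(decoded)

# Stand-ins: a fake engine that always "hears" 大开空调 and a dictionary
# lookup in place of the multi-level pinyin matching.
recognizer = FarFieldInstructionRecognizer(
    VoiceAcquisitionModule(),
    SpeechRecognitionModule(lambda signal: "大开空调"),
    MatchingModule({"大开空调": "打开空调"}.get),
)
print(recognizer.recognize([0.0] * 160))  # 打开空调
```

Injecting the engine and matcher keeps the matching logic independent of whichever recognizer produces the decoding result.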
Further, the matching module performs the following operations:
converting the decoding result into Chinese pinyin;
converting every target instruction in the target instruction set into Chinese pinyin to obtain a pinyin library; and
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, directly returning the matching result and ending the matching process.
Further, the matching module also performs the following operations:
if the first-level match fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level match succeeds, directly returning the matching result and ending the matching process;
if the second-level match also fails, segmenting the decoding result according to the number of Chinese characters in the instructions of the target instruction set, cutting it one character at a time with a sliding window whose size is that character count, and then performing third-level matching.
Further, the third-level matching comprises:
converting the segmentation results into fuzzy pinyin and matching each fuzzy pinyin segment for similarity against the fuzzy pinyin library, where each match yields a score C; the target instruction corresponding to the highest-scoring match is the recognition result.
Further, the third-level matching also comprises:
if the score of the recognition result is greater than a threshold H, directly returning the result and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy pinyin of the previous similarity match to form a context, performing similarity matching of this context against the fuzzy pinyin library, and, if the score of the resulting recognition result is greater than the threshold H, directly returning that result and ending subsequent matching.
Compared with the prior art, the beneficial effects of the present invention are as follows: instructions that a far-field voice control system fails to recognize correctly are converted by the corresponding algorithms into control instructions that can be accurately recognized, which raises the instruction recognition rate and improves the user experience. The invention can be applied to speech recognition interaction scenarios such as smart spaces.
Brief description of the drawings
Fig. 1 is a flowchart of voice control instruction recognition and correction according to the present invention;
Fig. 2 is a structural block diagram of the voice control instruction recognition and correction device according to the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the embodiments shown in the drawings. It should be noted, however, that these embodiments do not limit the present invention; functional, methodological, or structural equivalent transformations or substitutions made by those of ordinary skill in the art on the basis of these embodiments all fall within the scope of protection of the present invention.
Referring to Fig. 1, this embodiment provides a far-field voice instruction recognition method, comprising:
Step S1: obtaining a far-field voice instruction signal to be recognized, for example the voice instruction signal output by a microphone in a voice command control scenario. It is generally accepted that a sound source is in the far field when its distance from the reference point at the center of the microphone array is much larger than the signal wavelength; in the speech recognition field this distance is usually taken to be 3 meters or more.
Step S2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
Step S3: performing instruction matching on the decoding result of Step S2 based on Chinese pinyin and a context model to obtain the final recognition result.
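As a rough worked check of the far-field convention in Step S1 (assuming a speed of sound c ≈ 343 m/s in air at about 20 °C), the wavelength λ = c/f at two representative speech frequencies gives:

$$
\lambda_{1\,\mathrm{kHz}} = \frac{343\ \mathrm{m/s}}{1000\ \mathrm{Hz}} \approx 0.34\ \mathrm{m}, \qquad \frac{3\ \mathrm{m}}{0.343\ \mathrm{m}} \approx 8.7
$$

$$
\lambda_{300\,\mathrm{Hz}} = \frac{343\ \mathrm{m/s}}{300\ \mathrm{Hz}} \approx 1.14\ \mathrm{m}, \qquad \frac{3\ \mathrm{m}}{1.14\ \mathrm{m}} \approx 2.6
$$

At 3 m the distance-to-wavelength ratio is comfortably large for mid and high speech frequencies but only a few wavelengths at low ones, so the 3 m figure reads as a practical convention rather than a strict bound.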
With this far-field voice instruction recognition method, instructions that a far-field voice control system fails to recognize correctly are converted by the corresponding algorithms into control instructions that can be accurately recognized. This raises the instruction recognition rate (the recognition rate of far-field voice control instructions can be improved by about 15% to 20%) and improves the user experience, and the method can be applied to speech recognition interaction scenarios such as smart spaces.
In this embodiment, Step S3 comprises:
converting the decoding result of Step S2 into Chinese pinyin;
converting every target instruction in the target instruction set into Chinese pinyin to obtain a pinyin library;
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, directly returning the matching result and ending the matching process.
In this embodiment, Step S3 further comprises:
if the first-level match fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level match succeeds, directly returning the matching result and ending the matching process;
if the second-level match also fails, segmenting the decoding result according to the number of Chinese characters in the instructions of the target instruction set, cutting it one character at a time with a sliding window whose size is that character count, and then performing third-level matching. Specifically, for an instruction of several characters (two or more), that many characters form one sliding window, and the window slides by only one character at a time.
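The sliding-window segmentation described above can be sketched as follows. The function name and the example sentence are hypothetical, and windows shorter than two characters are skipped because the description specifies windows of two or more characters.

```python
def sliding_segments(decoded, window_sizes):
    """Cut `decoded` into every contiguous substring whose length equals one of
    the target-instruction lengths, sliding the window one character at a time."""
    segments = []
    for size in sorted(set(window_sizes)):
        if size < 2 or size > len(decoded):
            continue  # windows of two or more characters that fit the input
        for start in range(len(decoded) - size + 1):
            segments.append(decoded[start:start + size])
    return segments

instructions = ["打开空调", "关闭空调", "开灯"]  # hypothetical target set
segs = sliding_segments("请帮我打开空调", [len(s) for s in instructions])
print(len(segs))  # 10
print(segs[-1])   # 打开空调
```

For the 7-character input and instruction lengths 2 and 4, this yields 6 two-character and 4 four-character segments, one of which is 打开空调; each segment is then converted to fuzzy pinyin and scored in the third-level match.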
In this embodiment, the third-level matching comprises:
converting the segmentation results into fuzzy pinyin and matching each fuzzy pinyin segment for similarity against the fuzzy pinyin library, where each match yields a score C; the target instruction corresponding to the highest-scoring match is the recognition result.
In this embodiment, the third-level matching further comprises:
if the score of the recognition result is greater than a threshold H, directly returning the result and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy pinyin of the previous similarity match to form a context, performing similarity matching of this context against the fuzzy pinyin library, and, if the score of the resulting recognition result is greater than the threshold H, directly returning that result and ending subsequent matching. If the score is still less than the threshold H, the recognition result has low credibility and is not used as the final recognition result.
Referring to Fig. 2, this embodiment further provides a far-field voice instruction recognition device, comprising:
a voice acquisition module 10, configured to obtain a far-field voice instruction signal;
a speech recognition module 20, configured to decode the voice instruction signal with a speech recognition engine to obtain a decoding result;
a matching module 30, configured to perform instruction matching on the decoding result based on Chinese pinyin and a context model to obtain the final recognition result.
With this far-field voice instruction recognition device, instructions that a far-field voice control system fails to recognize correctly are converted by the corresponding algorithms into control instructions that can be accurately recognized. This raises the instruction recognition rate (the recognition rate of far-field voice control instructions can be improved by about 15% to 20%) and improves the user experience, and the device can be applied to speech recognition interaction scenarios such as smart spaces.
In this embodiment, the matching module performs the following operations:
converting the decoding result into Chinese pinyin;
converting every target instruction in the target instruction set into Chinese pinyin to obtain a pinyin library; and
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, directly returning the matching result and ending the matching process.
In this embodiment, the matching module also performs the following operations:
if the first-level match fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level match succeeds, directly returning the matching result and ending the matching process;
if the second-level match also fails, segmenting the decoding result according to the number of Chinese characters in the instructions of the target instruction set, cutting it one character at a time with a sliding window whose size is that character count, and then performing third-level matching. Specifically, for an instruction of several characters (two or more), that many characters form one sliding window, and the window slides by only one character at a time.
In this embodiment, the third-level matching comprises:
converting the segmentation results into fuzzy pinyin and matching each fuzzy pinyin segment for similarity against the fuzzy pinyin library, where each match yields a score C; the target instruction corresponding to the highest-scoring match is the recognition result.
In this embodiment, the third-level matching further comprises:
if the score of the recognition result is greater than a threshold H, directly returning the result and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy pinyin of the previous similarity match to form a context, performing similarity matching of this context against the fuzzy pinyin library, and, if the score of the resulting recognition result is greater than the threshold H, directly returning that result and ending subsequent matching. If the score is still less than the threshold H, the recognition result has low credibility and is not used as the final recognition result.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present invention and are not intended to limit its scope of protection; all equivalent embodiments or modifications that do not depart from the technical spirit of the present invention shall fall within the scope of protection of the present invention.
It will be obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as illustrative and not restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the present invention.
Claims (10)
1. A far-field voice instruction recognition method, characterized by comprising:
Step 1: obtaining a far-field voice instruction signal to be recognized;
Step 2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
Step 3: performing instruction matching on the decoding result of Step 2 based on Chinese pinyin and a context model to obtain a final recognition result.
2. The far-field voice instruction recognition method according to claim 1, characterized in that Step 3 comprises:
converting the decoding result of Step 2 into Chinese pinyin;
converting every target instruction in the target instruction set into Chinese pinyin to obtain a pinyin library;
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, directly returning the matching result and ending the matching process.
3. The far-field voice instruction recognition method according to claim 2, characterized in that Step 3 further comprises:
if the first-level match fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level match succeeds, directly returning the matching result and ending the matching process;
if the second-level match fails, segmenting the decoding result according to the number of Chinese characters in the instructions of the target instruction set, cutting it one character at a time with a sliding window whose size is that character count, and then performing third-level matching.
4. The far-field voice instruction recognition method according to claim 3, characterized in that the third-level matching comprises:
converting the segmentation results into fuzzy pinyin and matching each fuzzy pinyin segment for similarity against the fuzzy pinyin library, each match yielding a score C, wherein the target instruction corresponding to the highest-scoring match is the recognition result.
5. The far-field voice instruction recognition method according to claim 4, characterized in that the third-level matching further comprises:
if the score of the recognition result is greater than a threshold H, directly returning the result and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy pinyin of the previous similarity match to form a context, performing similarity matching of the context against the fuzzy pinyin library, and, if the score of the recognition result is greater than the threshold H, directly returning the result and ending subsequent matching.
6. A far-field voice instruction recognition device, characterized by comprising:
a voice acquisition module, configured to obtain a far-field voice instruction signal;
a speech recognition module, configured to decode the voice instruction signal with a speech recognition engine to obtain a decoding result;
a matching module, configured to perform instruction matching on the decoding result based on Chinese pinyin and a context model to obtain a final recognition result.
7. The far-field voice instruction recognition device according to claim 6, characterized in that the matching module performs the following operations:
converting the decoding result into Chinese pinyin;
converting every target instruction in the target instruction set into Chinese pinyin to obtain a pinyin library; and
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, directly returning the matching result and ending the matching process.
8. The far-field voice instruction recognition device according to claim 7, characterized in that the matching module further performs the following operations:
if the first-level match fails, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level match succeeds, directly returning the matching result and ending the matching process;
if the second-level match fails, segmenting the decoding result according to the number of Chinese characters in the instructions of the target instruction set, cutting it one character at a time with a sliding window whose size is that character count, and then performing third-level matching.
9. The far-field voice instruction recognition device according to claim 8, characterized in that the third-level matching comprises:
converting the segmentation results into fuzzy pinyin and matching each fuzzy pinyin segment for similarity against the fuzzy pinyin library, each match yielding a score C, wherein the target instruction corresponding to the highest-scoring match is the recognition result.
10. The far-field voice instruction recognition device according to claim 9, characterized in that the third-level matching further comprises:
if the score of the recognition result is greater than a threshold H, directly returning the result and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy pinyin of the previous similarity match to form a context, performing similarity matching of the context against the fuzzy pinyin library, and, if the score of the recognition result is greater than the threshold H, directly returning the result and ending subsequent matching.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910237263.7A CN109903766B (en) | 2019-03-27 | 2019-03-27 | Far-field voice instruction recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910237263.7A CN109903766B (en) | 2019-03-27 | 2019-03-27 | Far-field voice instruction recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109903766A | 2019-06-18 |
CN109903766B CN109903766B (en) | 2021-06-04 |
Family ID: 66953549
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910237263.7A Active CN109903766B (en) | 2019-03-27 | 2019-03-27 | Far-field voice instruction recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109903766B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577548A (en) * | 2013-10-12 | 2014-02-12 | 优视科技有限公司 | Method and device for matching characters with close pronunciation |
CN103903619A (en) * | 2012-12-28 | 2014-07-02 | 安徽科大讯飞信息科技股份有限公司 | Method and system for improving accuracy of speech recognition |
US20160180835A1 (en) * | 2014-12-23 | 2016-06-23 | Nice-Systems Ltd | User-aided adaptation of a phonetic dictionary |
CN106953959A (*) | 2017-04-18 | 2017-07-14 | 深圳和家园网络科技有限公司 | Telephone dialing method based on pinyin matching |
Also Published As
Publication number | Publication date |
---|---|
CN109903766B (en) | 2021-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102648306B1 (en) | Speech recognition error correction method, related devices, and readable storage medium | |
US11848008B2 (en) | Artificial intelligence-based wakeup word detection method and apparatus, device, and medium | |
CN107301865B (en) | Method and device for determining interactive text in voice input | |
CN111145728B (en) | Speech recognition model training method, system, mobile terminal and storage medium | |
CN103971685B (en) | Method and system for recognizing voice commands | |
US20180158449A1 (en) | Method and device for waking up via speech based on artificial intelligence | |
US9280969B2 (en) | Model training for automatic speech recognition from imperfect transcription data | |
CN108710704B (en) | Method and device for determining conversation state, electronic equipment and storage medium | |
CN101727901B (en) | Method for recognizing Chinese-English bilingual voice of embedded system | |
US9330665B2 (en) | Automatic updating of confidence scoring functionality for speech recognition systems with respect to a receiver operating characteristic curve | |
CN110148399A | Control method, apparatus, device and medium for a smart device | |
CN111539199B (en) | Text error correction method, device, terminal and storage medium | |
CN113380239B (en) | Training method of voice recognition model, voice recognition method, device and equipment | |
JP6875819B2 (en) | Acoustic model input data normalization device and method, and voice recognition device | |
CN112002311A (en) | Text error correction method and device, computer readable storage medium and terminal equipment | |
CN111883137A (en) | Text processing method and device based on voice recognition | |
US9542939B1 (en) | Duration ratio modeling for improved speech recognition | |
CN104240698A (en) | Voice recognition method | |
CN110600029A (en) | User-defined awakening method and device for intelligent voice equipment | |
CN113838452A (en) | Speech synthesis method, apparatus, device and computer storage medium | |
CN113380229A (en) | Voice response speed determination method, related device and computer program product | |
CN115104151A (en) | Offline voice recognition method and device, electronic equipment and readable storage medium | |
CN109903766A (en) | Far field voice instruction recognition method and device | |
CN104424942A (en) | Method for improving character speed input accuracy | |
CN113129869B (en) | Method and device for training and recognizing voice recognition model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||