CN109903766A - Far field voice instruction recognition method and device - Google Patents

Far field voice instruction recognition method and device

Info

Publication number
CN109903766A
Authority
CN
China
Prior art keywords
result
matching
far field
phonetic alphabet
chinese phonetic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910237263.7A
Other languages
Chinese (zh)
Other versions
CN109903766B (en)
Inventor
邱建
王兴
佟彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING BEYOND TECHNOLOGY Co Ltd
Original Assignee
BEIJING BEYOND TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING BEYOND TECHNOLOGY Co Ltd
Priority to CN201910237263.7A
Publication of CN109903766A
Application granted
Publication of CN109903766B
Legal status: Active
Anticipated expiration

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to a far-field voice instruction recognition method, comprising: step 1: obtaining a far-field voice instruction signal to be recognized; step 2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result; step 3: performing instruction matching on the decoding result of step 2 based on Chinese Pinyin and a context model to obtain a final recognition result. The invention uses the corresponding algorithms to convert instructions that a far-field voice control system fails to recognize correctly into control instructions that can be recognized accurately, which improves the instruction recognition rate and the user experience. It can be applied to speech recognition interaction scenarios such as smart spaces.

Description

Far field voice instruction recognition method and device
Technical field
The invention belongs to the technical field of far-field speech recognition, and in particular relates to a far-field voice instruction recognition method and device.
Background
As a currently popular mode of human-computer interaction, voice technology has in recent years been widely used in every aspect of the smart technology field. As the technology develops, voice control technology also keeps advancing. Compared with earlier control methods, voice control removes the need for manual operation and is more convenient, so it is used ever more widely. Because speech recognition is the prerequisite for voice control, the development of speech recognition technology has also gradually attracted attention from practitioners in the field. Depending on the distance between the position where the voice is uttered and the voice receiving device, speech recognition is usually divided into far-field speech recognition and near-field speech recognition. Because far-field speech recognition can recognize voice instructions issued from a longer distance, it has received more attention from technical specialists.
At present, when a user performs far-field voice control, the wake-up words and control instructions of existing far-field speech recognition methods are relatively fixed and vary little, and the error rate is high, especially in voice command control scenarios, so the user experience is poor. Therefore, how to provide a speech recognition correction method and device that can accurately correct voice control instructions has become an urgent problem in this field.
Summary of the invention
The purpose of the present invention is to provide a speech recognition correction method and device that solve the problem of low recognition accuracy for far-field voice control instructions.
The present invention provides a far-field voice instruction recognition method, comprising:
Step 1: obtaining a far-field voice instruction signal to be recognized;
Step 2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
Step 3: performing instruction matching on the decoding result of step 2 based on Chinese Pinyin and a context model to obtain a final recognition result.
Further, step 3 comprises:
converting the decoding result of step 2 into pinyin;
converting all target instructions in the target instruction set into pinyin to obtain a pinyin library;
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, returning the matching result directly and ending the matching process.
Further, step 3 also comprises:
if the first-level matching does not succeed, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level matching succeeds, returning the matching result directly and ending the matching process;
if the second-level matching does not succeed, segmenting the decoding result with reference to the number of Chinese characters of the instructions in the target instruction set, using a sliding window of that character count to cut character by character, and then performing third-level matching.
Further, the third-level matching comprises:
converting each segmentation result into fuzzy pinyin and performing similarity matching between each fuzzy pinyin and the fuzzy pinyin library, where each match yields a score C and the target instruction corresponding to the highest-scoring match is the recognition result.
Further, the third-level matching also comprises:
if the score of the recognition result is greater than a threshold H, returning that result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy speech of the previous similarity match to form a context and performing similarity matching against the fuzzy pinyin library; if the score of the recognition result is then greater than the threshold H, returning that result directly and ending subsequent matching.
The present invention also provides a far-field voice instruction recognition device, comprising:
a voice acquisition module for obtaining a far-field voice instruction signal;
a speech recognition module for decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
a matching module for performing instruction matching on the decoding result based on Chinese Pinyin and a context model to obtain a final recognition result.
Further, the matching module performs the following operations:
converting the decoding result into pinyin;
converting all target instructions in the target instruction set into pinyin to obtain a pinyin library; and
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, returning the matching result directly and ending the matching process.
Further, the matching module also performs the following operations:
if the first-level matching does not succeed, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level matching succeeds, returning the matching result directly and ending the matching process;
if the second-level matching does not succeed, segmenting the decoding result with reference to the number of Chinese characters of the instructions in the target instruction set, using a sliding window of that character count to cut character by character, and then performing third-level matching.
Further, the third-level matching comprises:
converting each segmentation result into fuzzy pinyin and performing similarity matching between each fuzzy pinyin and the fuzzy pinyin library, where each match yields a score C and the target instruction corresponding to the highest-scoring match is the recognition result.
Further, the third-level matching also comprises:
if the score of the recognition result is greater than a threshold H, returning that result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy speech of the previous similarity match to form a context and performing similarity matching against the fuzzy pinyin library; if the score of the recognition result is then greater than the threshold H, returning that result directly and ending subsequent matching.
Compared with the prior art, the beneficial effects of the present invention are as follows: instructions that a far-field voice control system fails to recognize correctly are converted by the corresponding algorithms into control instructions that can be recognized accurately, which improves the instruction recognition rate and the user experience. The invention can be applied to speech recognition interaction scenarios such as smart spaces.
Description of the drawings
Fig. 1 is a flowchart of the voice control instruction recognition correction of the present invention;
Fig. 2 is a structural block diagram of the voice control instruction recognition correction device of the present invention.
Detailed description
The present invention is described in detail below with reference to the embodiments shown in the drawings. It should be noted that these embodiments do not limit the invention; equivalent transformations or substitutions in function, method, or structure made by those of ordinary skill in the art according to these embodiments all fall within the protection scope of the invention.
Referring to Fig. 1, this embodiment provides a far-field voice instruction recognition method, comprising:
Step S1: obtaining a far-field voice instruction signal to be recognized; for example, the voice instruction signal output by a microphone in a voice command control scenario can be obtained. A sound source is generally considered to be in the far field when its distance from the reference point at the center of the microphone array is much greater than the signal wavelength; in the field of speech recognition this distance is usually taken to be 3 meters or more.
Step S2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
Step S3: performing instruction matching on the decoding result of step S2 based on Chinese Pinyin and a context model to obtain a final recognition result.
With this far-field voice instruction recognition method, instructions that a far-field voice control system fails to recognize correctly are converted by the corresponding algorithms into control instructions that can be recognized accurately. This improves the instruction recognition rate (the recognition rate of far-field voice control instructions can be improved by about 15% to 20%) and the user experience, and the method can be applied to speech recognition interaction scenarios such as smart spaces.
In this embodiment, step S3 comprises:
converting the decoding result of step S2 into pinyin;
converting all target instructions in the target instruction set into pinyin to obtain a pinyin library;
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, returning the matching result directly and ending the matching process.
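The first-level matching described above can be sketched as an exact lookup on pinyin sequences. This is only an illustration under stated assumptions: the tiny `CHAR_TO_PINYIN` table, the toneless syllables, and the function names are hypothetical and are not taken from the patent; a real system would use a complete pinyin dictionary.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy character-to-pinyin table (assumption, for illustration only).
CHAR_TO_PINYIN: Dict[str, str] = {
    "开": "kai", "关": "guan", "灯": "deng", "登": "deng", "窗": "chuang",
}

Pinyin = Tuple[str, ...]


def to_pinyin(text: str) -> Pinyin:
    """Convert text to toneless pinyin syllables; unknown characters pass through."""
    return tuple(CHAR_TO_PINYIN.get(ch, ch) for ch in text)


def build_pinyin_library(instructions: List[str]) -> Dict[Pinyin, str]:
    """Map the pinyin of every target instruction back to the instruction."""
    return {to_pinyin(ins): ins for ins in instructions}


def first_level_match(decoded: str, library: Dict[Pinyin, str]) -> Optional[str]:
    """Return the target instruction whose pinyin equals that of the decoding result."""
    return library.get(to_pinyin(decoded))


library = build_pinyin_library(["开灯", "关灯"])
# "开登" is a homophone misrecognition of "开灯"; both map to ("kai", "deng").
print(first_level_match("开登", library))
```

Matching on pinyin rather than on characters lets homophone errors from the recognition engine resolve to the intended instruction without any similarity scoring.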
In this embodiment, step S3 also comprises:
if the first-level matching does not succeed, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level matching succeeds, returning the matching result directly and ending the matching process;
if the second-level matching does not succeed, segmenting the decoding result with reference to the number of Chinese characters of the instructions in the target instruction set, using a sliding window of that character count to cut character by character, and then performing third-level matching. Here, for a sequence of several characters, two or more characters are set as one sliding window, and the window slides by only one character at a time to cut the text character by character.
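The sliding-window segmentation can be sketched as follows. The function names are hypothetical, and deriving the window sizes from the distinct instruction lengths is one possible reading of "with reference to the number of Chinese characters of the instructions in the target instruction set", not a detail confirmed by the source.

```python
from typing import List


def sliding_segments(text: str, window: int) -> List[str]:
    """Cut text into windows of `window` characters, sliding one character at a time."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(text) <= window:
        return [text]
    return [text[i:i + window] for i in range(len(text) - window + 1)]


def segment_by_instruction_lengths(decoded: str, instructions: List[str]) -> List[str]:
    """Segment the decoding result once per distinct instruction length (assumption)."""
    segments: List[str] = []
    for size in sorted({len(ins) for ins in instructions}):
        for seg in sliding_segments(decoded, size):
            if seg not in segments:  # keep first occurrence, preserve order
                segments.append(seg)
    return segments


print(sliding_segments("请帮我开灯", 2))
```

With a two-character window the five-character input yields four overlapping segments, so a short instruction embedded in a longer utterance still appears as one of the candidates.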
In this embodiment, the third-level matching comprises:
converting each segmentation result into fuzzy pinyin and performing similarity matching between each fuzzy pinyin and the fuzzy pinyin library, where each match yields a score C and the target instruction corresponding to the highest-scoring match is the recognition result.
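A minimal sketch of this scoring step is shown below. The source does not say how the score C is computed; here it is assumed to be the ratio returned by Python's standard `difflib.SequenceMatcher`, and the library is assumed to map space-separated fuzzy pinyin strings to instructions. Both choices are illustrative only.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    """Score C for one comparison (assumed measure: SequenceMatcher ratio in [0, 1])."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(segment_pinyins: List[str], fuzzy_library: Dict[str, str]) -> Tuple[str, float]:
    """Compare every segment with every library entry and keep the highest score."""
    best_instruction, best_score = "", -1.0
    for seg in segment_pinyins:
        for lib_pinyin, instruction in fuzzy_library.items():
            score = similarity(seg, lib_pinyin)
            if score > best_score:
                best_instruction, best_score = instruction, score
    return best_instruction, best_score


fuzzy_library = {"kai den": "开灯", "guan den": "关灯"}
instruction, score = best_match(["qin kai", "kai den"], fuzzy_library)
print(instruction, score)
```

Any string or phoneme similarity measure could be substituted for `similarity`; the surrounding loop only relies on larger scores meaning closer matches.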
In this embodiment, the third-level matching also comprises:
if the score of the recognition result is greater than a threshold H, returning that result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy speech of the previous similarity match to form a context and performing similarity matching against the fuzzy pinyin library; if the score of the recognition result is then greater than the threshold H, returning that result directly and ending subsequent matching. If the score of the recognition result is still less than the threshold H, the recognition result has low credibility and is not used as the final recognition result.
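The threshold and context fallback can be sketched as below. Several points are assumptions rather than facts from the source: the context is formed by prepending the fuzzy pinyin of the previous match to the current one, the score is `difflib.SequenceMatcher.ratio()`, and the value of the threshold H is arbitrary.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def score(a: str, b: str) -> float:
    """Assumed similarity measure for the score C."""
    return SequenceMatcher(None, a, b).ratio()


def best(candidate: str, fuzzy_library: Dict[str, str]) -> Tuple[str, float]:
    """Return the library instruction with the highest score for one candidate."""
    return max(
        ((ins, score(candidate, pin)) for pin, ins in fuzzy_library.items()),
        key=lambda pair: pair[1],
    )


def third_level_decide(
    current: str, previous: str, fuzzy_library: Dict[str, str], threshold: float
) -> Optional[str]:
    """Accept a match above H, retry once with context, otherwise reject."""
    instruction, c = best(current, fuzzy_library)
    if c > threshold:
        return instruction
    # Assumed way of forming the context: previous fuzzy pinyin + current one.
    context = (previous + " " + current).strip()
    instruction, c = best(context, fuzzy_library)
    if c > threshold:
        return instruction
    return None  # low credibility: not used as the final recognition result


fuzzy_library = {"da kai kon tiao": "打开空调", "guan bi kon tiao": "关闭空调"}
print(third_level_decide("kon tiao", "da kai", fuzzy_library, threshold=0.8))
```

In this example the fragment "kon tiao" alone scores below the threshold for both instructions, while the context "da kai kon tiao" matches one of them exactly, so the fallback recovers the instruction.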
Referring to Fig. 2, this embodiment also provides a far-field voice instruction recognition device, comprising:
a voice acquisition module 10 for obtaining a far-field voice instruction signal;
a speech recognition module 20 for decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
a matching module 30 for performing instruction matching on the decoding result based on Chinese Pinyin and a context model to obtain a final recognition result.
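The three modules can be pictured as components wired in sequence. The sketch below is a hypothetical arrangement: the class name, the callable-based interfaces, and the stand-in lambdas are assumptions for illustration, not the device's actual structure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FarFieldInstructionRecognizer:
    acquire: Callable[[], bytes]            # voice acquisition module 10
    decode: Callable[[bytes], str]          # speech recognition module 20
    match: Callable[[str], Optional[str]]   # matching module 30

    def run(self) -> Optional[str]:
        """Pass the signal through the three modules in order."""
        signal = self.acquire()
        decoded = self.decode(signal)
        return self.match(decoded)


# Stand-in components so the sketch runs without audio hardware or an engine.
recognizer = FarFieldInstructionRecognizer(
    acquire=lambda: "开灯".encode("utf-8"),
    decode=lambda signal: signal.decode("utf-8"),
    match=lambda text: text if text in {"开灯", "关灯"} else None,
)
print(recognizer.run())
```

Keeping each module behind a plain callable makes it easy to swap in a different recognition engine or matching strategy when testing.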
With this far-field voice instruction recognition device, instructions that a far-field voice control system fails to recognize correctly are converted by the corresponding algorithms into control instructions that can be recognized accurately. This improves the instruction recognition rate (the recognition rate of far-field voice control instructions can be improved by about 15% to 20%) and the user experience, and the device can be applied to speech recognition interaction scenarios such as smart spaces.
In this embodiment, the matching module performs the following operations:
converting the decoding result into pinyin;
converting all target instructions in the target instruction set into pinyin to obtain a pinyin library; and
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, returning the matching result directly and ending the matching process.
In this embodiment, the matching module also performs the following operations:
if the first-level matching does not succeed, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level matching succeeds, returning the matching result directly and ending the matching process;
if the second-level matching does not succeed, segmenting the decoding result with reference to the number of Chinese characters of the instructions in the target instruction set, using a sliding window of that character count to cut character by character, and then performing third-level matching. Here, for a sequence of several characters, two or more characters are set as one sliding window, and the window slides by only one character at a time to cut the text character by character.
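The second-level matching performed by the matching module can be sketched as an exact lookup after both sides are normalized to fuzzy pinyin. The source does not list its fuzzy rules; the rules below (zh/z, ch/c, sh/s, l/n, ang/an, eng/en, ing/in) are assumed from the confusion pairs commonly offered by Chinese input methods, and all names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Pinyin = Tuple[str, ...]

# Assumed fuzzy rules: each pair collapses a commonly confused sound onto one form.
INITIAL_RULES = (("zh", "z"), ("ch", "c"), ("sh", "s"), ("l", "n"))
FINAL_RULES = (("ang", "an"), ("eng", "en"), ("ing", "in"))


def fuzzify(syllable: str) -> str:
    """Normalize one toneless pinyin syllable to its fuzzy form."""
    for src, dst in INITIAL_RULES:
        if syllable.startswith(src):
            syllable = dst + syllable[len(src):]
            break
    for src, dst in FINAL_RULES:
        if syllable.endswith(src):
            syllable = syllable[:-len(src)] + dst
            break
    return syllable


def fuzzy_key(pinyin: Pinyin) -> Pinyin:
    return tuple(fuzzify(s) for s in pinyin)


def second_level_match(decoded: Pinyin, pinyin_library: Dict[Pinyin, str]) -> Optional[str]:
    """Convert both sides to fuzzy pinyin, then look for an exact match."""
    fuzzy_library = {fuzzy_key(key): ins for key, ins in pinyin_library.items()}
    return fuzzy_library.get(fuzzy_key(decoded))


pinyin_library = {("kai", "deng"): "开灯", ("sheng", "yin"): "声音"}
print(second_level_match(("kai", "den"), pinyin_library))
```

Because both the decoding result and the library are normalized with the same rules, near-homophones such as "den" and "deng" collapse to the same key and match exactly.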
In this embodiment, the third-level matching comprises:
converting each segmentation result into fuzzy pinyin and performing similarity matching between each fuzzy pinyin and the fuzzy pinyin library, where each match yields a score C and the target instruction corresponding to the highest-scoring match is the recognition result.
In this embodiment, the third-level matching also comprises:
if the score of the recognition result is greater than a threshold H, returning that result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy speech of the previous similarity match to form a context and performing similarity matching against the fuzzy pinyin library; if the score of the recognition result is then greater than the threshold H, returning that result directly and ending subsequent matching. If the score of the recognition result is still less than the threshold H, the recognition result has low credibility and is not used as the final recognition result.
The detailed descriptions listed above are only specific descriptions of feasible embodiments of the present invention and are not intended to limit its protection scope; all equivalent embodiments or modifications that do not depart from the technical spirit of the invention shall fall within its protection scope.
It will be obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. Therefore, from every point of view, the embodiments are to be regarded as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in the invention.

Claims (10)

1. A far-field voice instruction recognition method, characterized by comprising:
Step 1: obtaining a far-field voice instruction signal to be recognized;
Step 2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
Step 3: performing instruction matching on the decoding result of step 2 based on Chinese Pinyin and a context model to obtain a final recognition result.
2. The far-field voice instruction recognition method according to claim 1, characterized in that step 3 comprises:
converting the decoding result of step 2 into pinyin;
converting all target instructions in the target instruction set into pinyin to obtain a pinyin library;
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, returning the matching result directly and ending the matching process.
3. The far-field voice instruction recognition method according to claim 2, characterized in that step 3 also comprises:
if the first-level matching does not succeed, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level matching succeeds, returning the matching result directly and ending the matching process;
if the second-level matching does not succeed, segmenting the decoding result with reference to the number of Chinese characters of the instructions in the target instruction set, using a sliding window of that character count to cut character by character, and then performing third-level matching.
4. The far-field voice instruction recognition method according to claim 3, characterized in that the third-level matching comprises:
converting each segmentation result into fuzzy pinyin and performing similarity matching between each fuzzy pinyin and the fuzzy pinyin library, where each match yields a score C and the target instruction corresponding to the highest-scoring match is the recognition result.
5. The far-field voice instruction recognition method according to claim 4, characterized in that the third-level matching also comprises:
if the score of the recognition result is greater than a threshold H, returning that result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy speech of the previous similarity match to form a context and performing similarity matching against the fuzzy pinyin library; if the score of the recognition result is then greater than the threshold H, returning that result directly and ending subsequent matching.
6. A far-field voice instruction recognition device, characterized by comprising:
a voice acquisition module for obtaining a far-field voice instruction signal;
a speech recognition module for decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;
a matching module for performing instruction matching on the decoding result based on Chinese Pinyin and a context model to obtain a final recognition result.
7. The far-field voice instruction recognition device according to claim 6, characterized in that the matching module performs the following operations:
converting the decoding result into pinyin;
converting all target instructions in the target instruction set into pinyin to obtain a pinyin library; and
performing first-level matching of the pinyin of the decoding result against the pinyin library and, if the match succeeds, returning the matching result directly and ending the matching process.
8. The far-field voice instruction recognition device according to claim 7, characterized in that the matching module also performs the following operations:
if the first-level matching does not succeed, converting the pinyin of the decoding result into fuzzy pinyin, converting the pinyin library into a fuzzy pinyin library, and performing second-level matching;
if the second-level matching succeeds, returning the matching result directly and ending the matching process;
if the second-level matching does not succeed, segmenting the decoding result with reference to the number of Chinese characters of the instructions in the target instruction set, using a sliding window of that character count to cut character by character, and then performing third-level matching.
9. The far-field voice instruction recognition device according to claim 8, characterized in that the third-level matching comprises:
converting each segmentation result into fuzzy pinyin and performing similarity matching between each fuzzy pinyin and the fuzzy pinyin library, where each match yields a score C and the target instruction corresponding to the highest-scoring match is the recognition result.
10. The far-field voice instruction recognition device according to claim 9, characterized in that the third-level matching also comprises:
if the score of the recognition result is greater than a threshold H, returning that result directly and ending subsequent matching;
if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy speech of the previous similarity match to form a context and performing similarity matching against the fuzzy pinyin library; if the score of the recognition result is then greater than the threshold H, returning that result directly and ending subsequent matching.
CN201910237263.7A 2019-03-27 2019-03-27 Far-field voice instruction recognition method and device Active CN109903766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910237263.7A CN109903766B (en) 2019-03-27 2019-03-27 Far-field voice instruction recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910237263.7A CN109903766B (en) 2019-03-27 2019-03-27 Far-field voice instruction recognition method and device

Publications (2)

Publication Number Publication Date
CN109903766A true CN109903766A (en) 2019-06-18
CN109903766B CN109903766B (en) 2021-06-04

Family

ID=66953549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910237263.7A Active CN109903766B (en) 2019-03-27 2019-03-27 Far-field voice instruction recognition method and device

Country Status (1)

Country Link
CN (1) CN109903766B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577548A (en) * 2013-10-12 2014-02-12 优视科技有限公司 Method and device for matching characters with close pronunciation
CN103903619A (en) * 2012-12-28 2014-07-02 安徽科大讯飞信息科技股份有限公司 Method and system for improving accuracy of speech recognition
US20160180835A1 (en) * 2014-12-23 2016-06-23 Nice-Systems Ltd User-aided adaptation of a phonetic dictionary
CN106953959A (en) * 2017-04-18 2017-07-14 深圳和家园网络科技有限公司 A kind of dialing method of telephone matched based on phonetic

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903619A (en) * 2012-12-28 2014-07-02 安徽科大讯飞信息科技股份有限公司 Method and system for improving accuracy of speech recognition
CN103577548A (en) * 2013-10-12 2014-02-12 优视科技有限公司 Method and device for matching characters with close pronunciation
US20160180835A1 (en) * 2014-12-23 2016-06-23 Nice-Systems Ltd User-aided adaptation of a phonetic dictionary
CN106953959A (en) * 2017-04-18 2017-07-14 深圳和家园网络科技有限公司 A kind of dialing method of telephone matched based on phonetic

Also Published As

Publication number Publication date
CN109903766B (en) 2021-06-04

Similar Documents

Publication Publication Date Title
KR102648306B1 (en) Speech recognition error correction method, related devices, and readable storage medium
US11848008B2 (en) Artificial intelligence-based wakeup word detection method and apparatus, device, and medium
CN107301865B (en) Method and device for determining interactive text in voice input
CN111145728B (en) Speech recognition model training method, system, mobile terminal and storage medium
CN103971685B (en) Method and system for recognizing voice commands
US20180158449A1 (en) Method and device for waking up via speech based on artificial intelligence
US9280969B2 (en) Model training for automatic speech recognition from imperfect transcription data
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN101727901B (en) Method for recognizing Chinese-English bilingual voice of embedded system
US9330665B2 (en) Automatic updating of confidence scoring functionality for speech recognition systems with respect to a receiver operating characteristic curve
CN110148399A (en) A kind of control method of smart machine, device, equipment and medium
CN111539199B (en) Text error correction method, device, terminal and storage medium
CN113380239B (en) Training method of voice recognition model, voice recognition method, device and equipment
JP6875819B2 (en) Acoustic model input data normalization device and method, and voice recognition device
CN112002311A (en) Text error correction method and device, computer readable storage medium and terminal equipment
CN111883137A (en) Text processing method and device based on voice recognition
US9542939B1 (en) Duration ratio modeling for improved speech recognition
CN104240698A (en) Voice recognition method
CN110600029A (en) User-defined awakening method and device for intelligent voice equipment
CN113838452A (en) Speech synthesis method, apparatus, device and computer storage medium
CN113380229A (en) Voice response speed determination method, related device and computer program product
CN115104151A (en) Offline voice recognition method and device, electronic equipment and readable storage medium
CN109903766A (en) Far field voice instruction recognition method and device
CN104424942A (en) Method for improving character speed input accuracy
CN113129869B (en) Method and device for training and recognizing voice recognition model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant