CN109903766A - Far field voice instruction recognition method and device

Info

Publication number: CN109903766A
Application number: CN201910237263.7A
Authority: CN
Inventors: 邱建; 王兴; 佟彤
Original assignee: BEIJING BEYOND TECHNOLOGY Co Ltd
Current assignee: BEIJING BEYOND TECHNOLOGY Co Ltd
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2019-06-18
Anticipated expiration: 2039-03-27
Also published as: CN109903766B

Abstract

The present invention relates to a far-field voice instruction recognition method, comprising: Step 1: obtaining a far-field voice instruction signal to be recognized; Step 2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result; Step 3: performing instruction matching on the decoding result of Step 2 based on Chinese pinyin and a context model to obtain a final recognition result. Through the corresponding algorithms, the invention converts instructions that a far-field voice control system fails to recognize correctly into control instructions that can be accurately recognized, which improves the instruction recognition rate and the user experience. It can be applied to speech recognition interaction scenarios such as smart spaces.

Description

Far field voice instruction recognition method and device

Technical field

The invention belongs to the technical field of far-field speech recognition, and in particular relates to a far-field voice instruction recognition method and device.

Background art

As one of the more popular modes of human-computer interaction, voice technology has in recent years been widely applied across the smart-device field. As the technology develops, voice control keeps improving. Compared with earlier control methods, voice control removes the need for manual operation and is more convenient, which has made it more widely used. Because speech recognition is a prerequisite for voice control, the development of speech recognition technology has also gradually drawn attention from practitioners in the field. Depending on the distance between the position where the speech is uttered and the voice receiving device, speech recognition is usually divided into far-field speech recognition and near-field speech recognition. Because far-field speech recognition can recognize voice instructions issued from a greater distance, it has attracted particular attention from technical experts.

At present, when a user performs far-field voice control, the wake-up words and control instructions of existing far-field speech recognition methods are relatively fixed and vary little, yet the error rate is high, especially in voice command control scenarios, so the user experience is poor. Therefore, how to provide a speech recognition correction method and device that can accurately correct voice control instructions has become an urgent problem in this field.

Summary of the invention

The purpose of the present invention is to provide a speech recognition correction method and device, so as to solve the problem of low recognition accuracy for far-field voice control instructions.

The present invention provides a far-field voice instruction recognition method, comprising:

Step 1: obtaining a far-field voice instruction signal to be recognized;

Step 2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;

Step 3: performing instruction matching on the decoding result of Step 2 based on Chinese pinyin and a context model to obtain a final recognition result.

Further, Step 3 includes:

The decoding result of Step 2 is converted into Chinese pinyin;

All target instructions in the target instruction set are converted into Chinese pinyin to obtain a pinyin library;

The pinyin of the decoding result is subjected to first-level matching against the pinyin library; if the match succeeds, the matching result is returned directly and the matching process ends.
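The first-level matching described above can be sketched as an exact lookup on pinyin sequences. This is a minimal illustration, not the patented implementation: the `PINYIN` table, the sample target instructions, and all function names are hypothetical, and a real system would use a complete pronunciation dictionary.

```python
# Hypothetical character-to-pinyin table (toneless); a real system would
# use a complete pronunciation dictionary instead of this small sample.
PINYIN = {
    "打": "da", "开": "kai", "关": "guan", "闭": "bi",
    "灯": "deng", "窗": "chuang", "帘": "lian", "等": "deng",
}


def to_pinyin(text):
    """Convert a Chinese string into a tuple of pinyin syllables.

    Characters missing from the sample table are passed through unchanged.
    """
    return tuple(PINYIN.get(ch, ch) for ch in text)


def build_pinyin_library(target_instructions):
    """Map the pinyin sequence of each target instruction to the instruction."""
    return {to_pinyin(cmd): cmd for cmd in target_instructions}


def first_level_match(decoded_text, library):
    """Return the target instruction with identical pinyin, or None."""
    return library.get(to_pinyin(decoded_text))


# Assumed sample instruction set, for illustration only.
TARGETS = ["打开灯", "关闭灯", "打开窗帘"]
LIBRARY = build_pinyin_library(TARGETS)
```

With this sample data, a decoding result such as "打开等" (a homophone error) has the same pinyin as "打开灯" and is therefore resolved at the first level.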

Further, Step 3 also includes:

If the first-level matching fails, the pinyin of the decoding result is converted into fuzzy pinyin, the pinyin library is converted into a fuzzy pinyin library, and second-level matching is performed;

If the second-level matching succeeds, the matching result is returned directly and the matching process ends;

If the second-level matching fails, the decoding result is segmented with reference to the Chinese character counts of the instructions in the target instruction set: using the character count as the sliding window size, the decoding result is cut character by character, and third-level matching is then performed.
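A minimal sketch of the second-level (fuzzy pinyin) matching follows. The source does not state which sounds are treated as equivalent, so the confusion rules below (retroflex initials and nasal finals) are assumptions chosen for illustration, as are all names and the sample library.

```python
# Assumed confusion rules; the source does not list which sounds are merged.
FUZZY_INITIALS = (("zh", "z"), ("ch", "c"), ("sh", "s"))
FUZZY_FINALS = (("ang", "an"), ("eng", "en"), ("ing", "in"))


def to_fuzzy(syllable):
    """Collapse one pinyin syllable onto its assumed fuzzy form."""
    for src, dst in FUZZY_INITIALS:
        if syllable.startswith(src):
            syllable = dst + syllable[len(src):]
            break
    for src, dst in FUZZY_FINALS:
        if syllable.endswith(src):
            syllable = syllable[:-len(src)] + dst
            break
    return syllable


def to_fuzzy_sequence(syllables):
    """Apply the fuzzy mapping to every syllable of a pinyin sequence."""
    return tuple(to_fuzzy(s) for s in syllables)


def second_level_match(decoded_pinyin, pinyin_library):
    """Exact lookup after both sides are converted to fuzzy pinyin."""
    fuzzy_library = {to_fuzzy_sequence(k): v for k, v in pinyin_library.items()}
    return fuzzy_library.get(to_fuzzy_sequence(decoded_pinyin))


# Hypothetical pinyin library, keyed by syllable tuples.
PINYIN_LIBRARY = {
    ("da", "kai", "deng"): "打开灯",
    ("da", "kai", "chuang", "lian"): "打开窗帘",
}
```

Under these assumed rules, a decoded sequence such as `("da", "kai", "den")` fails exact matching but matches "打开灯" once both sides are reduced to fuzzy pinyin.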

Further, the third-level matching includes:

The segmentation results are converted into fuzzy pinyin, and each fuzzy pinyin is similarity-matched against the fuzzy pinyin library. Each match yields a score C, and the target instruction corresponding to the highest-scoring match is the recognition result.
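The scoring step can be sketched as follows. The source does not say how the score C is computed, so this example substitutes `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in similarity measure; the function names and the representation of segments as syllable tuples are likewise assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score C in [0, 1] between two fuzzy pinyin sequences.

    SequenceMatcher is an assumed stand-in; the source names no measure.
    """
    return SequenceMatcher(None, " ".join(a), " ".join(b)).ratio()


def third_level_match(fuzzy_segments, fuzzy_library):
    """Return (instruction, score) for the best-scoring segment/entry pair."""
    best_cmd, best_score = None, 0.0
    for segment in fuzzy_segments:
        for key, cmd in fuzzy_library.items():
            score = similarity(segment, key)
            if score > best_score:
                best_cmd, best_score = cmd, score
    return best_cmd, best_score
```

Each segment is compared with every library entry, so the cost grows with the product of the two sizes; for a small command set this is typically acceptable.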

Further, the third-level matching also includes:

If the score of the recognition result is greater than a threshold H, the result is returned directly and subsequent matching is terminated;

If the score of the recognition result is less than the threshold H, the recognition result is combined with the fuzzy speech of the previous item that underwent similarity matching to form a context, which is then similarity-matched against the fuzzy pinyin library; if the score of the resulting recognition result is greater than the threshold H, the result is returned directly and subsequent matching is terminated.
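One possible reading of the threshold and context step is sketched below. The source gives no value for H and does not define how the context is assembled, so the threshold of 0.8, the concatenation of the previous fuzzy pinyin with the current one, and the `SequenceMatcher` score are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

H = 0.8  # assumed threshold; the source gives no value


def score(a, b):
    """Assumed similarity score in [0, 1] between two syllable sequences."""
    return SequenceMatcher(None, " ".join(a), " ".join(b)).ratio()


def best_match(query, fuzzy_library):
    """Return (instruction, score) of the closest library entry."""
    return max(
        ((cmd, score(query, key)) for key, cmd in fuzzy_library.items()),
        key=lambda pair: pair[1],
    )


def match_with_context(current, previous, fuzzy_library, threshold=H):
    """Accept a direct match above the threshold, else retry with context."""
    cmd, c = best_match(current, fuzzy_library)
    if c > threshold:
        return cmd
    # Assumed context construction: prepend the previously matched fuzzy pinyin.
    cmd, c = best_match(tuple(previous) + tuple(current), fuzzy_library)
    return cmd if c > threshold else None
```

In this sketch, a fragment such as `("kon", "tiao")` scores below the threshold on its own but is accepted once the preceding fragment `("da", "kai")` is added as context; a query that stays below the threshold yields `None`.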

The present invention also provides a far-field voice instruction recognition device, comprising:

A voice acquisition module, for obtaining a far-field voice instruction signal;

A speech recognition module, for decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;

A matching module, for performing instruction matching on the decoding result based on Chinese pinyin and a context model to obtain a final recognition result.

Further, the matching module performs the following operations:

The decoding result is converted into Chinese pinyin;

All target instructions in the target instruction set are converted into Chinese pinyin to obtain a pinyin library; and

Further, the matching module also performs the following operations:

Further, the third-level matching includes:

Further, the third-level matching also includes:

Compared with the prior art, the beneficial effects of the present invention are as follows: instructions that a far-field voice control system fails to recognize correctly are converted, through the corresponding algorithms, into control instructions that can be accurately recognized, which improves the instruction recognition rate and the user experience. The invention can be applied to speech recognition interaction scenarios such as smart spaces.

Brief description of the drawings

Fig. 1 is a flow chart of the voice control instruction recognition and correction of the present invention;

Fig. 2 is a structural block diagram of the voice control instruction recognition and correction device of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to the embodiments shown in the accompanying drawings. It should be noted, however, that these embodiments do not limit the present invention; any equivalent transformation or substitution in function, method, or structure made by those of ordinary skill in the art according to these embodiments falls within the scope of protection of the present invention.

Referring to Fig. 1, this embodiment provides a far-field voice instruction recognition method, comprising:

Step S1: obtaining a far-field voice instruction signal to be recognized; for example, the voice instruction signal output by a microphone in a voice command control scenario. A sound source is generally considered to be in the far field when its distance from the reference point at the center of the microphone array is much greater than the signal wavelength; in the field of speech recognition this distance is usually taken as 3 meters or more.

Step S2: decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;

Step S3: performing instruction matching on the decoding result of Step S2 based on Chinese pinyin and a context model to obtain a final recognition result.
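The far-field criterion mentioned under Step S1 compares the source distance with the signal wavelength. As a rough illustration only, the helper below expresses a distance in wavelengths; the speed of sound (about 343 m/s in air at roughly 20 degrees Celsius) and any frequency passed in are assumptions that do not come from the source.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees Celsius


def wavelength_m(frequency_hz):
    """Wavelength in meters of a sound wave at the given frequency."""
    return SPEED_OF_SOUND_M_S / frequency_hz


def distance_in_wavelengths(distance_m, frequency_hz):
    """Express a source-to-array distance as a multiple of the wavelength."""
    return distance_m / wavelength_m(frequency_hz)
```

For example, `distance_in_wavelengths(3.0, 1000.0)` gives the 3 meter distance as a multiple of the wavelength of a 1 kHz tone, which is one way to judge how well the "much greater than the wavelength" condition holds at a given frequency.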

With this far-field voice instruction recognition method, instructions that a far-field voice control system fails to recognize correctly are converted, through the corresponding algorithms, into control instructions that can be accurately recognized. This improves the instruction recognition rate (the recognition rate of far-field voice control instructions can be improved by about 15%-20%) and the user experience, and the method can be applied to speech recognition interaction scenarios such as smart spaces.

In this embodiment, Step S3 includes:

The decoding result of Step S2 is converted into Chinese pinyin;

In this embodiment, Step S3 also includes:

If the second-level matching fails, the decoding result is segmented with reference to the Chinese character counts of the instructions in the target instruction set: using the character count as the sliding window size, the decoding result is cut character by character, and third-level matching is then performed. Here, for a string of several characters, two or more characters are set as one sliding window, and the window slides by only one character at a time to cut the string character by character.
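The sliding-window segmentation described above can be sketched as follows: the window size is taken from the character count of each target instruction, and the window advances one character at a time. The function names and the choice to return one list of segments per distinct instruction length are assumptions made for illustration.

```python
def sliding_segments(text, window):
    """Cut text into overlapping windows, advancing one character per step."""
    if window <= 0 or window > len(text):
        return []
    return [text[i:i + window] for i in range(len(text) - window + 1)]


def segment_by_target_lengths(decoded_text, target_instructions):
    """Segment the decoding result once per distinct target instruction length."""
    lengths = sorted({len(cmd) for cmd in target_instructions})
    return {n: sliding_segments(decoded_text, n) for n in lengths}
```

For a decoding result of five characters and a window of three, this produces three overlapping segments, each of which would then be converted to fuzzy pinyin for third-level matching.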

In this embodiment, the third-level matching includes:

In this embodiment, the third-level matching also includes:

If the score of the recognition result is less than the threshold H, the recognition result is combined with the fuzzy speech of the previous item that underwent similarity matching to form a context, which is then similarity-matched against the fuzzy pinyin library; if the score of the resulting recognition result is greater than the threshold H, the result is returned directly and subsequent matching is terminated. If the score of the recognition result is still less than the threshold H, the recognition result has low credibility and is not used as the final recognition result.

Referring to Fig. 2, this embodiment also provides a far-field voice instruction recognition device, comprising:

A voice acquisition module 10, for obtaining a far-field voice instruction signal;

A speech recognition module 20, for decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;

A matching module 30, for performing instruction matching on the decoding result based on Chinese pinyin and a context model to obtain a final recognition result.
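The three-module structure above can be sketched as a small composition of callables. This is an illustrative skeleton only: the class name, the field names, and the representation of each module as a plain function are assumptions, and the speech recognition engine is left as a caller-supplied placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FarFieldRecognizer:
    """Hypothetical wiring of modules 10, 20 and 30 into one pipeline."""

    acquire: Callable[[], bytes]           # voice acquisition module 10
    decode: Callable[[bytes], str]         # speech recognition module 20
    match: Callable[[str], Optional[str]]  # matching module 30

    def recognize(self) -> Optional[str]:
        """Run acquisition, decoding and matching in sequence."""
        signal = self.acquire()
        decoded = self.decode(signal)
        return self.match(decoded)
```

Keeping each module behind a callable makes it straightforward to substitute a different recognition engine or matching strategy without changing the pipeline itself.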

With this far-field voice instruction recognition device, instructions that a far-field voice control system fails to recognize correctly are converted, through the corresponding algorithms, into control instructions that can be accurately recognized. This improves the instruction recognition rate (the recognition rate of far-field voice control instructions can be improved by about 15%-20%) and the user experience, and the device can be applied to speech recognition interaction scenarios such as smart spaces.

In this embodiment, the matching module performs the following operations:

The decoding result is converted into Chinese pinyin;

In this embodiment, the matching module also performs the following operations:

In this embodiment, the third-level matching includes:

In this embodiment, the third-level matching also includes:

The series of detailed descriptions listed above are merely specific descriptions of feasible embodiments of the invention and are not intended to limit its scope of protection; all equivalent embodiments or modifications that do not depart from the technical spirit of the invention shall fall within its scope of protection.

It will be obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that it can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments should therefore be regarded in every respect as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are intended to be embraced by the invention.

Claims

1. A far-field voice instruction recognition method, characterized by comprising:

Step 1: obtaining a far-field voice instruction signal to be recognized;

Step 3: performing instruction matching on the decoding result of Step 2 based on Chinese pinyin and a context model to obtain a final recognition result.

2. The far-field voice instruction recognition method according to claim 1, characterized in that Step 3 includes:

The pinyin of the decoding result is subjected to first-level matching against the pinyin library; if the match succeeds, the matching result is returned directly and the matching process ends.

3. The far-field voice instruction recognition method according to claim 2, characterized in that Step 3 also includes:

If the first-level matching fails, the pinyin of the decoding result is converted into fuzzy pinyin, the pinyin library is converted into a fuzzy pinyin library, and second-level matching is performed;

If the second-level matching fails, the decoding result is segmented with reference to the Chinese character counts of the instructions in the target instruction set: using the character count as the sliding window size, the decoding result is cut character by character, and third-level matching is then performed.

4. The far-field voice instruction recognition method according to claim 3, characterized in that the third-level matching includes:

The segmentation results are converted into fuzzy pinyin, and each fuzzy pinyin is similarity-matched against the fuzzy pinyin library. Each match yields a score C, and the target instruction corresponding to the highest-scoring match is the recognition result.

5. The far-field voice instruction recognition method according to claim 4, characterized in that the third-level matching also includes:

If the score of the recognition result is less than a threshold H, the recognition result is combined with the fuzzy speech of the previous item that underwent similarity matching to form a context, which is then similarity-matched against the fuzzy pinyin library; if the score of the resulting recognition result is greater than the threshold H, the result is returned directly and subsequent matching is terminated.

6. A far-field voice instruction recognition device, characterized by comprising:

A voice acquisition module, for obtaining a far-field voice instruction signal;

A speech recognition module, for decoding the voice instruction signal with a speech recognition engine to obtain a decoding result;

A matching module, for performing instruction matching on the decoding result based on Chinese pinyin and a context model to obtain a final recognition result.

7. The far-field voice instruction recognition device according to claim 6, characterized in that the matching module performs the following operations:

The decoding result is converted into Chinese pinyin;

8. The far-field voice instruction recognition device according to claim 7, characterized in that the matching module also performs the following operations:

9. The far-field voice instruction recognition device according to claim 8, characterized in that the third-level matching includes:

10. The far-field voice instruction recognition device according to claim 9, characterized in that the third-level matching also includes: