CN109903766B - Far-field voice instruction recognition method and device

Info

Publication number: CN109903766B
Application number: CN201910237263.7A
Authority: CN
Inventors: 邱建; 王兴; 佟彤
Original assignee: Beijing Aotewei Technology Co ltd
Current assignee: Beijing Aotewei Technology Co ltd
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2021-06-04
Anticipated expiration: 2039-03-27
Also published as: CN109903766A

Abstract

The invention relates to a far-field voice instruction recognition method comprising the following steps. Step one: acquiring a far-field voice instruction signal to be recognized. Step two: decoding the voice instruction signal with a voice recognition engine to obtain a decoding result. Step three: performing instruction matching on the decoding result of step two based on Chinese pinyin and a context model to obtain a final recognition result. Through this matching algorithm, the method converts instructions that the far-field voice control system failed to recognize correctly into control instructions that can be recognized accurately, which improves the instruction recognition rate and the user experience. The method can be applied to voice recognition interaction scenarios such as smart spaces.

Description

Far-field voice instruction recognition method and device

Technical Field

The invention belongs to the technical field of far-field voice recognition, and particularly relates to a far-field voice instruction recognition method and device.

Background

Voice technology, a currently popular mode of human-machine interaction, has been widely applied across intelligent systems in recent years, and voice control technology has advanced steadily with it. Compared with conventional control methods, voice control requires no manual operation and is therefore convenient to use, which has broadened its adoption. Voice control depends on voice recognition, so the development of voice recognition technology has drawn increasing attention from those skilled in the art. Depending on the distance between the position where the voice is emitted and the voice receiving device, voice recognition is generally divided into far-field voice recognition and near-field voice recognition. Because far-field voice recognition can recognize voice instructions issued from a distance, it has attracted particular attention.

At present, existing far-field voice recognition methods rely on relatively fixed wake-up words and control instructions that allow little variation. As a result, when a user performs far-field voice control the error rate is high, particularly in voice instruction control scenarios, and the user experience is poor. How to provide a voice recognition correction method and device capable of accurately correcting voice control instructions has therefore become a problem to be solved in the art.

Disclosure of Invention

The invention aims to provide a voice recognition correction method and device that solve the problem of low accuracy in far-field voice control instruction recognition.

The invention provides a far-field voice instruction recognition method, which comprises the following steps:

step one: acquiring a far-field voice instruction signal to be recognized;

step two: decoding the voice instruction signal based on a voice recognition engine to obtain a decoding result;

step three: performing instruction matching on the decoding result obtained in step two based on Chinese pinyin and a context model to obtain a final recognition result.

Further, the third step includes:

converting the decoding result of the second step into Chinese pinyin;

converting all target instructions in the target instruction set into Chinese pinyin to obtain a Chinese pinyin library;

and performing first-stage matching of the Chinese pinyin of the decoding result against the Chinese pinyin library, and if the matching is successful, directly returning the matching result and ending the matching process.
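By way of illustration only, the first-stage matching described above can be sketched as follows. This is a minimal sketch and not the implementation of the invention: the table `CHAR_TO_PINYIN`, the helpers `to_pinyin`, `build_pinyin_library` and `first_stage_match`, and the sample instructions are all hypothetical, and a real system would rely on a complete pinyin dictionary rather than a hand-written table.

```python
# Hypothetical, tiny character-to-pinyin table (toneless). A real system
# would use a complete pinyin dictionary instead.
CHAR_TO_PINYIN = {
    "打": "da", "开": "kai", "关": "guan", "闭": "bi",
    "灯": "deng", "登": "deng", "窗": "chuang", "帘": "lian",
}


def to_pinyin(text):
    """Convert a Chinese string to a tuple of pinyin syllables."""
    # Unknown characters are kept as-is so they never match by accident.
    return tuple(CHAR_TO_PINYIN.get(ch, ch) for ch in text)


def build_pinyin_library(target_instructions):
    """Map the pinyin of each target instruction to the instruction."""
    return {to_pinyin(cmd): cmd for cmd in target_instructions}


def first_stage_match(decoded_text, pinyin_library):
    """Return the target instruction whose pinyin equals the pinyin of
    the decoded text, or None when there is no exact pinyin match."""
    return pinyin_library.get(to_pinyin(decoded_text))


TARGETS = ["打开灯", "关闭灯", "打开窗帘"]  # hypothetical target instruction set
LIBRARY = build_pinyin_library(TARGETS)

# "打开登" is a homophone misrecognition of "打开灯".
result = first_stage_match("打开登", LIBRARY)
```

In this sketch a homophone error in the decoding result is corrected because both strings share the same pinyin sequence; a decoding result with no exact pinyin counterpart yields `None` and would proceed to the later stages.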

Further, the third step further includes:

if the first-stage matching is not successful, converting the Chinese pinyin of the decoding result into fuzzy pinyin, converting the Chinese pinyin library into a fuzzy pinyin library, and performing second-stage matching;

if the second-stage matching is successful, directly returning the matching result and ending the matching process;

and if the second-stage matching is not successful, segmenting the decoding result with reference to the character counts of the target instructions in the target instruction set, using the character count as a sliding block that advances one character at a time, and then performing third-stage matching.
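The second-stage matching can be illustrated with the following minimal sketch. The source does not list the fuzzy pinyin rules, so the rule tables below (retroflex initials merged with flat initials, l merged with n, back-nasal finals merged with front-nasal finals) are assumptions chosen from commonly confused pinyin pairs; the function names and the sample library are likewise hypothetical.

```python
# Assumed fuzzy-pinyin rules (not specified by the source).
FUZZY_INITIALS = (("zh", "z"), ("ch", "c"), ("sh", "s"), ("l", "n"))
FUZZY_FINALS = (("ang", "an"), ("eng", "en"), ("ing", "in"))


def to_fuzzy(syllable):
    """Normalize one toneless pinyin syllable to its fuzzy form."""
    for src, dst in FUZZY_INITIALS:
        if syllable.startswith(src):
            syllable = dst + syllable[len(src):]
            break
    for src, dst in FUZZY_FINALS:
        if syllable.endswith(src):
            syllable = syllable[: -len(src)] + dst
            break
    return syllable


def to_fuzzy_seq(syllables):
    """Normalize a whole sequence of pinyin syllables."""
    return tuple(to_fuzzy(s) for s in syllables)


def second_stage_match(decoded_pinyin, pinyin_library):
    """Exact match after both sides are converted to fuzzy pinyin."""
    fuzzy_library = {to_fuzzy_seq(k): v for k, v in pinyin_library.items()}
    return fuzzy_library.get(to_fuzzy_seq(decoded_pinyin))


# Hypothetical pinyin library built from a target instruction set.
PINYIN_LIBRARY = {
    ("da", "kai", "chuang", "lian"): "打开窗帘",
    ("guan", "bi", "deng"): "关闭灯",
}

# The decoded pinyin differs from the first entry only by fuzzy pairs.
result = second_stage_match(("da", "kai", "chuan", "nian"), PINYIN_LIBRARY)
```

One consequence of this design is that two target instructions may collapse to the same fuzzy key; in this sketch the later entry would silently win, so a practical instruction set would need to be checked for such collisions.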

Further, the third-stage matching includes:

converting the segmentation results into fuzzy pinyin, and matching each fuzzy pinyin against the fuzzy pinyin library for similarity, each match yielding a score C, wherein the target instruction corresponding to the match with the highest score is the recognition result.

Further, the third-stage matching further includes:

if the score of the recognition result is greater than a threshold H, directly returning the result and ending subsequent matching;

and if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy pinyin from the previous similarity match to form a context, matching the context against the fuzzy pinyin library for similarity, and if the resulting score is greater than the threshold H, directly returning the result and ending subsequent matching.
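The scoring part of the third-stage matching can be sketched as follows. The source names neither the similarity measure nor the value of the threshold H, so both are assumptions here: the ratio of Python's `difflib.SequenceMatcher` stands in for the score C, and `H = 0.9` is an arbitrary illustrative value. The function names and the sample data are hypothetical.

```python
from difflib import SequenceMatcher

H = 0.9  # hypothetical threshold; the source gives no value


def similarity(a, b):
    """Score C in [0, 1] between two fuzzy-pinyin strings.

    Assumption: the source does not name a similarity measure, so the
    difflib ratio is used as a stand-in.
    """
    return SequenceMatcher(None, a, b).ratio()


def best_match(segment_pinyins, fuzzy_library):
    """Return (score, instruction) for the highest-scoring pairing of a
    segment with an entry of the fuzzy pinyin library."""
    best = (0.0, None)
    for seg in segment_pinyins:
        for key, instruction in fuzzy_library.items():
            score = similarity(seg, key)
            if score > best[0]:
                best = (score, instruction)
    return best


# Hypothetical fuzzy pinyin library and fuzzy pinyin of three segments.
FUZZY_LIBRARY = {"da kai den": "打开灯", "guan bi den": "关闭灯"}
segments = ["qin da kai", "da kai den", "kai den ba"]

score, instruction = best_match(segments, FUZZY_LIBRARY)
accepted = score > H  # return directly only when the score exceeds H
```

Here the second segment reproduces a library entry exactly, so the highest score exceeds H and the corresponding target instruction would be returned without further matching.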

The invention also provides a far-field voice instruction recognition device, which comprises:

the voice acquisition module is used for acquiring a far-field voice instruction signal;

the voice recognition module is used for decoding the voice instruction signal based on a voice recognition engine to obtain a decoding result;

and the matching module is used for performing instruction matching on the decoding result based on the Chinese pinyin and the context model to obtain a final recognition result.

Further, the matching module performs the following operations:

converting the decoding result into Chinese pinyin;

converting all target instructions in the target instruction set into Chinese pinyin to obtain a Chinese pinyin library; and

Further, the matching module performs the following operations:

Further, the third stage matching includes:

Further, the third-level matching further comprises:

Compared with the prior art, the invention has the following beneficial effects: instructions that the far-field voice control system failed to recognize correctly can be converted by the corresponding algorithm into control instructions that can be recognized accurately, which improves the instruction recognition rate and the user experience, and the method can be applied to voice recognition interaction scenarios such as smart spaces.

Drawings

FIG. 1 is a flow chart of the speech control command recognition correction of the present invention;

FIG. 2 is a block diagram of the voice control command recognition correcting device of the present invention.

Detailed Description

The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments are not intended to limit the present invention, and those skilled in the art should understand that functional, methodological, or structural equivalents or substitutions made by these embodiments are within the scope of the present invention.

Referring to fig. 1, the present embodiment provides a far-field speech instruction recognition method, including:

Step S1: acquiring a far-field voice instruction signal to be recognized, for example a voice instruction signal output by a microphone in a voice instruction control scenario. In general, a sound source is in the far field when its distance from the central reference point of the microphone array is much larger than the signal wavelength; in the field of voice recognition this distance is usually taken to be more than 3 meters.

Step S2: decoding the voice instruction signal based on a voice recognition engine to obtain a decoding result;

Step S3: performing instruction matching on the decoding result obtained in step S2 based on Chinese pinyin and a context model to obtain a final recognition result.

With this far-field voice instruction recognition method, instructions that the far-field voice control system failed to recognize correctly can be converted by the corresponding algorithm into control instructions that can be recognized accurately. The instruction recognition rate is improved (the recognition rate of far-field voice control instructions can be improved by about 15% to 20%), the user experience is enhanced, and the method can be applied to voice recognition interaction scenarios such as smart spaces.

In the present embodiment, step S3 includes:

converting the decoding result of the step S2 into Chinese pinyin;

In this embodiment, step S3 further includes:

and if the second-stage matching is not successful, segmenting the decoding result with reference to the character counts of the target instructions in the target instruction set, using the character count as a sliding block that advances one character at a time, and then performing third-stage matching. Here, for a decoding result of several characters, two or more characters form the sliding block, and the block slides by only one character each time, so that the segmentation proceeds character by character.
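The sliding-block segmentation can be illustrated with the following minimal sketch. The function names and the sample strings are hypothetical; the sketch assumes that one pass is made for each distinct character count found in the target instruction set.

```python
def sliding_segments(text, window):
    """Split text into overlapping segments of `window` characters,
    advancing one character at a time."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(text) <= window:
        # Text no longer than the block yields a single segment.
        return [text]
    return [text[i:i + window] for i in range(len(text) - window + 1)]


def segment_for_targets(decoded_text, target_instructions):
    """Segment once per distinct character count in the target set."""
    lengths = sorted({len(cmd) for cmd in target_instructions})
    return {n: sliding_segments(decoded_text, n) for n in lengths}


# Hypothetical decoding result with extra words around the instruction.
segments = segment_for_targets("请打开灯吧", ["打开灯", "打开窗帘"])
# segments[3] holds the 3-character blocks, segments[4] the 4-character ones.
```

Segmenting in this way lets an instruction embedded in a longer utterance appear as one of the blocks, so that it can be compared with the target instructions of the same length in the third-stage matching.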

In this embodiment, the third-stage matching includes:

In this embodiment, the third-stage matching further includes:

and if the score of the recognition result is less than the threshold H, combining the recognition result with the fuzzy pinyin from the previous similarity match to form a context, matching the context against the fuzzy pinyin library for similarity, and if the resulting score is greater than the threshold H, directly returning the result and ending subsequent matching. If the score is still less than the threshold H, the recognition result is considered unreliable and is not taken as the final recognition result.
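The threshold decision with context fallback can be sketched as follows. The source does not state how the context is formed, which similarity measure is used, or what value H takes, so this sketch makes three assumptions: the previous fuzzy pinyin is simply prepended to the current one, the ratio of `difflib.SequenceMatcher` serves as the score, and `H = 0.9`. All names and sample data are hypothetical.

```python
from difflib import SequenceMatcher

H = 0.9  # hypothetical threshold; the source gives no value


def score_against_library(fuzzy_pinyin, fuzzy_library):
    """Return (score, instruction) of the closest library entry.

    Assumes a non-empty library; difflib's ratio is a stand-in for the
    unspecified similarity measure.
    """
    return max(
        (SequenceMatcher(None, fuzzy_pinyin, key).ratio(), instruction)
        for key, instruction in fuzzy_library.items()
    )


def match_with_context(current, previous, fuzzy_library, threshold=H):
    """Accept the direct match if it clears the threshold; otherwise
    retry with the previous fuzzy pinyin prepended as context; return
    None when the score is still too low."""
    score, instruction = score_against_library(current, fuzzy_library)
    if score > threshold:
        return instruction
    if previous:
        combined = previous + " " + current  # assumed way to form context
        score, instruction = score_against_library(combined, fuzzy_library)
        if score > threshold:
            return instruction
    return None  # unreliable result, not taken as the final result


FUZZY_LIBRARY = {"da kai den": "打开灯", "guan bi den": "关闭灯"}

with_context = match_with_context("kai den", "da", FUZZY_LIBRARY)
without_context = match_with_context("kai den", "", FUZZY_LIBRARY)
```

In this example the fragment "kai den" alone falls below the threshold, but joined with the preceding "da" it reproduces a library entry and is accepted; without any context it is rejected as unreliable.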

Referring to fig. 2, the present embodiment further provides a far-field speech instruction recognition apparatus, including:

the voice acquisition module 10 is used for acquiring a far-field voice instruction signal;

the voice recognition module 20 is configured to decode the voice instruction signal based on a voice recognition engine to obtain a decoding result;

and the matching module 30 is used for performing instruction matching on the decoding result based on the pinyin and the context model to obtain a final recognition result.
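The cooperation of the three modules can be illustrated with the following minimal sketch. The class `FarFieldCommandDevice`, its field names, and the stub functions are hypothetical; the stubs merely stand in for a real microphone interface, a real voice recognition engine, and the multi-stage matching described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FarFieldCommandDevice:
    """Hypothetical wiring of the three modules of the device."""
    acquire: Callable[[], bytes]           # voice acquisition module 10
    recognize: Callable[[bytes], str]      # voice recognition module 20
    match: Callable[[str], Optional[str]]  # matching module 30

    def run_once(self) -> Optional[str]:
        """Acquire one signal, decode it, and match the decoding result."""
        signal = self.acquire()
        decoded = self.recognize(signal)
        return self.match(decoded)


# Stubs for illustration only: fixed audio bytes, a fixed (erroneous)
# decoding result, and a lookup table in place of the matching stages.
device = FarFieldCommandDevice(
    acquire=lambda: b"\x00\x01",
    recognize=lambda signal: "打开登",
    match=lambda text: {"打开登": "打开灯"}.get(text),
)

result = device.run_once()
```

Keeping the modules as interchangeable callables reflects the separation in the description: the recognition engine can be replaced without changing the matching stage, and vice versa.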

With this far-field voice instruction recognition device, instructions that the far-field voice control system failed to recognize correctly can be converted by the corresponding algorithm into control instructions that can be recognized accurately. The instruction recognition rate is improved (the recognition rate of far-field voice control instructions can be improved by about 15% to 20%), the user experience is enhanced, and the device can be applied to voice recognition interaction scenarios such as smart spaces.

In this embodiment, the matching module performs the following operations:

converting the decoding result into Chinese pinyin;

In this embodiment, the matching module further performs the following operations:

In this embodiment, the third-stage matching includes:

In this embodiment, the third-stage matching further includes:

The above detailed description is only a specific description of possible embodiments of the present invention and is not intended to limit its scope; equivalent embodiments or modifications made without departing from the technical spirit of the present invention shall fall within its scope.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

1. A far-field speech command recognition method, comprising:

step three: performing instruction matching on the decoding result of the step two based on the Chinese pinyin, the fuzzy pinyin and the context model to obtain a final recognition result;

the third step comprises:

converting the decoding result of the second step into Chinese pinyin;

performing first-stage matching on the Chinese pinyin of the decoding result in the Chinese pinyin library, and if the matching is successful, directly returning a matching result and finishing the matching process;

if the first-stage matching is not successful, converting the Chinese pinyin of the decoding result into fuzzy pinyin, converting the Chinese pinyin library into a fuzzy pinyin library, and performing second-stage matching; if the second-stage matching is successful, directly returning a matching result and finishing the matching process;

if the second-stage matching is not successful, segmenting the decoding result with reference to the character counts of the target instructions in the target instruction set, using the character count as a sliding block that advances one character at a time, and then performing third-stage matching;

wherein the third stage matching comprises:

converting the segmentation results into fuzzy pinyin, and matching each fuzzy pinyin against the fuzzy pinyin library for similarity, each match yielding a score C, wherein the target instruction corresponding to the match with the highest score is the recognition result;

2. A far-field speech command recognition apparatus, comprising:

the matching module is used for performing instruction matching on the decoding result based on the Chinese pinyin, the fuzzy pinyin and the context model to obtain a final recognition result;

the matching module performs the following operations:

converting the decoding result into Chinese pinyin;

the matching module further performs the following operations:

if the first-stage matching is not successful, converting the Chinese pinyin of the decoding result into fuzzy pinyin, converting the Chinese pinyin library into a fuzzy pinyin library, and performing second-stage matching;

wherein the third stage matching comprises: