CN116825108A

CN116825108A - Voice command word recognition method, device, equipment and medium

Info

Publication number: CN116825108A
Application number: CN202311077037.XA
Authority: CN
Inventors: 李�杰
Original assignee: Shenzhen Youjie Zhixin Technology Co ltd
Current assignee: Shenzhen Youjie Zhixin Technology Co ltd
Priority date: 2023-08-25
Filing date: 2023-08-25
Publication date: 2023-09-29
Anticipated expiration: 2043-08-25
Also published as: CN116825108B

Abstract

The application belongs to the technical field of voice recognition and discloses a voice command word recognition method, device, equipment and medium. The set command words are divided in advance into first-level command words and second-level command words, and the relation between the first-level and second-level command words is constructed. During recognition, the first-level command word with the highest score is first determined from the first-level command word list; the scores of all second-level command words corresponding to that first-level command word are then calculated, and the second-level command word that has the highest score and meets a set threshold is taken as the voice recognition result.

Description

Voice command word recognition method, device, equipment and medium

Technical Field

The present application relates to the field of speech recognition technologies, and in particular, to a method, an apparatus, a device, and a medium for recognizing a speech command word.

Background

Command word recognition is a branch of voice recognition and is widely applied in the smart home field, for example in smart voice speakers, smart voice headphones, smart voice lamps and smart voice fans. Because of cost considerations, such embedded devices have lower computing power and smaller memory and flash than smart devices such as mobile phones. The general command word recognition method scores all command words and then selects, from the command word list, the command word that has the highest score and meets a set threshold as the recognition result. For example, assume that the designed voice product controls an electric fan and that the following command words are designed: turn on the fan, turn off the fan, first wind, second wind and third wind. Scoring all command words here means that the scores of turn on the fan, turn off the fan, first wind, second wind and third wind must each be calculated. The result that has the highest score and exceeds the set threshold is then selected as the command word recognition result. However, the decoding time increases linearly with the number of command words, which lengthens the speech recognition processing time on low-resource devices, so that recognition may not be possible in real time. That is, the existing command word recognition method limits the number of voice command words that a low-resource device can support.

Disclosure of Invention

The main purpose of the application is to provide a voice command word recognition method, device, equipment and medium, aiming to solve the technical problem that the existing command word recognition method limits the number of voice command words that low-resource equipment can support.

In order to achieve the above object, a first aspect of the present application provides a method for recognizing a voice command word, the method comprising:

acquiring a voice signal to be recognized;

extracting features from the voice signal to be recognized to generate a feature vector;

inputting the feature vector into an acoustic model to obtain a phoneme probability matrix;

acquiring a preset first-level command word list; wherein the first-level command word list comprises a plurality of first-level command words;

calculating the score of each first-level command word according to the phoneme probability matrix;

determining the first-level command word with the highest score according to the scores of the first-level command words;

judging whether the score of the first-level command word with the highest score is greater than a preset first threshold;

if not, re-acquiring the voice signal for recognition;

if yes, judging whether the first-level command word with the highest score has a corresponding second-level command word; wherein the second level command word is a specific command word of the first level command word;

If the first-level command word with the highest score does not have the corresponding second-level command word, taking the first-level command word with the highest score as a voice command word recognition result;

if the first-level command word with the highest score has a corresponding second-level command word, acquiring each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated;

calculating the score of each second-level command word to be calculated according to the phoneme probability matrix;

determining the second-level command word with the highest score according to the scores of the second-level command words;

judging whether the score of the second-level command word with the highest score is greater than a set second threshold;

if yes, taking the second level command word with the highest score as a voice command word recognition result;

if not, the voice signal is acquired again for recognition.

Further, the score of the first-level command word is calculated according to the following steps:

inserting blank symbols in the middle of each word of the first-level command word and in front of and behind the first-level command word to obtain a changed first-level command word;

Converting the changed first-level command word into a phoneme sequence according to the relation between the words and the phonemes in the dictionary;

according to the phoneme probability matrix, calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm to obtain a total probability value;

and mapping the total probability value into a score according to a preset rule, and taking the score as the score of the first-level command word.

Further, the step of calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm according to the phoneme probability matrix to obtain a total probability value includes:

and decoding by adopting a forward algorithm according to the phoneme probability matrix, setting the probability of the wildcard symbol to be 1 when the decoding walks to the wildcard symbol, and continuing the forward algorithm until the sum of the probabilities of all paths corresponding to the phoneme sequence is obtained, so as to obtain a total probability value.

Further, before the step of mapping the total probability value into a score according to a preset rule and taking the score as the score of the first-level command word, the method further includes:

normalizing the total probability value to obtain a normalized total probability value.

Further, the step of normalizing the total probability value to obtain a normalized total probability value includes:

calculating a normalized length according to the formula $\mathrm{norm\_len} = a - b - c$, where $\mathrm{norm\_len}$ represents the normalized length, $a$ represents the length of the decoded time interval, $b$ represents the accumulated value of the blank probabilities within the length of the time interval, and $c$ represents the length of the wildcard;

calculating the normalized total probability value according to the formula $p_{\mathrm{Total}} = \exp\left(\log p(y \mid x) / \mathrm{norm\_len}\right)$, where $x$ represents the speech signal to be recognized, $y$ represents the phoneme sequence, and $p(y \mid x)$ represents the total probability value.

Further, the second level voice command word score is calculated according to the following steps:

inserting blank symbols in the middle of each word of the second-level command word and in front of and behind the second-level command word to obtain a changed second-level command word;

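converting the changed second-level command word into a phoneme sequence according to the relation between the words and the phonemes in the dictionary;

according to the phoneme probability matrix, calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm to obtain a total probability value;

and mapping the total probability value into a score according to a preset rule, and taking the score as the score of the second-level command word.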

In one embodiment, the preset first-level command words and/or second-level command words include a command word with a discontinuous form, that is, one whose characters are not contiguous.

In a second aspect, an embodiment of the present application provides a voice command word recognition apparatus, including:

the first acquisition module is used for acquiring a voice signal to be recognized;

the feature extraction module is used for extracting features of the voice signal to be identified and generating feature vectors;

the input module is used for inputting the feature vector into the acoustic model to obtain a phoneme probability matrix;

the second acquisition module is used for acquiring a preset first-level command word list; wherein the first-level command word list comprises a plurality of first-level command words;

the first calculation module is used for calculating the score of each first-level command word according to the phoneme probability matrix;

the first determining module is used for determining the first-level command words with highest scores according to the scores of the first-level command words;

the first judging module is used for judging whether the score of the first-level command word with the highest score is greater than a preset first threshold;

the first acquisition module is further used for acquiring the voice signal again for recognition if not;

The second judging module is used for judging whether the first-level command word with the highest score has a corresponding second-level command word or not if yes; wherein the second level command word is a specific command word of the first level command word;

the voice command word recognition result determining module is used for taking the first-level command word with the highest score as a voice command word recognition result if the first-level command word with the highest score does not have the corresponding second-level command word;

the third acquisition module is used for acquiring each second level command word corresponding to the first level command word with the highest score as a second level command word to be calculated if the first level command word with the highest score has the corresponding second level command word;

the second calculation module is used for calculating the score of each second-level command word to be calculated according to the phoneme probability matrix;

the second determining module is used for determining the second-level command words with highest scores according to the scores of the second-level command words;

the third judging module is used for judging whether the score of the second-level command word with the highest score is greater than a set second threshold;

the voice command word recognition result determining module is further used for taking the second-level command word with the highest score as the voice command word recognition result if yes;

The first acquisition module is further configured to reacquire the voice signal for recognition if not.

A third aspect of the application proposes a computer device comprising a memory and a processor, said memory having stored therein a computer program, said processor implementing the steps of the speech command word recognition method as defined in any one of the preceding claims when said computer program is executed.

A fourth aspect of the application proposes a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the speech command word recognition method as described in any of the preceding claims.

The beneficial effects are that:

According to the embodiment of the application, the set command words are divided in advance into different levels, namely first-level command words and second-level command words. The first-level command words are calculated first, the second-level command words are calculated afterwards, and the second-level command words are the specific command words of the first-level command words. In determining the command word recognition result, the first-level command word with the highest score is therefore first determined from the first-level command word list; the scores of all second-level command words corresponding to that first-level command word are then calculated, and the second-level command word that has the highest score and meets the set threshold is taken as the voice recognition result.

Drawings

FIG. 1 is a flowchart of a method for recognizing a voice command word according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a voice command word recognition device according to an embodiment of the present application;

FIG. 3 is a schematic block diagram of a computer device according to an embodiment of the present application.

The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, modules, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, modules, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Referring to fig. 1, an embodiment of the present application provides a voice command word recognition method, which includes:

S1, acquiring a voice signal to be recognized;

S2, extracting features from the voice signal to be recognized to generate a feature vector;

S3, inputting the feature vector into an acoustic model to obtain a phoneme probability matrix;

S4, acquiring a preset first-level command word list, wherein the first-level command word list comprises a plurality of first-level command words;

S5, calculating the score of each first-level command word according to the phoneme probability matrix;

S6, determining the first-level command word with the highest score according to the scores of the first-level command words;

S7, judging whether the score of the first-level command word with the highest score is greater than a preset first threshold;

S8, if not, acquiring the voice signal again for recognition;

S9, if yes, judging whether the first-level command word with the highest score has a corresponding second-level command word, wherein the second-level command word is a specific command word of the first-level command word;

S10, if the first-level command word with the highest score does not have a corresponding second-level command word, taking the first-level command word with the highest score as the voice command word recognition result;

S11, if the first-level command word with the highest score has a corresponding second-level command word, acquiring each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated;

S12, calculating the score of each second-level command word to be calculated according to the phoneme probability matrix;

S13, determining the second-level command word with the highest score according to the scores of the second-level command words;

S14, judging whether the score of the second-level command word with the highest score is greater than a set second threshold;

S15, if yes, taking the second-level command word with the highest score as the voice command word recognition result;

S16, if not, acquiring the voice signal again for recognition.

In the embodiment of the application, the voice command word recognition method is executed by an intelligent voice electronic device, which can recognize a voice command word and execute the corresponding action according to the recognized voice command word. The intelligent voice electronic device may be an intelligent voice speaker, an intelligent voice earphone, an intelligent voice lamp, an intelligent voice fan, and the like. Before the intelligent voice electronic device executes steps S1 to S16, the preset command words must be divided into first-level command words and second-level command words, and the correspondence between the first-level and second-level command words must be constructed and stored in memory, so that the intelligent voice electronic device can recognize voice command words by executing steps S1 to S16.

For ease of understanding, an example is given below. Assume an electric fan with a voice recognition function that also has a music playing function, a head shaking function and an air volume control function. The fan can control volume, air volume and head shaking according to the voice command words spoken by a user. With the existing voice command word recognition method, the command words would be set as: increase volume, decrease volume, maximum volume, minimum volume, open head shaking, close head shaking, stop head shaking, open left-right head shaking, close left-right head shaking, stop left-right head shaking, open up-down head shaking, close up-down head shaking, stop up-down head shaking, increasing air volume, reducing air volume, increasing air volume, open first gear, open second gear, open third gear, open fourth gear, open fifth gear, open sixth gear, open seventh gear, open eighth gear, open ninth gear, open tenth gear, and "little wind classmate" (a wake word, which is one kind of command word). The scores of all command words are then calculated, and the command word that has the highest score and meets the set threshold is selected. With the method of the present application, first-level command words must additionally be designed from these command words. For example, "volume" is extracted from the command words increase volume, decrease volume, maximum volume and minimum volume as their first-level command word.
"Head shaking" is extracted from the command words open head shaking, close head shaking, stop head shaking, open left-right head shaking, close left-right head shaking, stop left-right head shaking, open up-down head shaking, close up-down head shaking and stop up-down head shaking as their first-level command word. "Air volume" is extracted from the command words increasing air volume, reducing air volume and increasing air volume as their first-level command word. Volume, air volume and head shaking are regarded as actions of different categories; that is, the command words are classified by action category. Unlike volume, air volume and head shaking above, the part shared by open first gear, open second gear, open third gear, open fourth gear, open fifth gear, open sixth gear, open seventh gear, open eighth gear, open ninth gear and open tenth gear, namely "open" and "gear", is not a continuous string of characters, so a wildcard must be inserted between "open" and "gear", giving "open * gear" as the first-level command word of these command words. Some command words have no strong clustering property. A wake word such as "little wind classmate", for example, may be the only one of its kind or one of very few, with no similar command words; such a command word is classified on its own as a first-level command word.
That is, in this example, the first-level command words are: volume, head shaking, air volume, open * gear and little wind classmate. The second-level command words are: increase volume, decrease volume, maximum volume, minimum volume, open head shaking, close head shaking, stop head shaking, open left-right head shaking, close left-right head shaking, stop left-right head shaking, open up-down head shaking, close up-down head shaking, stop up-down head shaking, increasing air volume, reducing air volume, increasing air volume, open first gear, open second gear, open third gear, open fourth gear, open fifth gear, open sixth gear, open seventh gear, open eighth gear, open ninth gear, open tenth gear, and little wind classmate. The relationship between the first-level command words and the second-level command words is as follows:

TABLE 1

Table 1 (continued):

This table needs to be stored in memory. When recognizing command words, the scores of the first-level command words are calculated and the first-level command word A1 with the highest score is selected. If the score of A1 is not greater than the set first threshold, the voice signal is acquired again for recognition, that is, the method returns to step S1. If it is greater, it is judged whether A1 has corresponding second-level command words; if not, A1 is taken as the voice command word recognition result. If so, the scores of all second-level command words corresponding to A1 are calculated and the second-level command word A2 with the highest score is selected. If the score of A2 is greater than the set second threshold, A2 is taken as the recognition result of the voice command word; if not, the voice signal is acquired again for recognition, that is, the method returns to step S1. For example, if "volume" has the highest score among the first-level command words and that score is greater than the set first threshold, the scores of increase volume, decrease volume, maximum volume and minimum volume are calculated respectively; if, say, "increase volume" has the highest score and that score is greater than the set second threshold, "increase volume" is taken as the recognition result of the voice command word.
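The decision flow just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `HIERARCHY` dictionary is a hypothetical, shortened stand-in for the stored table, the wake word is modeled here as having no second-level command words, and `score` is an assumed callable that returns the score of a command word computed from the phoneme probability matrix.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical, shortened stand-in for the stored first-level/second-level table.
# An empty list means the first-level word has no second-level words (assumption).
HIERARCHY: Dict[str, List[str]] = {
    "volume": ["increase volume", "decrease volume", "maximum volume", "minimum volume"],
    "open * gear": ["open first gear", "open second gear"],
    "little wind classmate": [],
}


def recognize(score: Callable[[str], float],
              hierarchy: Dict[str, List[str]],
              first_threshold: float,
              second_threshold: float) -> Optional[str]:
    """Return the recognized command word, or None to signal 'acquire audio again'."""
    # S5-S6: score every first-level word once and keep the best one.
    first_scores = {word: score(word) for word in hierarchy}
    best_first = max(first_scores, key=first_scores.get)
    # S7-S8: reject if the best first-level score does not exceed the first threshold.
    if first_scores[best_first] <= first_threshold:
        return None
    # S9-S10: a first-level word without second-level words is itself the result.
    children = hierarchy[best_first]
    if not children:
        return best_first
    # S11-S13: score only the second-level words of the winning first-level word.
    second_scores = {word: score(word) for word in children}
    best_second = max(second_scores, key=second_scores.get)
    # S14-S16: accept only if the best second-level score exceeds the second threshold.
    if second_scores[best_second] > second_threshold:
        return best_second
    return None
```

Returning `None` stands for "return to step S1"; a caller would typically loop, acquiring a new voice signal each time.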
The benefit of the present application over existing voice command word recognition methods is clear. For example, assume there are 100 command words, that is, the product supports 100 voice control commands. The general method calculates the scores of command words 1 to 100, finds the maximum, and then checks whether the maximum meets the set threshold to decide whether a command word is recognized; this requires 100 score calculations. The hierarchical recognition of the present application first calculates the scores of the 10 first-level command words such as volume, head shaking and air volume; suppose "volume" has the largest score. Then the 10 scores of the 10 command words under the volume category are calculated, and the maximum is found to decide whether a command word is recognized. In total, only 20 scores are calculated.
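The count in this example can be checked with a few lines of Python. The function names are hypothetical and the sketch only counts score calculations; it says nothing about the cost of a single score.

```python
def flat_score_count(total_words: int) -> int:
    """Existing method: every command word is scored once."""
    return total_words


def two_level_score_count(num_first_level: int, num_children_of_winner: int) -> int:
    """Hierarchical method: all first-level words, then only the winner's second-level words."""
    return num_first_level + num_children_of_winner


# 100 command words split into 10 categories of 10 words each (the example above).
print(flat_score_count(100))          # 100
print(two_level_score_count(10, 10))  # 20
```

With an even split of N command words into k categories, the hierarchical count is k + N/k, which is smallest when k is close to the square root of N.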

Therefore, in the embodiment of the application, the set command words are divided in advance into first-level command words and second-level command words. The first-level command words are calculated first, the second-level command words are calculated afterwards, and the second-level command words are the specific command words of the first-level command words. In determining the command word recognition result, the first-level command word with the highest score is first determined from the first-level command word list; the scores of all second-level command words corresponding to that first-level command word are then calculated, and the second-level command word that has the highest score and meets the set threshold is taken as the voice recognition result. The voice recognition result is thus obtained without calculating the scores of all command words.

In step S1, the voice signal to be recognized is acquired by a microphone. In step S2, the extracted feature vector may include Fbank (FilterBank) features. Fbank is a front-end processing algorithm that processes audio in a manner similar to the human ear, which can improve voice recognition performance. The general steps for obtaining the Fbank features of a voice signal are: pre-emphasis, framing, windowing, short-time Fourier transform (STFT), Mel filtering, and so on. In step S3, the acoustic model is a model that outputs a phoneme probability matrix from the feature sequence. The phoneme probability matrix can be regarded as a sequence of phoneme probability distributions, one output for each short period of time; a segment of audio therefore corresponds to a sequence of such distributions, which together form a decoding matrix, and this decoding matrix is the phoneme probability matrix. Steps S4 to S16 can be understood from the above example. Whether for first-level or second-level command words, the scores are calculated from the phoneme probability matrix, and the specific calculation may use a CTC decoding method. The application does not change the network architecture and only needs adaptation at the decoding stage, which improves the portability of the algorithm.
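As a rough illustration of the first three Fbank steps named above (pre-emphasis, framing and windowing), the following standard-library sketch may help. It stops before the STFT and Mel filtering. The pre-emphasis coefficient 0.97, the Hamming window, and the frame length and hop of 400 and 160 samples (25 ms and 10 ms at an assumed 16 kHz sampling rate) are common choices assumed here; the application does not specify them.

```python
import math
from typing import List


def pre_emphasis(signal: List[float], coeff: float = 0.97) -> List[float]:
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through unchanged."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame_len: int) -> List[float]:
    """Hamming window coefficients (frame_len must be at least 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]


def windowed_frames(signal: List[float], frame_len: int = 400, hop: int = 160) -> List[List[float]]:
    """Pre-emphasize, frame and window a signal; each output frame is ready for an STFT."""
    window = hamming(frame_len)
    frames = frame_signal(pre_emphasis(signal), frame_len, hop)
    return [[sample * weight for sample, weight in zip(frame, window)] for frame in frames]
```

Each windowed frame would then be transformed by an FFT and passed through a Mel filter bank to give one feature vector per frame.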

As can be seen from the above, if a first-level command word has a discontinuous form such as "open * gear", a method capable of calculating the score of such a command word must be designed. The method is applicable not only to command words with a discontinuous form such as "open * gear" but also to command words such as "volume" and "little wind classmate"; that is, the scoring method of steps S51 to S54 generalizes well. Specifically, in some embodiments, the score of the first-level command word is preferably calculated according to the following steps S51 to S54:

S51, inserting blank symbols in the middle of each word of the first-level command word and in front of and behind the first-level command word to obtain a changed first-level command word;

S52, converting the changed first-level command word into a phoneme sequence according to the relation between the words and the phonemes in the dictionary;

S53, calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm according to the phoneme probability matrix to obtain a total probability value;

S54, mapping the total probability value into a score according to a preset rule, and taking the score as the score of the first-level command word.

In step S51, assuming that the command word is "turn on lamp", after the blank symbols are inserted it becomes: b turn on b lamp b, where b denotes the blank symbol.

In step S53, since multiple paths may map to the same phoneme sequence, the sum of the probabilities of all paths corresponding to the phoneme sequence must be calculated to obtain the total probability value, which is the probability value of the phoneme sequence. Specifically, the probability value of the phoneme sequence is calculated by the forward algorithm, a dynamic programming method, which is equivalent to calculating the sum of the probabilities of all paths of the phoneme sequence. In step S54, the total probability value is a number between 0 and 1, and the range 0 to 1 may be mapped to a score of 0 to 100 points according to a set rule.
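Steps S51, S53 and S54 can be sketched with the standard CTC forward recursion. This is an illustrative sketch under assumptions, not the code of the application: phonemes are represented as integer column indices of the phoneme probability matrix, index 0 is assumed to be the blank symbol, blanks are interleaved at the phoneme level, and the linear mapping to 0 to 100 points in `to_score` is only one possible "set rule". Probabilities are multiplied directly for readability; practical decoders usually work in the log domain to avoid underflow.

```python
from typing import List

BLANK = 0  # assumed index of the blank symbol in the acoustic model output


def insert_blanks(labels: List[int]) -> List[int]:
    """[a, b] -> [BLANK, a, BLANK, b, BLANK]."""
    extended = [BLANK]
    for label in labels:
        extended += [label, BLANK]
    return extended


def ctc_forward(probs: List[List[float]], labels: List[int]) -> float:
    """Sum of the probabilities of all frame-level paths that collapse to `labels`.

    probs[t][k] is the probability of symbol k at frame t.
    """
    ext = insert_blanks(labels)
    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        prev = alpha
        alpha = [0.0] * size
        for s in range(size):
            total = prev[s]                      # stay on the same symbol
            if s >= 1:
                total += prev[s - 1]             # advance by one symbol
            # Skipping the blank between two labels is allowed only if they differ.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * probs[t][ext[s]]
    # A valid path ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def to_score(total_probability: float) -> float:
    """Assumed linear rule mapping a probability in [0, 1] to 0-100 points."""
    return 100.0 * total_probability
```

For a two-frame matrix `[[0.4, 0.6], [0.3, 0.7]]` and the single label `1`, the three paths (1, 1), (blank, 1) and (1, blank) sum to 0.42 + 0.28 + 0.18 = 0.88, which is what `ctc_forward` returns.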

In some embodiments, the step of calculating the sum of probabilities of all paths corresponding to the phoneme sequence by using a forward algorithm according to the phoneme probability matrix to obtain a total probability value includes:

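decoding by adopting a forward algorithm according to the phoneme probability matrix, setting the probability of the wildcard symbol to 1 when the decoding reaches the wildcard symbol, and continuing the forward algorithm until the sum of the probabilities of all paths corresponding to the phoneme sequence is obtained, so as to obtain the total probability value.

In the embodiment of the application, the forward algorithm is used to calculate the sum of the probabilities of all paths corresponding to the phoneme sequence during the actual decoding process; the forward algorithm may be the forward algorithm of the CTC decoding method. When the decoding reaches the wildcard symbol, its probability is set to 1 and the forward algorithm continues, which ensures that only "open" and "gear" in "open * gear" are attended to, thereby solving the problem of discontinuous characters.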
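One way to read this wildcard rule is sketched below: the wildcard is treated as an extra label in the phoneme sequence whose emission probability is 1 at every frame, so it can absorb whatever is spoken between the fixed parts of the command word. The `WILDCARD` marker, the blank index 0 and the transition rules (those of the standard CTC forward recursion) are assumptions made for illustration; the application does not spell them out. With a wildcard present, the returned value is a path score rather than a strict probability.

```python
from typing import List

BLANK = 0      # assumed index of the blank symbol
WILDCARD = -1  # hypothetical marker for the wildcard; not a column of the matrix


def forward_with_wildcard(probs: List[List[float]], labels: List[int]) -> float:
    """CTC-style forward pass in which WILDCARD emits with probability 1 at any frame."""
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]

    def emit(t: int, symbol: int) -> float:
        # The wildcard matches any frame; every other symbol uses the model output.
        return 1.0 if symbol == WILDCARD else probs[t][symbol]

    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = emit(0, ext[0])
    if size > 1:
        alpha[1] = emit(0, ext[1])
    for t in range(1, len(probs)):
        prev = alpha
        alpha = [0.0] * size
        for s in range(size):
            total = prev[s]
            if s >= 1:
                total += prev[s - 1]
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * emit(t, ext[s])
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


# Three frames that clearly contain symbol 1, then symbol 3, then symbol 2.
probs = [[0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0, 0.0]]
print(forward_with_wildcard(probs, [1, 2]))            # 0.0: the middle frame cannot be explained
print(forward_with_wildcard(probs, [1, WILDCARD, 2]))  # 1.0: the wildcard absorbs the middle frame
```

In the toy matrix, symbols 1 and 2 play the roles of "open" and "gear", and symbol 3 stands for whatever is said between them.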

In some embodiments, before the step of mapping the total probability value into a score according to a preset rule and taking the score as the score of the first-level command word, the method further includes:

normalizing the total probability value to obtain a normalized total probability value.

In the embodiment of the application, the total probability value refers to the probability that a given command word is recognized for the input voice, and normalizing the total probability value corrects it. The normalization must take into account the length of the individual command word and the length of the decoding matrix, so that a more accurate total probability value, and hence a more accurate score, can be obtained.

In some embodiments, the step of normalizing the total probability value to obtain a normalized total probability value includes:

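calculating a normalized length according to the formula $\mathrm{norm\_len} = a - b - c$, where $\mathrm{norm\_len}$ represents the normalized length, $a$ represents the length of the decoded time interval, $b$ represents the accumulated value of the blank probabilities within the length of the time interval, and $c$ represents the length of the wildcard;

calculating the normalized total probability value according to the formula $p_{\mathrm{Total}} = \exp\left(\log p(y \mid x) / \mathrm{norm\_len}\right)$, where $x$ represents the speech signal to be recognized, $y$ represents the phoneme sequence, and $p(y \mid x)$ represents the total probability value.

In the embodiment of the application, the total probability value is normalized according to the above formulas. Because the lengths of different command words (through the division by $\mathrm{norm\_len}$) and the length of the decoding matrix are taken into account, a more accurate total probability value, and hence a more accurate score, can be obtained. Command words of different lengths occupy different numbers of frames, and even for command words of the same length, or for the same command word, the result depends on the length of the decoding window and on the accumulated sum of the blank probabilities.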
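The normalization can be sketched as follows, using the symbols a, b and c defined for the normalized length. Some details are assumptions made for illustration: a is taken as the number of frames in the decoding window, b as the sum of the blank column over those frames, c is passed in by the caller as `wildcard_len` because the text does not say how the wildcard length is measured, and degenerate inputs return 0.0.

```python
import math
from typing import List


def normalized_total_probability(total_probability: float,
                                 probs: List[List[float]],
                                 blank_index: int = 0,
                                 wildcard_len: float = 0.0) -> float:
    """Compute exp(log p(y|x) / norm_len) with norm_len = a - b - c."""
    a = len(probs)                                  # length of the decoded time interval
    b = sum(frame[blank_index] for frame in probs)  # accumulated blank probability
    c = wildcard_len                                # length attributed to the wildcard
    norm_len = a - b - c
    # Assumed guard: a zero probability or a non-positive length yields 0.0.
    if total_probability <= 0.0 or norm_len <= 0.0:
        return 0.0
    return math.exp(math.log(total_probability) / norm_len)


# Four frames, each with blank probability 0.5: a = 4, b = 2, c = 0, norm_len = 2.
frames = [[0.5, 0.5]] * 4
print(normalized_total_probability(0.25, frames))  # 0.25 ** (1 / 2), approximately 0.5
```

Because the exponent is 1 / norm_len, the result is a geometric mean over the effective non-blank length, which makes scores of short and long command words more comparable.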

In some embodiments, the second level voice command word score is calculated according to the following steps:

S121, inserting blank symbols in the middle of each word of the second-level command word and in front of and behind the second-level command word to obtain a changed second-level command word;

S122, converting the changed second-level command word into a phoneme sequence according to the relation between the words and the phonemes in the dictionary;

S123, calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm according to the phoneme probability matrix to obtain a total probability value;

S124, mapping the total probability value into a score according to a preset rule, and taking the score as the score of the second-level command word.
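Steps S121 and S122 can be sketched as follows. This is only an illustration under stated assumptions: the blank symbol name `BLANK`, the tiny English `LEXICON`, and the helper names `insert_blanks` and `to_phoneme_sequence` are hypothetical, and blank insertion is interpreted here as placing one blank before, between, and after the words of the command word. Steps S123 and S124 (the forward algorithm and the preset score mapping) are not shown.

```python
BLANK = "<blank>"  # assumed name for the blank symbol

# Hypothetical pronunciation dictionary; a real one would come from the product's lexicon.
LEXICON = {
    "first": ["f", "er", "s", "t"],
    "gear": ["g", "ih", "r"],
}


def insert_blanks(words, blank=BLANK):
    """S121: put a blank before, between, and after the words of a command word."""
    changed = [blank]
    for word in words:
        changed += [word, blank]
    return changed


def to_phoneme_sequence(changed_words, lexicon=LEXICON, blank=BLANK):
    """S122: replace each word by its phonemes; blanks pass through unchanged."""
    phonemes = []
    for item in changed_words:
        if item == blank:
            phonemes.append(blank)
        else:
            phonemes.extend(lexicon[item])
    return phonemes
```

Calling `to_phoneme_sequence(insert_blanks(["first", "gear"]))` yields a sequence that starts and ends with the blank symbol and has one blank between the phonemes of the two words; this sequence would then be passed to the forward algorithm of step S123.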

In the embodiment of the present application, since a second level command word may also be of a discontinuous font, its score is calculated by the same method as the score of the first level command word, and the details are not repeated here. It should be noted, however, that the phoneme sequence in step S122 is not the phoneme sequence corresponding to the first level command word. For ease of distinction, the phoneme sequence in step S122, that is, the phoneme sequence corresponding to the second level command word, is referred to as the second phoneme sequence, and the phoneme sequence corresponding to the first level command word is referred to as the first phoneme sequence; likewise, the total probability value corresponding to the first level command word is referred to as the first total probability value, and the total probability value corresponding to the second level command word is referred to as the second total probability value.

In some embodiments, the step of calculating, according to the phoneme probability matrix, a sum of probabilities of all paths corresponding to the phoneme sequence by using a forward algorithm to obtain a total probability value includes:

In some embodiments, before the step of mapping the total probability value to a score according to a preset rule and taking the score as the score of the second level command word, the method further includes:

Note that the total probability value in this embodiment refers to the total probability value of the second level command word.

In the embodiment of the application, a command word of a discontinuous font refers to a command word such as "first gear", "second gear" and the like.

Referring to fig. 2, the embodiment of the application further provides a voice command word recognition device, which includes:

a first obtaining module 1, configured to obtain a speech signal to be identified;

the feature extraction module 2 is used for extracting features of the voice signal to be identified and generating a feature vector;

the input module 3 is used for inputting the feature vector into an acoustic model to obtain a phoneme probability matrix;

the second obtaining module 4 is used for obtaining a preset first-level command word list; wherein the first-level command word list comprises a plurality of first-level command words;

a first calculation module 5, configured to calculate a score of each first-level command word according to the phoneme probability matrix;

a first determining module 6, configured to determine a first level command word with the highest score according to the score of each first level command word;

the first judging module 7 is used for judging whether the score of the first-level command word with the highest score is greater than a preset first threshold value;

the first obtaining module 1 is further configured to re-obtain a voice signal for recognition if not;

the second judging module 8 is configured to judge whether the first level command word with the highest score has a corresponding second level command word if yes; wherein the second level command word is a specific command word of the first level command word;

the voice command word recognition result determining module 9 is configured to use the first-level command word with the highest score as a voice command word recognition result if the first-level command word with the highest score does not have the corresponding second-level command word;

a third obtaining module 10, configured to obtain, if the first level command word with the highest score has a corresponding second level command word, each second level command word corresponding to the first level command word with the highest score as a second level command word to be calculated;

a second calculation module 11, configured to calculate a score of each second level command word to be calculated according to the phoneme probability matrix;

a second determining module 12, configured to determine a second level command word with the highest score according to the score of each second level command word;

a third judging module 13, configured to judge whether the score of the second level command word with the highest score is greater than a set second threshold;

the voice command word recognition result determining module 9 is further configured to, if yes, use the second level command word with the highest score as a voice command word recognition result;

the first obtaining module 1 is further configured to re-obtain the voice signal for recognition if not.
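The decision flow implemented by the modules above can be summarized in a short sketch. The function `recognize`, its parameters, and the convention of returning `None` to signal that a new voice signal should be acquired are assumptions made for illustration; `score_fn` stands in for whatever routine scores a command word against the phoneme probability matrix.

```python
from typing import Callable, Dict, List, Optional


def recognize(
    score_fn: Callable[[str], float],
    first_level: List[str],
    second_level: Dict[str, List[str]],
    first_threshold: float,
    second_threshold: float,
) -> Optional[str]:
    """Return the recognized command word, or None to request a new voice signal."""
    # Score only the first-level list and keep the best candidate.
    best_first = max(first_level, key=score_fn)
    if score_fn(best_first) <= first_threshold:
        return None  # re-acquire the voice signal

    # Look up the second-level command words tied to the best first-level word.
    candidates = second_level.get(best_first, [])
    if not candidates:
        return best_first  # no second-level words: first-level word is the result

    # Score only those second-level words, not the whole command set.
    best_second = max(candidates, key=score_fn)
    if score_fn(best_second) > second_threshold:
        return best_second
    return None  # re-acquire the voice signal
```

Because only the first-level list and the second-level words of a single first-level word are scored, the number of scoring calls per utterance stays well below the total number of supported command words, which is the saving the application targets for low-resource devices.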

In some embodiments, the score of the first level command word is calculated according to the following steps:

In some embodiments, mapping the total probability value into a score according to a preset rule, and before using the score as the score of the first level command word, further includes:

Note that the total probability value in this embodiment refers to the total probability value of the first-level command word.

In some embodiments, normalizing the total probability value, and obtaining the normalized total probability value includes:

Referring to fig. 3, an embodiment of the present application further provides a computer device, whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data of the voice command word recognition method and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. Further, the above-mentioned computer device may be further provided with an input device, a display screen, and the like.
When executed by the processor, the computer program implements a voice command word recognition method comprising the following steps: acquiring a voice signal to be recognized; extracting the features of the voice signal to be recognized to generate a feature vector; inputting the feature vector into an acoustic model to obtain a phoneme probability matrix; acquiring a preset first-level command word list, wherein the first-level command word list comprises a plurality of first-level command words; calculating the score of each first-level command word according to the phoneme probability matrix; determining the first-level command word with the highest score according to the scores of the first-level command words; judging whether the score of the first-level command word with the highest score is greater than a preset first threshold value; if not, re-acquiring the voice signal for recognition; if yes, judging whether the first-level command word with the highest score has a corresponding second-level command word, wherein the second-level command word is a specific command word of the first-level command word; if the first-level command word with the highest score does not have a corresponding second-level command word, taking the first-level command word with the highest score as the voice command word recognition result; if the first-level command word with the highest score has a corresponding second-level command word, acquiring each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated; calculating the score of each second-level command word to be calculated according to the phoneme probability matrix; determining the second-level command word with the highest score according to the scores of the second-level command words; judging whether the score of the second-level command word with the highest score is greater than a set second threshold value; if yes, taking the second-level command word with the highest score as the voice command word recognition result; if not, re-acquiring the voice signal for recognition. It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture related to the solution of the present application and does not limit the computer devices to which the solution of the present application is applicable.

An embodiment of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a voice command word recognition method comprising the following steps: acquiring a voice signal to be recognized; extracting the features of the voice signal to be recognized to generate a feature vector; inputting the feature vector into an acoustic model to obtain a phoneme probability matrix; acquiring a preset first-level command word list, wherein the first-level command word list comprises a plurality of first-level command words; calculating the score of each first-level command word according to the phoneme probability matrix; determining the first-level command word with the highest score according to the scores of the first-level command words; judging whether the score of the first-level command word with the highest score is greater than a preset first threshold value; if not, re-acquiring the voice signal for recognition; if yes, judging whether the first-level command word with the highest score has a corresponding second-level command word, wherein the second-level command word is a specific command word of the first-level command word; if the first-level command word with the highest score does not have a corresponding second-level command word, taking the first-level command word with the highest score as the voice command word recognition result; if the first-level command word with the highest score has a corresponding second-level command word, acquiring each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated; calculating the score of each second-level command word to be calculated according to the phoneme probability matrix; determining the second-level command word with the highest score according to the scores of the second-level command words; judging whether the score of the second-level command word with the highest score is greater than a set second threshold value; if yes, taking the second-level command word with the highest score as the voice command word recognition result; if not, re-acquiring the voice signal for recognition. It is understood that the computer-readable storage medium in this embodiment may be a volatile readable storage medium or a non-volatile readable storage medium.

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium provided by the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, apparatus, article or method that comprises the element.

The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the application.

Claims

1. A method for recognizing a voice command word, the method comprising:

acquiring a voice signal to be recognized;

if not, re-acquiring the voice signal for recognition;

if not, the voice signal is acquired again for recognition.

2. The voice command word recognition method according to claim 1, wherein the score of the first level command word is calculated according to the steps of:

3. The method for recognizing speech command words according to claim 2, wherein the step of calculating the sum of probabilities of all paths corresponding to the phoneme sequence by using a forward algorithm based on the phoneme probability matrix to obtain a total probability value comprises:

4. The voice command word recognition method according to claim 2, wherein before the step of mapping the total probability value into a score according to a preset rule and taking the score as the score of the first-level command word, the method further comprises:

5. The method of claim 4, wherein normalizing the total probability value to obtain a normalized total probability value comprises:

6. The method of claim 1, wherein the score of the second level command word is calculated according to the following steps:

7. The voice command word recognition method according to claim 1, wherein the preset first level command word and/or the second level command word includes a command word of a discontinuous font.

8. A voice command word recognition apparatus, comprising:

9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the speech command word recognition method according to any one of claims 1 to 7.

10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the speech command word recognition method according to any of claims 1 to 7.