CN116825108A - Voice command word recognition method, device, equipment and medium - Google Patents

Info

Publication number
CN116825108A
CN116825108A CN202311077037.XA CN202311077037A CN116825108A CN 116825108 A CN116825108 A CN 116825108A CN 202311077037 A CN202311077037 A CN 202311077037A CN 116825108 A CN116825108 A CN 116825108A
Authority
CN
China
Prior art keywords
command word
level command
score
level
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311077037.XA
Other languages
Chinese (zh)
Other versions
CN116825108B (en)
Inventor
李�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Youjie Zhixin Technology Co ltd
Original Assignee
Shenzhen Youjie Zhixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Youjie Zhixin Technology Co ltd
Priority to CN202311077037.XA
Publication of CN116825108A
Application granted
Publication of CN116825108B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The application belongs to the technical field of voice recognition and discloses a voice command word recognition method, device, equipment and medium. The set command words are divided in advance into first-level command words and second-level command words, and the relationship between the two levels is constructed. During actual recognition, the first-level command word with the highest score is first determined from a first-level command word list; the scores of all second-level command words corresponding to that first-level command word are then calculated, and the second-level command word with the highest score that meets a set threshold is taken as the voice recognition result.

Description

Voice command word recognition method, device, equipment and medium
Technical Field
The present application relates to the field of speech recognition technologies, and in particular, to a method, an apparatus, a device, and a medium for recognizing a speech command word.
Background
Command word recognition is a branch of voice recognition and is widely used in smart-home products such as smart voice speakers, smart voice headphones, smart voice lamps, and smart voice fans. For cost reasons, such embedded devices have lower computing power and smaller memory and flash than smart devices such as mobile phones. The general command word recognition method scores all command words and then selects, from the command word list, the command word with the highest score that meets a set threshold as the recognition result. For example, suppose the voice product controls an electric fan and the following command words are designed: turn on the fan, turn off the fan, first wind, second wind, and third wind. Scoring all command words here means calculating a score for each of turn on the fan, turn off the fan, first wind, second wind, and third wind; the result with the highest score that exceeds the set threshold is then selected as the command word recognition result. However, as the number of command words grows, the decoding time grows linearly, which increases the voice recognition processing time on low-resource devices so that recognition may not run in real time. In other words, the existing command word recognition method limits the number of voice command words that a low-resource device can support.
Disclosure of Invention
The main purpose of the application is to provide a voice command word recognition method, device, equipment and medium that solve the technical problem that existing command word recognition methods limit the number of voice command words a low-resource device can support.
In order to achieve the above object, a first aspect of the present application provides a method for recognizing a voice command word, the method comprising:
acquiring a voice signal to be recognized;
extracting features from the voice signal to be recognized to generate a feature vector;
inputting the feature vector into an acoustic model to obtain a phoneme probability matrix;
acquiring a preset first-level command word list; wherein the first-level command word list comprises a plurality of first-level command words;
calculating the score of each first-level command word according to the phoneme probability matrix;
determining the first-level command word with the highest score according to the scores of the first-level command words;
judging whether the score of the first-level command word with the highest score is greater than a preset first threshold;
if not, re-acquiring the voice signal for recognition;
if yes, judging whether the first-level command word with the highest score has a corresponding second-level command word; wherein the second-level command word is a specific command word of the first-level command word;
if the first-level command word with the highest score does not have a corresponding second-level command word, taking the first-level command word with the highest score as the voice command word recognition result;
if the first-level command word with the highest score has a corresponding second-level command word, acquiring each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated;
calculating the score of each second-level command word to be calculated according to the phoneme probability matrix;
determining the second-level command word with the highest score according to the scores of the second-level command words;
judging whether the score of the second-level command word with the highest score is greater than a set second threshold;
if yes, taking the second-level command word with the highest score as the voice command word recognition result;
if not, re-acquiring the voice signal for recognition.
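The two-level decision flow listed above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `recognize`, the `children` mapping, and the `score` callback are hypothetical names introduced here, and returning `None` stands in for "re-acquire the voice signal".

```python
from typing import Callable, Dict, List, Optional


def recognize(
    first_level_words: List[str],
    children: Dict[str, List[str]],
    score: Callable[[str], float],
    first_threshold: float,
    second_threshold: float,
) -> Optional[str]:
    """Return the recognized command word, or None to request a new voice signal."""
    # Score only the first-level command words and keep the best one.
    best_first = max(first_level_words, key=score)
    if score(best_first) <= first_threshold:
        return None  # first threshold not exceeded: re-acquire the signal

    # A first-level word without second-level words is itself the result.
    second_level = children.get(best_first, [])
    if not second_level:
        return best_first

    # Score only the second-level words that belong to the winning first-level word.
    best_second = max(second_level, key=score)
    if score(best_second) > second_threshold:
        return best_second
    return None  # second threshold not exceeded: re-acquire the signal


# Toy scores standing in for values derived from the phoneme probability matrix.
toy_scores = {"volume": 80.0, "air volume": 35.0,
              "increase volume": 72.0, "decrease volume": 40.0}
result = recognize(
    ["volume", "air volume"],
    {"volume": ["increase volume", "decrease volume"]},
    toy_scores.__getitem__,
    first_threshold=60.0,
    second_threshold=60.0,
)
```

With these toy values, `result` is `"increase volume"`, and only four scores are looked up instead of one per command word in the whole vocabulary.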
Further, the score of the first-level command word is calculated according to the following steps:
inserting blank symbols between the words of the first-level command word and before and after the first-level command word to obtain a changed first-level command word;
converting the changed first-level command word into a phoneme sequence according to the relation between words and phonemes in the dictionary;
calculating, according to the phoneme probability matrix, the sum of the probabilities of all paths corresponding to the phoneme sequence by a forward algorithm to obtain a total probability value;
and mapping the total probability value into a score according to a preset rule, and taking the score as the score of the first-level command word.
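The preparation and mapping steps above (blank insertion, dictionary lookup, and probability-to-score mapping) can be sketched as below. The blank token `<b>`, the helper names, and the lexicon entries are hypothetical. The linear 0-1 to 0-100 mapping is only one possible "preset rule", assumed here for illustration.

```python
from typing import Dict, List

BLANK = "<b>"  # hypothetical spelling of the blank symbol


def insert_blanks(words: List[str]) -> List[str]:
    """Put a blank before, between, and after the words of a command word."""
    out = [BLANK]
    for word in words:
        out += [word, BLANK]
    return out


def to_phonemes(tokens: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Replace each word by its phonemes from the dictionary; blanks pass through."""
    sequence: List[str] = []
    for token in tokens:
        if token == BLANK:
            sequence.append(BLANK)
        else:
            sequence.extend(lexicon[token])
    return sequence


def probability_to_score(total_probability: float, scale: float = 100.0) -> float:
    """Assumed preset rule: clamp to [0, 1] and scale linearly to [0, scale]."""
    return max(0.0, min(1.0, total_probability)) * scale


# Hypothetical lexicon; the phoneme spellings are placeholders, not real entries.
lexicon = {"turn on": ["k", "ai"], "lamp": ["d", "eng"]}
tokens = insert_blanks(["turn on", "lamp"])
phonemes = to_phonemes(tokens, lexicon)
```

Here `tokens` is `["<b>", "turn on", "<b>", "lamp", "<b>"]` and `phonemes` is `["<b>", "k", "ai", "<b>", "d", "eng", "<b>"]`.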
Further, the step of calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm according to the phoneme probability matrix to obtain a total probability value includes:
decoding with a forward algorithm according to the phoneme probability matrix, setting the probability of the wildcard symbol to 1 when decoding reaches the wildcard symbol, and continuing the forward algorithm until the sum of the probabilities of all paths corresponding to the phoneme sequence is obtained, so as to obtain the total probability value.
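A forward pass with a wildcard can be sketched as follows. This is an assumption-laden illustration rather than the patented decoder: it uses the standard CTC layout with a blank between every label (the text above places blanks between words), the symbols `<b>` and `*` are hypothetical spellings, and the per-frame probabilities are plain dictionaries instead of a real acoustic model output.

```python
from typing import Dict, List

BLANK = "<b>"     # hypothetical spelling of the blank symbol
WILDCARD = "*"    # hypothetical spelling of the wildcard symbol


def forward_total_probability(probs: List[Dict[str, float]], labels: List[str]) -> float:
    """Sum the weights of all frame-level paths that collapse to `labels`.

    probs[t][symbol] is the probability of `symbol` at frame t.
    `labels` holds phonemes and, optionally, WILDCARD entries.
    """
    # Extended sequence: blank, label 1, blank, label 2, ..., blank.
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]

    def emit(t: int, symbol: str) -> float:
        # The wildcard always contributes 1, whatever the frame contains.
        return 1.0 if symbol == WILDCARD else probs[t][symbol]

    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = emit(0, ext[0])
    if size > 1:
        alpha[1] = emit(0, ext[1])

    for t in range(1, len(probs)):
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one symbol
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank in between
            new_alpha[s] = total * emit(t, ext[s])
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


frames = [{"<b>": 0.4, "a": 0.6}, {"<b>": 0.3, "a": 0.7}]
total = forward_total_probability(frames, ["a"])  # 0.42 + 0.28 + 0.18 = 0.88
```

Because the wildcard contributes 1 at every frame it covers, the value returned for a sequence containing a wildcard is a path weight rather than a strict probability and can exceed 1. A later normalization or clamping step is assumed to handle that.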
Further, before the step of mapping the total probability value into a score according to a preset rule and taking the score as the score of the first-level command word, the method further includes:
normalizing the total probability value to obtain a normalized total probability value.
Further, the step of normalizing the total probability value to obtain a normalized total probability value includes:
calculating a normalized length according to the formula $\mathrm{norm\_len} = a - b - c$, where norm_len represents the normalized length, $a$ represents the length of the decoded time interval, $b$ represents the accumulated value of the blank probabilities within that time interval, and $c$ represents the length of the wildcard;
calculating the normalized total probability value according to the formula $p_{\mathrm{Total}} = \exp\left(\log p(y \mid x) / \mathrm{norm\_len}\right)$, where $x$ represents the voice signal to be recognized, $y$ represents the phoneme sequence, and $p(y \mid x)$ represents the total probability value.
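The normalization can be illustrated numerically. The sketch below assumes that the divisor in the second formula is the normalized length from the first formula. The function and parameter names are hypothetical, and the guard against a non-positive length is an addition not stated in the text.

```python
import math


def normalized_total_probability(
    log_p: float,           # log p(y|x), the log of the total probability value
    num_frames: float,      # a: length of the decoded time interval
    blank_prob_sum: float,  # b: accumulated blank probability over that interval
    wildcard_len: float,    # c: length covered by the wildcard
) -> float:
    norm_len = num_frames - blank_prob_sum - wildcard_len
    if norm_len <= 0:
        # Assumed guard: the formula is undefined for a non-positive length.
        raise ValueError("normalized length must be positive")
    return math.exp(log_p / norm_len)


# 50 frames, blank mass 36, wildcard length 4 -> norm_len = 10.
p_total = normalized_total_probability(math.log(0.01), 50, 36, 4)
```

With these made-up numbers the raw total probability 0.01 becomes 0.01 raised to the power 1/10, roughly 0.63. This shows how the normalization keeps long utterances from being penalized merely for spanning more frames.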
Further, the score of the second-level command word is calculated according to the following steps:
inserting blank symbols between the words of the second-level command word and before and after the second-level command word to obtain a changed second-level command word;
converting the changed second-level command word into a phoneme sequence according to the relation between words and phonemes in the dictionary;
calculating, according to the phoneme probability matrix, the sum of the probabilities of all paths corresponding to the phoneme sequence by a forward algorithm to obtain a total probability value;
and mapping the total probability value into a score according to a preset rule, and taking the score as the score of the second-level command word.
In one embodiment, the preset first-level command words and/or second-level command words include a command word with discontinuous characters.
In a second aspect, an embodiment of the present application provides a voice command word recognition apparatus, including:
the first acquisition module is used for acquiring a voice signal to be recognized;
the feature extraction module is used for extracting features from the voice signal to be recognized and generating a feature vector;
the input module is used for inputting the feature vector into the acoustic model to obtain a phoneme probability matrix;
the second acquisition module is used for acquiring a preset first-level command word list; wherein the first-level command word list comprises a plurality of first-level command words;
the first calculation module is used for calculating the score of each first-level command word according to the phoneme probability matrix;
the first determining module is used for determining the first-level command words with highest scores according to the scores of the first-level command words;
the first judging module is used for judging whether the score of the first-level command word with the highest score is greater than a preset first threshold;
the first acquisition module is further used for re-acquiring the voice signal for recognition if the score is not greater than the first threshold;
the second judging module is used for judging, if the score is greater than the first threshold, whether the first-level command word with the highest score has a corresponding second-level command word; wherein the second-level command word is a specific command word of the first-level command word;
the voice command word recognition result determining module is used for taking the first-level command word with the highest score as a voice command word recognition result if the first-level command word with the highest score does not have the corresponding second-level command word;
the third acquisition module is used for acquiring each second level command word corresponding to the first level command word with the highest score as a second level command word to be calculated if the first level command word with the highest score has the corresponding second level command word;
the second calculation module is used for calculating the score of each second-level command word to be calculated according to the phoneme probability matrix;
the second determining module is used for determining the second-level command words with highest scores according to the scores of the second-level command words;
the third judging module is used for judging whether the score of the second-level command word with the highest score is greater than a set second threshold;
the voice command word recognition result determining module is further used for taking the second-level command word with the highest score as the voice command word recognition result if the score is greater than the second threshold;
the first acquisition module is further used for re-acquiring the voice signal for recognition if the score is not greater than the second threshold.
A third aspect of the application proposes a computer device comprising a memory and a processor, said memory having stored therein a computer program, said processor implementing the steps of the speech command word recognition method as defined in any one of the preceding claims when said computer program is executed.
A fourth aspect of the application proposes a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the speech command word recognition method as described in any of the preceding claims.
The beneficial effects are that:
According to the embodiment of the application, the set command words are divided in advance into two levels: first-level command words, which are scored first, and second-level command words, which are scored afterwards and are the specific command words of a first-level command word. When determining the recognition result, the first-level command word with the highest score is determined from the first-level command word list, the scores of all second-level command words corresponding to that first-level command word are then calculated, and the second-level command word with the highest score that meets the set threshold is taken as the voice recognition result.
Drawings
FIG. 1 is a flowchart of a method for recognizing a voice command word according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a voice command word recognition device according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of a computer device according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, modules, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, modules, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring to fig. 1, an embodiment of the present application provides a voice command word recognition method, which includes:
S1, acquiring a voice signal to be recognized;
S2, extracting features from the voice signal to be recognized to generate a feature vector;
S3, inputting the feature vector into an acoustic model to obtain a phoneme probability matrix;
S4, acquiring a preset first-level command word list; wherein the first-level command word list comprises a plurality of first-level command words;
S5, calculating the score of each first-level command word according to the phoneme probability matrix;
S6, determining the first-level command word with the highest score according to the scores of the first-level command words;
S7, judging whether the score of the first-level command word with the highest score is greater than a preset first threshold;
S8, if not, re-acquiring the voice signal for recognition;
S9, if so, judging whether the first-level command word with the highest score has a corresponding second-level command word; wherein the second-level command word is a specific command word of the first-level command word;
S10, if the first-level command word with the highest score does not have a corresponding second-level command word, taking the first-level command word with the highest score as the voice command word recognition result;
S11, if the first-level command word with the highest score has a corresponding second-level command word, acquiring each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated;
S12, calculating the score of each second-level command word to be calculated according to the phoneme probability matrix;
S13, determining the second-level command word with the highest score according to the scores of the second-level command words;
S14, judging whether the score of the second-level command word with the highest score is greater than a set second threshold;
S15, if yes, taking the second-level command word with the highest score as the voice command word recognition result;
S16, if not, re-acquiring the voice signal for recognition.
In the embodiment of the application, the voice command word recognition method is executed by an intelligent voice electronic device, which recognizes voice command words and performs the corresponding actions. The intelligent voice electronic device may be a smart voice speaker, a smart voice headphone, a smart voice lamp, a smart voice fan, and the like. Before the device executes steps S1 to S16, the preset command words must be divided into first-level command words and second-level command words, and the correspondence between the two levels must be constructed and stored in memory, so that the device can recognize voice command words by executing steps S1 to S16.
For ease of understanding, an example follows. Suppose there is an electric fan with a voice recognition function that can play music, shake its head, and control its air volume. The fan controls its volume, air volume, and head shaking according to the voice command words spoken by the user. With the existing voice command word recognition method, the command words are set as: increase volume, decrease volume, maximum volume, minimum volume, open shaking head, close shaking head, stop shaking head, open left-right shaking head, close left-right shaking head, stop left-right shaking head, open up-down shaking head, close up-down shaking head, stop up-down shaking head, increase air volume, reduce air volume, increase air volume, open first gear, open second gear, open third gear, open fourth gear, open fifth gear, open sixth gear, open seventh gear, open eighth gear, open ninth gear, open tenth gear, and little wind classmates (the wake word, which is itself a kind of command word). The scores of all command words are then calculated, and the command word with the highest score that meets the set threshold is selected. With the method of the present application, first-level command words must additionally be designed from these command words. For example, "volume" is extracted from increase volume, decrease volume, maximum volume, and minimum volume as the first-level command word of those command words.
Likewise, "shaking head" is extracted from open shaking head, close shaking head, stop shaking head, open left-right shaking head, close left-right shaking head, stop left-right shaking head, open up-down shaking head, close up-down shaking head, and stop up-down shaking head as their first-level command word, and "air volume" is extracted from increase air volume, reduce air volume, and increase air volume as theirs. Volume, air volume, and shaking head are treated as different categories of action; that is, the command words are classified by action category. In open first gear through open tenth gear, the shared words ("open" and "gear") are not contiguous, unlike volume, air volume, and shaking head above, so a wildcard is inserted between "open" and "gear", and "open * gear" serves as the first-level command word of these command words. Some command words do not cluster well: a wake word such as little wind classmates may be the only one of its kind, or one of very few, with no similar command words. Such command words are placed individually in the first level, each as its own kind of first-level command word.
That is, in this example, the first-level command words are: volume, shaking head, air volume, open * gear, and little wind classmates. The second-level command words are: increase volume, decrease volume, maximum volume, minimum volume, open shaking head, close shaking head, stop shaking head, open left-right shaking head, close left-right shaking head, stop left-right shaking head, open up-down shaking head, close up-down shaking head, stop up-down shaking head, increase air volume, reduce air volume, increase air volume, open first gear, open second gear, open third gear, open fourth gear, open fifth gear, open sixth gear, open seventh gear, open eighth gear, open ninth gear, open tenth gear, and little wind classmates. The relationship between the first-level command words and the second-level command words is as follows:
TABLE 1
Table 1 (continued):
The table needs to be stored in memory. When recognizing a command word, the scores of the first-level command words are calculated and the first-level command word with the highest score, A1, is selected. If the score of A1 is not greater than the set first threshold, the voice signal is re-acquired for recognition, that is, the method returns to step S1. If it is greater, it is judged whether A1 has corresponding second-level command words. If not, A1 is taken as the voice command word recognition result. If so, the scores of all second-level command words corresponding to A1 are calculated and the one with the highest score, A2, is selected. If the score of A2 is greater than the set second threshold, A2 is taken as the recognition result of the voice command word; if not, the voice signal is re-acquired for recognition, that is, the method returns to step S1. For example, if "volume" has the highest score among the first-level command words and that score is greater than the set first threshold, the scores of increase volume, decrease volume, maximum volume, and minimum volume are calculated respectively; if, say, increase volume has the highest score and that score is greater than the set second threshold, increase volume is taken as the recognition result of the voice command word.
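One way to hold the first-level to second-level relationship in memory is a plain mapping, sketched below. The table is deliberately abbreviated to three rows. The names `COMMAND_TABLE` and `second_level_candidates` are hypothetical, the wildcard spelling "open * gear" is an assumed rendering of the discontinuous first-level word, and giving the wake word an empty second-level list is an assumption made here so that it is returned directly.

```python
from typing import Dict, List

ORDINALS = ["first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth", "tenth"]

# Abbreviated, illustrative table: first-level word -> its second-level words.
COMMAND_TABLE: Dict[str, List[str]] = {
    "volume": ["increase volume", "decrease volume",
               "maximum volume", "minimum volume"],
    "open * gear": [f"open {ordinal} gear" for ordinal in ORDINALS],
    # Assumption: a wake word has no second-level words of its own.
    "little wind classmates": [],
}


def second_level_candidates(first_level_word: str) -> List[str]:
    """Return only the second-level words that need scoring for this winner."""
    return COMMAND_TABLE.get(first_level_word, [])


candidates = second_level_candidates("volume")
```

After "volume" wins the first-level comparison, only the four entries in `candidates` are scored. The ten gear commands are never touched for that utterance.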
The benefit of the present application over existing voice command word recognition methods is clear. For example, suppose a product supports 100 voice control commands, that is, 100 command words. The general method computes the scores of command words 1 to 100, finds the maximum, and then checks whether the maximum meets a set threshold to decide whether a command is recognized, so 100 scores are calculated. The hierarchical recognition of the present application first computes the scores of 10 first-level command words such as volume, shaking head, and air volume. If volume has the largest score, volume is recognized first; then 10 scores are calculated for the 10 command words under the volume category, and the maximum is found to decide whether a command is recognized. Scores are thus calculated only 20 times.
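The arithmetic of this comparison is small enough to write out. The helper names below are hypothetical. The second function assumes, as in the example, that only the second-level words of the single winning first-level word are scored.

```python
def flat_score_count(total_words: int) -> int:
    """Existing method: every command word is scored once."""
    return total_words


def two_level_score_count(first_level_words: int, children_of_winner: int) -> int:
    """Hierarchical method: all first-level words, then only the winner's children."""
    return first_level_words + children_of_winner


flat = flat_score_count(100)                   # 100 scores
hierarchical = two_level_score_count(10, 10)   # 10 + 10 = 20 scores
```

For a vocabulary of N words split evenly into K categories, the hierarchical count is K + N/K, which is smallest when K is close to the square root of N. That is the case in this 100-word, 10-category example.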
Therefore, in the embodiment of the application, the set command words are divided in advance into two levels: first-level command words, which are scored first, and second-level command words, which are scored afterwards and are the specific command words of a first-level command word. When determining the recognition result, the first-level command word with the highest score is determined from the first-level command word list, the scores of all second-level command words corresponding to that first-level command word are then calculated, and the second-level command word with the highest score that meets the set threshold is taken as the voice recognition result. Because the scores of all command words need not be calculated, the voice recognition result can be obtained with fewer score computations.
In step S1, the voice signal to be recognized is captured by a microphone. In step S2, the extracted feature vector may include Fbank (filter bank) features. Fbank is a front-end processing algorithm that processes audio in a manner similar to the human ear, which can improve voice recognition performance. The general steps for obtaining the Fbank features of a voice signal are pre-emphasis, framing, windowing, short-time Fourier transform (STFT), Mel filtering, and so on. In step S3, the acoustic model is a model that outputs a phoneme probability matrix from the feature sequence. The phoneme probability matrix can be regarded as the sequence of phoneme probability distributions output for each short time period: a segment of audio corresponds to a sequence of phoneme probability distributions, which together form a decoding matrix, and this decoding matrix is the phoneme probability matrix. Steps S4-S16 can be understood from the example above. Whether for first-level or second-level command words, the scores are calculated from the phoneme probability matrix, and the calculation may specifically use a CTC decoding method. The application does not change the network architecture and only requires adaptation at the decoding stage, which improves the portability of the algorithm.
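The first three Fbank steps named above (pre-emphasis, framing, windowing) can be sketched in plain Python. The coefficient 0.97, the Hamming window, and the 400-sample frame with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are common choices assumed here, not values taken from the application. The STFT and Mel filtering stages are left out.

```python
import math
from typing import List


def pre_emphasis(signal: List[float], alpha: float = 0.97) -> List[float]:
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_signal(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(length: int) -> List[float]:
    """Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def windowed_frames(signal: List[float], frame_len: int = 400,
                    hop: int = 160) -> List[List[float]]:
    """Pre-emphasize, frame, and window a signal, ready for an STFT stage."""
    window = hamming(frame_len)
    frames = frame_signal(pre_emphasis(signal), frame_len, hop)
    return [[sample * weight for sample, weight in zip(frame, window)]
            for frame in frames]


# A 1000-sample toy signal yields frames starting at samples 0, 160, 320 and 480.
toy_frames = windowed_frames([math.sin(0.01 * n) for n in range(1000)])
```

Each windowed frame would then go through a Fourier transform and a bank of Mel-spaced filters to produce one feature vector per frame.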
As can be seen from the above, a first-level command word may have a discontinuous form, such as "open * gear", where the wildcard * stands for the varying word between "open" and "gear". A scoring method is therefore needed that can handle such command words. The method of steps S51-S54 applies not only to discontinuous forms such as "open * gear" but also to command words such as "volume" and "little wind classmates"; that is, it generalizes well. Specifically, in some embodiments, the score of the first-level command word is preferably calculated according to the following steps S51-S54:
S51, inserting blank symbols between the words of the first-level command word and before and after the first-level command word to obtain a changed first-level command word;
S52, converting the changed first-level command word into a phoneme sequence according to the relation between words and phonemes in the dictionary;
S53, calculating, according to the phoneme probability matrix, the sum of the probabilities of all paths corresponding to the phoneme sequence by a forward algorithm to obtain a total probability value;
S54, mapping the total probability value into a score according to a preset rule, and taking the score as the score of the first-level command word.
In step S51, assuming that the command word is "turn on lamp", inserting the blank symbol b gives: b turn on b lamp b.
In step S53, because multiple paths may map to the same phoneme sequence, the probabilities of all paths corresponding to the phoneme sequence must be summed to obtain the total probability value, which is the probability value of the phoneme sequence. Specifically, this probability value is computed with the forward algorithm, a dynamic programming method, which is equivalent to summing the probabilities of all paths of the phoneme sequence. In step S54, the total probability value is a number between 0 and 1, and the range 0 to 1 may be mapped to a score of 0 to 100 points according to a set rule.
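Steps S51, S53 and S54 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: each label is a single integer index into a column of the phoneme probability matrix (the dictionary lookup of step S52 is skipped), a blank is placed between every pair of labels as in standard CTC, index 0 is assumed to be the blank symbol, and the "preset rule" of step S54 is assumed to be a linear mapping to 0-100. The function names are hypothetical.

```python
def insert_blanks(labels, blank=0):
    """S51: put a blank before, between and after the labels."""
    extended = [blank]
    for label in labels:
        extended.append(label)
        extended.append(blank)
    return extended


def ctc_forward(probs, labels, blank=0):
    """S53: sum the probabilities of all paths that collapse to `labels`.

    probs[t][k] is the probability of symbol k at frame t.
    """
    ext = insert_blanks(labels, blank)
    num_states = len(ext)
    # Initialisation: a path may start in the leading blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank in between
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha
    # A valid path ends in the last label or the trailing blank.
    return alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)


def probability_to_score(total_prob, max_score=100.0):
    """S54: assumed linear rule mapping [0, 1] onto [0, max_score]."""
    return max(0.0, min(1.0, total_prob)) * max_score


# Two frames, two symbols (0 = blank, 1 = a label), uniform probabilities.
uniform = [[0.5, 0.5], [0.5, 0.5]]
total = ctc_forward(uniform, [1])
score = probability_to_score(total)
```

In the uniform example, three two-frame paths collapse to the single label (label-label, blank-label, label-blank), each with probability 0.25, so the forward algorithm accumulates all three without enumerating them.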
In some embodiments, the step of calculating the sum of probabilities of all paths corresponding to the phoneme sequence by using a forward algorithm according to the phoneme probability matrix to obtain a total probability value includes:
decoding with the forward algorithm according to the phoneme probability matrix; when decoding reaches a wildcard symbol, setting the probability of the wildcard symbol to 1 and continuing the forward algorithm until the sum of the probabilities of all paths corresponding to the phoneme sequence is obtained, thereby obtaining the total probability value.
In this embodiment, the forward algorithm computes the sum of the probabilities of all paths corresponding to the phoneme sequence during the actual decoding process; the forward algorithm may be the forward algorithm of the CTC decoding method. When decoding reaches a wildcard symbol, its probability is set to 1 and the forward algorithm continues. This ensures that only the "start" and "stop" parts of a "start x-stop" pattern are attended to, which solves the problem of discontinuous command words.
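The wildcard handling described above can be sketched as a small change to a CTC-style forward pass: whenever the current state is the wildcard, its emission probability is taken as 1 regardless of the frame. The sketch below is an assumption-laden illustration, not the implementation of the application: the sentinel `WILDCARD = -1`, the integer labels, blank index 0 and the function names are all hypothetical.

```python
WILDCARD = -1  # hypothetical sentinel for the wildcard symbol


def emission(frame_probs, symbol):
    """Emission probability of a state; the wildcard always emits with probability 1."""
    return 1.0 if symbol == WILDCARD else frame_probs[symbol]


def forward_with_wildcard(probs, labels, blank=0):
    """Forward algorithm over blank-extended labels, with wildcard states fixed to 1."""
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext)
    alpha = [0.0] * num_states
    alpha[0] = emission(probs[0], ext[0])
    if num_states > 1:
        alpha[1] = emission(probs[0], ext[1])
    for t in range(1, len(probs)):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * emission(probs[t], ext[s])
        alpha = new_alpha
    return alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)


# Three frames, four symbols: 0 = blank, 1 = "start", 2 = "stop", 3 = some other phoneme.
# The middle frame is confidently the other phoneme, i.e. the variable part "x".
matrix = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
without_wildcard = forward_with_wildcard(matrix, [1, 2])
with_wildcard = forward_with_wildcard(matrix, [1, WILDCARD, 2])
```

In this toy matrix the sequence without a wildcard cannot explain the middle frame and receives no probability mass, while the sequence with a wildcard absorbs that frame and is matched on "start" and "stop" alone. Because wildcard frames contribute a factor of 1, the returned value is a path-sum score rather than a strictly normalized probability.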
In some embodiments, before the step of mapping the total probability value into a score according to a preset rule and taking the score as the score of the first-level command word, the method further includes:
normalizing the total probability value to obtain a normalized total probability value.
In this embodiment, the total probability value is the probability that a given command word is recognized given the input speech. Normalization corrects this value; it must take into account both the length of the command word and the length of the decoding matrix, so that a more accurate total probability value, and hence a more accurate score, is obtained.
In some embodiments, the step of normalizing the total probability value to obtain a normalized total probability value includes:
calculating a normalized length according to the formula $\mathrm{norm\_len} = a - b - c$, where $\mathrm{norm\_len}$ represents the normalized length, $a$ represents the length of the decoded time interval, $b$ represents the accumulated value of the blank probabilities within that time interval, and $c$ represents the length of the wildcard;
calculating the normalized total probability value according to the formula $p_{\mathrm{total}} = \exp\left(\log p(y \mid x) / \mathrm{norm\_len}\right)$, where $x$ represents the speech signal to be recognized, $y$ represents the phoneme sequence, and $p(y \mid x)$ represents the total probability value.
In this embodiment, the total probability value is normalized according to the above formulas. Because the lengths of different command words (namely, dividing by the normalized length) and the length of the decoding matrix are taken into account, a more accurate total probability value, and hence a more accurate score, can be obtained. Different command words have different lengths, and even for command words of the same length, or for the same command word, the result depends on the length of the decoding window and on the accumulated blank probability.
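The normalization can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation of the application: lengths are assumed to be counted in frames, the divisor "norm" in the second formula is assumed to be the normalized length from the first formula, and the function names and the guard conditions are hypothetical.

```python
import math


def normalized_length(num_frames, blank_probs, wildcard_len):
    """norm_len = a - b - c.

    a: number of frames in the decoded interval (assumed unit),
    b: accumulated blank probability over those frames,
    c: length covered by the wildcard.
    """
    return num_frames - sum(blank_probs) - wildcard_len


def normalize_total_probability(total_prob, norm_len):
    """p_total = exp(log p(y|x) / norm_len), a per-unit-length probability."""
    if norm_len <= 0:
        raise ValueError("normalized length must be positive")
    if total_prob <= 0.0:
        return 0.0  # log(0) is undefined; treat as no match
    return math.exp(math.log(total_prob) / norm_len)


# Hypothetical numbers: 10 frames, blank probability 0.5 on 8 of them, wildcard of length 2.
norm_len = normalized_length(10, [0.5] * 8 + [0.0] * 2, 2)
p_total = normalize_total_probability(0.0016, norm_len)
```

With these numbers the normalized length is 4, so the normalized value is the fourth root of the raw total probability. Taking the root in this way keeps long command words, whose raw path probability is a product of many factors, comparable with short ones.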
In some embodiments, the score of the second-level command word is calculated according to the following steps:
S121, inserting blank symbols between the words of the second-level command word and before and after the second-level command word to obtain a modified second-level command word;
S122, converting the modified second-level command word into a phoneme sequence according to the word-to-phoneme mapping in the dictionary;
S123, calculating, according to the phoneme probability matrix, the sum of the probabilities of all paths corresponding to the phoneme sequence with a forward algorithm to obtain a total probability value;
S124, mapping the total probability value into a score according to a preset rule, and taking the score as the score of the second-level command word.
In this embodiment, because a second-level command word may also have a discontinuous form, its score is calculated with the same method used for the first-level command word, so the details are not repeated here. Note, however, that the phoneme sequence in step S122 is not the phoneme sequence of the first-level command word. To distinguish them, the phoneme sequence in step S122, that is, the one corresponding to the second-level command word, is referred to as the second phoneme sequence, and the phoneme sequence corresponding to the first-level command word as the first phoneme sequence; likewise, the total probability value corresponding to the first-level command word is referred to as the first total probability value, and the total probability value corresponding to the second-level command word as the second total probability value.
In some embodiments, the step of calculating, according to the phoneme probability matrix, a sum of probabilities of all paths corresponding to the phoneme sequence by using a forward algorithm to obtain a total probability value includes:
decoding with the forward algorithm according to the phoneme probability matrix; when decoding reaches a wildcard symbol, setting the probability of the wildcard symbol to 1 and continuing the forward algorithm until the sum of the probabilities of all paths corresponding to the phoneme sequence is obtained, thereby obtaining the total probability value.
In some embodiments, before the step of mapping the total probability value to a score according to a preset rule and taking the score as the score of the second level command word, the method further includes:
normalizing the total probability value to obtain a normalized total probability value.
Note that the total probability value in this embodiment refers to the total probability value of the second level command word.
In some embodiments, the step of normalizing the total probability value to obtain a normalized total probability value includes:
calculating a normalized length according to the formula $\mathrm{norm\_len} = a - b - c$, where $\mathrm{norm\_len}$ represents the normalized length, $a$ represents the length of the decoded time interval, $b$ represents the accumulated value of the blank probabilities within that time interval, and $c$ represents the length of the wildcard;
calculating the normalized total probability value according to the formula $p_{\mathrm{total}} = \exp\left(\log p(y \mid x) / \mathrm{norm\_len}\right)$, where $x$ represents the speech signal to be recognized, $y$ represents the phoneme sequence, and $p(y \mid x)$ represents the total probability value.
Note that the total probability value in this embodiment refers to the total probability value of the second level command word.
In one embodiment, the preset first-level command words and/or second-level command words include command words of discontinuous form.
In this embodiment, command words of discontinuous form are command words such as "first gear" and "second gear".
Referring to FIG. 2, an embodiment of the application further provides a voice command word recognition device, which includes:
a first obtaining module 1, configured to obtain a voice signal to be recognized;
a feature extraction module 2, configured to extract features of the voice signal to be recognized and generate a feature vector;
an input module 3, configured to input the feature vector into an acoustic model to obtain a phoneme probability matrix;
a second obtaining module 4, configured to obtain a preset first-level command word list, wherein the first-level command word list comprises a plurality of first-level command words;
a first calculation module 5, configured to calculate the score of each first-level command word according to the phoneme probability matrix;
a first determining module 6, configured to determine the first-level command word with the highest score according to the score of each first-level command word;
a first judging module 7, configured to judge whether the score of the first-level command word with the highest score is greater than a preset first threshold;
the first obtaining module 1 being further configured to re-obtain a voice signal for recognition if not;
a second judging module 8, configured to judge, if yes, whether the first-level command word with the highest score has a corresponding second-level command word, wherein the second-level command word is a specific command word of the first-level command word;
a voice command word recognition result determining module 9, configured to take the first-level command word with the highest score as the voice command word recognition result if the first-level command word with the highest score has no corresponding second-level command word;
a third obtaining module 10, configured to obtain, if the first-level command word with the highest score has corresponding second-level command words, each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated;
a second calculation module 11, configured to calculate the score of each second-level command word to be calculated according to the phoneme probability matrix;
a second determining module 12, configured to determine the second-level command word with the highest score according to the score of each second-level command word;
a third judging module 13, configured to judge whether the score of the second-level command word with the highest score is greater than a set second threshold;
the voice command word recognition result determining module 9 being further configured to take, if yes, the second-level command word with the highest score as the voice command word recognition result;
the first obtaining module 1 being further configured to re-obtain a voice signal for recognition if not.
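The two-level decision flow carried out by these modules can be sketched in Python as follows. This is a minimal illustration under assumptions: the scoring function is passed in as a callable, the second-level command words are held in a dictionary keyed by first-level command word, `None` stands for "re-obtain a voice signal", and all names (`recognize_command`, `cmd_a`, and so on) are hypothetical placeholders rather than terms from the application.

```python
def recognize_command(score_fn, first_level_words, second_level_map,
                      first_threshold, second_threshold):
    """Return the recognized command word, or None if the signal must be re-acquired."""
    # Score every first-level command word and keep the best one.
    best_first = max(first_level_words, key=score_fn)
    if score_fn(best_first) <= first_threshold:
        return None  # best first-level score too low: re-acquire the voice signal

    # Does the best first-level command word have second-level command words?
    candidates = second_level_map.get(best_first, [])
    if not candidates:
        return best_first  # no second level: the first-level word is the result

    # Score only the second-level words that belong to the best first-level word.
    best_second = max(candidates, key=score_fn)
    if score_fn(best_second) > second_threshold:
        return best_second
    return None  # best second-level score too low: re-acquire the voice signal


# Hypothetical scores on a 0-100 scale.
scores = {"cmd_a": 80.0, "cmd_b": 40.0, "cmd_a_1": 70.0, "cmd_a_2": 30.0}
hierarchy = {"cmd_a": ["cmd_a_1", "cmd_a_2"]}
result = recognize_command(scores.get, ["cmd_a", "cmd_b"], hierarchy, 60.0, 60.0)
```

Only the second-level command words under the winning first-level command word are scored, so the number of forward passes stays small even when the full command vocabulary is large.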
In some embodiments, the score of the first level command word is calculated according to the following steps:
inserting blank symbols in the middle of each word of the first-level command word and in front of and behind the first-level command word to obtain a changed first-level command word;
converting the changed first-level command word into a phoneme sequence according to the relation between the words and the phonemes in the dictionary;
according to the phoneme probability matrix, calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm to obtain a total probability value;
and mapping the total probability value into a score according to a preset rule, and taking the score as the score of the first-level command word.
In some embodiments, the step of calculating the sum of probabilities of all paths corresponding to the phoneme sequence by using a forward algorithm according to the phoneme probability matrix to obtain a total probability value includes:
decoding with the forward algorithm according to the phoneme probability matrix; when decoding reaches a wildcard symbol, setting the probability of the wildcard symbol to 1 and continuing the forward algorithm until the sum of the probabilities of all paths corresponding to the phoneme sequence is obtained, thereby obtaining the total probability value.
In some embodiments, before the step of mapping the total probability value into a score according to a preset rule and taking the score as the score of the first-level command word, the method further includes:
normalizing the total probability value to obtain a normalized total probability value.
Note that the total probability value in this embodiment refers to the total probability value of the first-level command word.
In some embodiments, the step of normalizing the total probability value to obtain a normalized total probability value includes:
calculating a normalized length according to the formula $\mathrm{norm\_len} = a - b - c$, where $\mathrm{norm\_len}$ represents the normalized length, $a$ represents the length of the decoded time interval, $b$ represents the accumulated value of the blank probabilities within that time interval, and $c$ represents the length of the wildcard;
calculating the normalized total probability value according to the formula $p_{\mathrm{total}} = \exp\left(\log p(y \mid x) / \mathrm{norm\_len}\right)$, where $x$ represents the speech signal to be recognized, $y$ represents the phoneme sequence, and $p(y \mid x)$ represents the total probability value.
Note that the total probability value in this embodiment refers to the total probability value of the first-level command word.
In some embodiments, the score of the second-level command word is calculated according to the following steps:
inserting blank symbols between the words of the second-level command word and before and after the second-level command word to obtain a modified second-level command word;
converting the changed second level command word into a phoneme sequence according to the relation between the words and the phonemes in the dictionary;
according to the phoneme probability matrix, calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm to obtain a total probability value;
and mapping the total probability value into a score according to a preset rule, and taking the score as the score of the second-level command word.
In some embodiments, the step of calculating the sum of probabilities of all paths corresponding to the phoneme sequence by using a forward algorithm according to the phoneme probability matrix to obtain a total probability value includes:
decoding with the forward algorithm according to the phoneme probability matrix; when decoding reaches a wildcard symbol, setting the probability of the wildcard symbol to 1 and continuing the forward algorithm until the sum of the probabilities of all paths corresponding to the phoneme sequence is obtained, thereby obtaining the total probability value.
Note that the total probability value in this embodiment refers to the total probability value of the second level command word.
In some embodiments, before the step of mapping the total probability value to a score according to a preset rule and taking the score as the score of the second level command word, the method further includes:
normalizing the total probability value to obtain a normalized total probability value.
Note that the total probability value in this embodiment refers to the total probability value of the second level command word.
In some embodiments, the step of normalizing the total probability value to obtain a normalized total probability value includes:
calculating a normalized length according to the formula $\mathrm{norm\_len} = a - b - c$, where $\mathrm{norm\_len}$ represents the normalized length, $a$ represents the length of the decoded time interval, $b$ represents the accumulated value of the blank probabilities within that time interval, and $c$ represents the length of the wildcard;
calculating the normalized total probability value according to the formula $p_{\mathrm{total}} = \exp\left(\log p(y \mid x) / \mathrm{norm\_len}\right)$, where $x$ represents the speech signal to be recognized, $y$ represents the phoneme sequence, and $p(y \mid x)$ represents the total probability value.
Note that the total probability value in this embodiment refers to the total probability value of the second level command word.
Referring to FIG. 3, an embodiment of the present application further provides a computer device, whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as that used by the voice command word recognition method. The network interface of the computer device communicates with an external terminal through a network connection. The computer device may further be provided with an input device, a display screen, and the like.
The computer program, when executed by the processor, implements a voice command word recognition method comprising the following steps: acquiring a voice signal to be recognized; extracting features of the voice signal to be recognized to generate a feature vector; inputting the feature vector into an acoustic model to obtain a phoneme probability matrix; acquiring a preset first-level command word list, wherein the first-level command word list comprises a plurality of first-level command words; calculating the score of each first-level command word according to the phoneme probability matrix; determining the first-level command word with the highest score according to the scores of the first-level command words; judging whether the score of the first-level command word with the highest score is greater than a preset first threshold; if not, re-acquiring a voice signal for recognition; if yes, judging whether the first-level command word with the highest score has a corresponding second-level command word, wherein the second-level command word is a specific command word of the first-level command word; if the first-level command word with the highest score has no corresponding second-level command word, taking the first-level command word with the highest score as the voice command word recognition result; if the first-level command word with the highest score has corresponding second-level command words, acquiring each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated; calculating the score of each second-level command word to be calculated according to the phoneme probability matrix; determining the second-level command word with the highest score according to the scores of the second-level command words; judging whether the score of the second-level command word with the highest score is greater than a set second threshold; if yes, taking the second-level command word with the highest score as the voice command word recognition result; if not, re-acquiring a voice signal for recognition.
It will be appreciated by those skilled in the art that the architecture shown in FIG. 3 is merely a block diagram of the part of the architecture related to the solution of the present application and does not limit the computer devices to which the solution of the present application may be applied.
An embodiment of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a voice command word recognition method comprising the following steps: acquiring a voice signal to be recognized; extracting features of the voice signal to be recognized to generate a feature vector; inputting the feature vector into an acoustic model to obtain a phoneme probability matrix; acquiring a preset first-level command word list, wherein the first-level command word list comprises a plurality of first-level command words; calculating the score of each first-level command word according to the phoneme probability matrix; determining the first-level command word with the highest score according to the scores of the first-level command words; judging whether the score of the first-level command word with the highest score is greater than a preset first threshold; if not, re-acquiring a voice signal for recognition; if yes, judging whether the first-level command word with the highest score has a corresponding second-level command word, wherein the second-level command word is a specific command word of the first-level command word; if the first-level command word with the highest score has no corresponding second-level command word, taking the first-level command word with the highest score as the voice command word recognition result; if the first-level command word with the highest score has corresponding second-level command words, acquiring each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated; calculating the score of each second-level command word to be calculated according to the phoneme probability matrix; determining the second-level command word with the highest score according to the scores of the second-level command words; judging whether the score of the second-level command word with the highest score is greater than a set second threshold; if yes, taking the second-level command word with the highest score as the voice command word recognition result; if not, re-acquiring a voice signal for recognition.
It is understood that the computer-readable storage medium in this embodiment may be a volatile readable storage medium or a non-volatile readable storage medium.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium provided by the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the application.

Claims (10)

1. A method for recognizing a voice command word, the method comprising:
acquiring a voice signal to be recognized;
extracting features of the voice signal to be recognized to generate a feature vector;
inputting the feature vector into an acoustic model to obtain a phoneme probability matrix;
acquiring a preset first-level command word list; wherein the first-level command word list comprises a plurality of first-level command words;
calculating the score of each first-level command word according to the phoneme probability matrix;
determining the first-level command words with highest scores according to the scores of the first-level command words;
judging whether the score of the first-level command word with the highest score is greater than a preset first threshold;
if not, re-acquiring the voice signal for recognition;
if yes, judging whether the first-level command word with the highest score has a corresponding second-level command word; wherein the second level command word is a specific command word of the first level command word;
if the first-level command word with the highest score does not have the corresponding second-level command word, taking the first-level command word with the highest score as a voice command word recognition result;
if the first-level command word with the highest score has a corresponding second-level command word, acquiring each second-level command word corresponding to the first-level command word with the highest score as a second-level command word to be calculated;
calculating the score of each second-level command word to be calculated according to the phoneme probability matrix;
determining the second-level command words with highest scores according to the scores of the second-level command words;
judging whether the score of the second-level command word with the highest score is greater than a set second threshold;
if yes, taking the second level command word with the highest score as a voice command word recognition result;
if not, the voice signal is acquired again for recognition.
2. The voice command word recognition method according to claim 1, wherein the score of the first level command word is calculated according to the steps of:
inserting blank symbols in the middle of each word of the first-level command word and in front of and behind the first-level command word to obtain a changed first-level command word;
converting the changed first-level command word into a phoneme sequence according to the relation between the words and the phonemes in the dictionary;
according to the phoneme probability matrix, calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm to obtain a total probability value;
and mapping the total probability value into a score according to a preset rule, and taking the score as the score of the first-level command word.
3. The method for recognizing speech command words according to claim 2, wherein the step of calculating the sum of probabilities of all paths corresponding to the phoneme sequence by using a forward algorithm based on the phoneme probability matrix to obtain a total probability value comprises:
decoding with the forward algorithm according to the phoneme probability matrix; when decoding reaches a wildcard symbol, setting the probability of the wildcard symbol to 1 and continuing the forward algorithm until the sum of the probabilities of all paths corresponding to the phoneme sequence is obtained, thereby obtaining the total probability value.
4. The voice command word recognition method according to claim 2, wherein before the step of mapping the total probability value into a score according to a preset rule and taking the score as the score of the first-level command word, the method further comprises:
normalizing the total probability value to obtain a normalized total probability value.
5. The method of claim 4, wherein normalizing the total probability value to obtain a normalized total probability value comprises:
calculating a normalized length according to the formula $\mathrm{norm\_len} = a - b - c$, where $\mathrm{norm\_len}$ represents the normalized length, $a$ represents the length of the decoded time interval, $b$ represents the accumulated value of the blank probabilities within that time interval, and $c$ represents the length of the wildcard;
calculating the normalized total probability value according to the formula $p_{\mathrm{total}} = \exp\left(\log p(y \mid x) / \mathrm{norm\_len}\right)$, where $x$ represents the speech signal to be recognized, $y$ represents the phoneme sequence, and $p(y \mid x)$ represents the total probability value.
6. The method of claim 1, wherein the score of the second-level command word is calculated according to the steps of:
inserting blank symbols in the middle of each word of the second-level command word and in front of and behind the second-level command word to obtain a changed second-level command word;
converting the changed second level command word into a phoneme sequence according to the relation between the words and the phonemes in the dictionary;
according to the phoneme probability matrix, calculating the probability sum of all paths corresponding to the phoneme sequence by adopting a forward algorithm to obtain a total probability value;
and mapping the total probability value into a score according to a preset rule, and taking the score as the score of the second-level command word.
7. The voice command word recognition method according to claim 1, wherein the preset first-level command words and/or second-level command words include command words of discontinuous form.
8. A voice command word recognition apparatus, comprising:
the first acquisition module is used for acquiring a voice signal to be recognized;
the feature extraction module is used for extracting features of the voice signal to be identified and generating feature vectors;
the input module is used for inputting the feature vector into the acoustic model to obtain a phoneme probability matrix;
the second acquisition module is used for acquiring a preset first-level command word list; wherein the first-level command word list comprises a plurality of first-level command words;
the first calculation module is used for calculating the score of each first-level command word according to the phoneme probability matrix;
the first determining module is used for determining the first-level command words with highest scores according to the scores of the first-level command words;
the first judging module is used for judging whether the score of the first-level command word with the highest score is greater than a preset first threshold;
the first acquisition module is further used for acquiring the voice signal again for recognition if not;
the second judging module is used for judging whether the first-level command word with the highest score has a corresponding second-level command word or not if yes; wherein the second level command word is a specific command word of the first level command word;
the voice command word recognition result determining module is used for taking the first-level command word with the highest score as the voice command word recognition result if the first-level command word with the highest score does not have a corresponding second-level command word;
the third acquisition module is used for acquiring each second level command word corresponding to the first level command word with the highest score as a second level command word to be calculated if the first level command word with the highest score has the corresponding second level command word;
the second calculation module is used for calculating the score of each second-level command word to be calculated according to the phoneme probability matrix;
the second determining module is used for determining the second-level command words with highest scores according to the scores of the second-level command words;
the third judging module is used for judging whether the score of the second-level command word with the highest score is larger than a set second threshold value;
the voice command word recognition result determining module is further used for taking the second-level command word with the highest score as the voice command word recognition result if so;
the first acquisition module is further configured to reacquire the voice signal for recognition if not.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the voice command word recognition method according to any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the voice command word recognition method according to any one of claims 1 to 7.
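The two-level decision flow recited in claim 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `recognize`, the caller-supplied `score_fn` callback, the dictionary layout of the command word lists, and the convention of returning `None` to mean "reacquire the voice signal" are all hypothetical. How a score is actually derived from the phoneme probability matrix is left to the caller.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical representation: one row of phoneme probabilities per frame.
PhonemeMatrix = Sequence[Sequence[float]]
# Hypothetical scoring callback: (command word, matrix) -> score.
ScoreFn = Callable[[str, PhonemeMatrix], float]


def recognize(
    matrix: PhonemeMatrix,
    first_level: List[str],
    second_level: Dict[str, List[str]],
    score_fn: ScoreFn,
    first_threshold: float,
    second_threshold: float,
) -> Optional[str]:
    """Return the recognized command word, or None when the caller
    should acquire a new voice signal and try again."""
    if not first_level:
        return None

    # Score every first-level command word and pick the highest.
    first_scores = {w: score_fn(w, matrix) for w in first_level}
    best_first = max(first_scores, key=lambda w: first_scores[w])
    if first_scores[best_first] <= first_threshold:
        return None  # below the first threshold: reacquire

    # No second-level words: the first-level word is the result.
    candidates = second_level.get(best_first, [])
    if not candidates:
        return best_first

    # Score only the second-level words under the winning first-level word.
    second_scores = {w: score_fn(w, matrix) for w in candidates}
    best_second = max(second_scores, key=lambda w: second_scores[w])
    if second_scores[best_second] > second_threshold:
        return best_second
    return None  # below the second threshold: reacquire
```

In this sketch only the second-level words attached to the winning first-level word are scored, which mirrors the claim's "second-level command word to be calculated" step and keeps the number of scoring calls small.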
CN202311077037.XA 2023-08-25 2023-08-25 Voice command word recognition method, device, equipment and medium Active CN116825108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311077037.XA CN116825108B (en) 2023-08-25 2023-08-25 Voice command word recognition method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311077037.XA CN116825108B (en) 2023-08-25 2023-08-25 Voice command word recognition method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116825108A (en) 2023-09-29
CN116825108B (en) 2023-12-08

Family

ID=88118737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311077037.XA Active CN116825108B (en) 2023-08-25 2023-08-25 Voice command word recognition method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116825108B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118098206A (en) * 2024-04-18 2024-05-28 深圳市友杰智新科技有限公司 Command word score calculating method, device, equipment and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8914286B1 (en) * 2011-04-14 2014-12-16 Canyon IP Holdings, LLC Speech recognition with hierarchical networks
CN105931639A (en) * 2016-05-31 2016-09-07 杨若冲 Speech interaction method capable of supporting multi-hierarchy command words
JP2018087935A (en) * 2016-11-30 2018-06-07 日本電信電話株式会社 Voice language identification device, and method and program thereof
CN110232912A (en) * 2018-03-06 2019-09-13 通用汽车环球科技运作有限责任公司 Speech recognition arbitrated logic
CN113744736A (en) * 2021-09-08 2021-12-03 北京声智科技有限公司 Command word recognition method and device, electronic equipment and storage medium
CN113889098A (en) * 2021-11-11 2022-01-04 厦门亿联网络技术股份有限公司 Command word recognition method and device, mobile terminal and readable storage medium
CN115148197A (en) * 2021-03-31 2022-10-04 华为技术有限公司 Voice wake-up method, device, storage medium and system
CN116189677A (en) * 2023-02-28 2023-05-30 苏州奇梦者科技有限公司 Method, system, equipment and storage medium for identifying multi-model voice command words

Also Published As

Publication number Publication date
CN116825108B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
CN108564940B (en) Speech recognition method, server and computer-readable storage medium
CN110364143B (en) Voice awakening method and device and intelligent electronic equipment
US11030998B2 (en) Acoustic model training method, speech recognition method, apparatus, device and medium
US8972260B2 (en) Speech recognition using multiple language models
KR100655491B1 (en) Two stage utterance verification method and device of speech recognition system
CN103971685B (en) Method and system for recognizing voice commands
CN110610707B (en) Voice keyword recognition method and device, electronic equipment and storage medium
CN116825108B (en) Voice command word recognition method, device, equipment and medium
CN109272991B (en) Voice interaction method, device, equipment and computer-readable storage medium
EP3989217A1 (en) Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
CN110019741B (en) Question-answering system answer matching method, device, equipment and readable storage medium
CN110428853A (en) Voice activity detection method, Voice activity detection device and electronic equipment
CN112233651B (en) Dialect type determining method, device, equipment and storage medium
CN116825109B (en) Processing method, device, equipment and medium for voice command misrecognition
CN111145748B (en) Audio recognition confidence determining method, device, equipment and storage medium
CN115831100B (en) Voice command word recognition method, device, equipment and storage medium
WO2020073839A1 (en) Voice wake-up method, apparatus and system, and electronic device
CN111369992A (en) Instruction execution method and device, storage medium and electronic equipment
CN111696524B (en) Character-overlapping voice recognition method and system
CN115910049A (en) Voice control method and system based on voiceprint, electronic device and storage medium
US9928832B2 (en) Method and apparatus for classifying lexical stress
CN112802476B (en) Speech recognition method and device, server and computer readable storage medium
CN115881097B (en) Speech recognition result confirmation method and device, computer equipment and storage medium
CN116030799B (en) Audio recognition model training method, device, computer equipment and storage medium
KR102392992B1 (en) User interfacing device and method for setting wake-up word activating speech recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant