CN110060662B - Voice recognition method and device - Google Patents


Info

Publication number: CN110060662B
Application number: CN201910293280.2A
Authority: CN (China)
Prior art keywords: voice, recognition, mode, belongs, speech
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN110060662A
Inventors: 马赛, 杜念冬
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority claimed from application CN201910293280.2A
Publication of application CN110060662A; application granted; publication of CN110060662B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L17/00 Speaker identification or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2015/223 Execution procedure of a spoken command


Abstract

The invention provides a voice recognition method and device. The method includes: obtaining the voice to be recognized and its parameter information, where the parameter information includes the current mode, the recognition serial number of the voice, internal and external noise information, and direction information; extracting a feature vector corresponding to the voice; determining, according to the current mode and the recognition serial number, whether the voice is a non-first voice in the single-wake multi-recognition mode; if it is, obtaining a speech recognition result, an acoustic judgment result, and a semantic judgment result according to the parameter information and the feature vector, and determining whether the voice belongs to the music field according to the two judgment results; and, if the voice belongs to the music field, determining the instruction and/or resource corresponding to the voice according to the speech recognition result. Because internal and external noise information, direction information, and the like are collected, the accuracy of voice recognition is improved, voices in the music field can be recognized automatically and accurately in the single-wake multi-recognition mode, and high-quality music services can subsequently be provided to users.

Description

Voice recognition method and device
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a speech recognition method and apparatus.
Background
With the development of artificial intelligence technology, more and more conversational artificial intelligence products are emerging. Taking a smart sound box as an example of a conversational artificial intelligence product, voice wake-up and voice recognition are its core technologies, and they directly affect the interaction experience between the user and the sound box.
In practical application scenarios, the smart sound box faces a complex environment: various internal and external noises (such as the sound of an indoor television program or outdoor traffic), the speech of people other than the user, halting speaking styles, and so on can all interfere with voice recognition. At present, parameters such as internal and external noise information and direction information are not collected when voice is collected, so the voice recognition result is prone to deviation, its accuracy is poor, and the interaction effect suffers.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present invention is to provide a speech recognition method, which is used to solve the problems in the prior art that the speech recognition result is prone to have deviation, the accuracy is poor, and the interaction effect is affected.
A second object of the present invention is to provide a speech recognition apparatus.
A third object of the present invention is to propose another speech recognition apparatus.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
In order to achieve the above objects, an embodiment of a first aspect of the present invention provides a speech recognition method, including:
acquiring the voice to be recognized and its parameter information, where the parameter information includes the current mode, the recognition serial number of the voice, internal and external noise information, and direction information;
extracting a feature vector corresponding to the voice;
determining, according to the current mode and the recognition serial number, whether the voice is a non-first voice in the single-wake multi-recognition mode;
if the voice is a non-first voice in the single-wake multi-recognition mode, obtaining a speech recognition result, an acoustic judgment result, and a semantic judgment result according to the parameter information and the feature vector, and determining whether the voice belongs to the music field according to the acoustic and semantic judgment results;
and, if the voice belongs to the music field, determining the instruction and/or resource corresponding to the voice according to the speech recognition result.
Further, determining whether the voice is a non-first voice in the single-wake multi-recognition mode according to the current mode and the recognition serial number includes:
judging whether the current mode is the single-wake multi-recognition mode;
and, if it is, determining whether the voice is a non-first voice according to the recognition serial number.
Further, obtaining the speech recognition result, the acoustic judgment result, and the semantic judgment result according to the parameter information and the feature vector, and determining whether the voice belongs to the music field according to the two judgment results, includes:
inputting the internal and external noise information, the direction information, and the feature vector into an acoustic recognition model to obtain the speech recognition result and the acoustic judgment result;
determining, according to the acoustic judgment result, whether the voice belongs to the music field acoustically;
if the voice belongs to the music field acoustically, inputting the speech recognition result, the internal and external noise information, and the direction information into a semantic recognition model to obtain the semantic judgment result;
determining, according to the semantic judgment result, whether the voice belongs to the music field semantically;
and, if the voice belongs to the music field semantically, determining that the voice belongs to the music field.
Further, the above procedure also includes:
if the voice does not belong to the music field either acoustically or semantically, determining that the voice does not belong to the music field.
Further, the method further includes:
if the current mode is the single-wake single-recognition mode or the geek mode, or the voice is the first voice in the single-wake multi-recognition mode, obtaining a speech recognition result according to the parameter information and the feature vector;
and determining the corresponding instruction and/or resource according to the speech recognition result.
Further, after determining the instruction and/or resource corresponding to the voice according to the speech recognition result, the method further includes:
executing the instruction, and/or providing the resource to the user of the smart sound box.
The voice recognition method of the embodiment of the present invention obtains the voice to be recognized and its parameter information (the current mode, the recognition serial number of the voice, internal and external noise information, and direction information); extracts the feature vector corresponding to the voice; determines, according to the current mode and the recognition serial number, whether the voice is a non-first voice in the single-wake multi-recognition mode; if it is, obtains a speech recognition result, an acoustic judgment result, and a semantic judgment result according to the parameter information and the feature vector, and determines whether the voice belongs to the music field according to the two judgment results; and, if the voice belongs to the music field, determines the instruction and/or resource corresponding to the voice according to the speech recognition result. Because internal and external noise information, direction information, and the like are collected, the accuracy of voice recognition is improved, voices in the music field can be recognized automatically and accurately in the single-wake multi-recognition mode, and high-quality music services can subsequently be provided to users.
In order to achieve the above objects, an embodiment of a second aspect of the present invention provides a speech recognition apparatus, including:
an acquisition module, configured to acquire the voice to be recognized and its parameter information, where the parameter information includes the current mode, the recognition serial number of the voice, internal and external noise information, and direction information;
an extraction module, configured to extract the feature vector corresponding to the voice;
a determining module, configured to determine, according to the current mode and the recognition serial number, whether the voice is a non-first voice in the single-wake multi-recognition mode;
the determining module is further configured to, when the voice is a non-first voice in the single-wake multi-recognition mode, obtain a speech recognition result, an acoustic judgment result, and a semantic judgment result according to the parameter information and the feature vector, and determine whether the voice belongs to the music field according to the two judgment results;
and the determining module is further configured to, when the voice belongs to the music field, determine the instruction and/or resource corresponding to the voice according to the speech recognition result.
Further, the determining module is specifically configured to:
judge whether the current mode is the single-wake multi-recognition mode;
and, if it is, determine whether the voice is a non-first voice according to the recognition serial number.
Further, the determining module is specifically configured to:
input the internal and external noise information, the direction information, and the feature vector into an acoustic recognition model to obtain the speech recognition result and the acoustic judgment result;
determine, according to the acoustic judgment result, whether the voice belongs to the music field acoustically;
if the voice belongs to the music field acoustically, input the speech recognition result, the internal and external noise information, and the direction information into a semantic recognition model to obtain the semantic judgment result;
determine, according to the semantic judgment result, whether the voice belongs to the music field semantically;
and, if the voice belongs to the music field semantically, determine that the voice belongs to the music field.
Further, the determining module is also configured to:
determine that the voice does not belong to the music field if it does not belong to the music field either acoustically or semantically.
Further, the acquisition module is further configured to obtain a speech recognition result according to the parameter information and the feature vector when the current mode is the single-wake single-recognition mode or the geek mode, or when the voice is the first voice in the single-wake multi-recognition mode;
and the determining module is further configured to determine the corresponding instruction and/or resource according to the speech recognition result.
Further, the apparatus also includes an execution module, configured to execute the instruction and/or provide the resource to the user of the smart sound box.
The voice recognition device of the embodiment of the present invention obtains the voice to be recognized and its parameter information (the current mode, the recognition serial number of the voice, internal and external noise information, and direction information); extracts the feature vector corresponding to the voice; determines, according to the current mode and the recognition serial number, whether the voice is a non-first voice in the single-wake multi-recognition mode; if it is, obtains a speech recognition result, an acoustic judgment result, and a semantic judgment result according to the parameter information and the feature vector, and determines whether the voice belongs to the music field according to the two judgment results; and, if the voice belongs to the music field, determines the instruction and/or resource corresponding to the voice according to the speech recognition result. Because internal and external noise information, direction information, and the like are collected, the accuracy of voice recognition is improved, voices in the music field can be recognized automatically and accurately in the single-wake multi-recognition mode, and high-quality music services can subsequently be provided to users.
In order to achieve the above objects, an embodiment of a third aspect of the present invention provides another speech recognition apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the speech recognition method described above when executing the program.
In order to achieve the above objects, an embodiment of a fourth aspect of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the speech recognition method described above is implemented.
In order to achieve the above objects, an embodiment of a fifth aspect of the present invention provides a computer program product; when instructions in the computer program product are executed by a processor, the speech recognition method described above is implemented.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Fig. 1 is a schematic flow chart of a speech recognition method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another speech recognition method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another speech recognition apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
A speech recognition method and apparatus according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 1 is a flowchart illustrating a speech recognition method according to an embodiment of the present invention. As shown in fig. 1, the speech recognition method includes the steps of:
s101, voice to be recognized and parameter information are obtained.
The executing entity of the voice recognition method provided by the present invention is a voice recognition apparatus, which may be a hardware device such as a terminal device or a server, or software installed on a hardware device. The apparatus has a conversational artificial-intelligence function: besides controlling smart home devices through voice interaction with the user, it provides basic functions such as weather queries, information, audio playback, alarm setting, dressing advice, traffic, and stock market quotations.
In this embodiment, after the voice recognition apparatus is woken up, the user can talk with it. Generally, the first voice the apparatus receives is the user's voice; a received non-first voice, however, may not be.
Taking the voice recognition apparatus as a smart sound box as an example: in practical application scenarios, the smart sound box faces a complex environment, where various internal and external noises (such as the sound of an indoor television program or outdoor traffic), the speech of people other than the user, and halting speaking styles can all interfere with voice processing. Generally, after the user wakes up the smart sound box, the first voice it receives can be understood as the user's voice, reflecting the user's real need. For example, a user who wants to listen to a song says "I want to listen to the songs I often listen to" immediately after waking up the sound box, and the sound box plays a corresponding song to meet the request. If, after the song starts, the user turns on the television, the second voice received by the sound box may be the sound of the television program, or it may still be the user's voice but mixed with a great deal of noise such as the television sound; at this point, the voice to be recognized must be discriminated.
In this embodiment, the parameter information of the voice to be recognized includes the current mode, the recognition serial number of the voice, internal and external noise information, and direction information. By collecting the noise information, the direction information, and so on, the accuracy of voice recognition can be improved, and with it the interactive experience.
In this embodiment, the current mode is any one of, but not limited to, the following modes: a single-wake single-recognition mode, a single-wake multi-recognition mode, and a geek mode.
The single-wake single-recognition mode can be understood as follows: each time the user interacts with the voice recognition apparatus, the user must speak the voice containing the wake-up word to wake the apparatus before speaking the interactive voice. Taking the apparatus as a smart sound box, the voice containing the wake-up word is, for example, "Xiaodu, Xiaodu (wake-up word), activate the single-wake single-recognition mode". The first time, a user who wants to check the weather must first say "Xiaodu, Xiaodu, activate the single-wake single-recognition mode", and the sound box enters sleep mode after performing the weather-check task. The next time, to check stocks, the user must again say "Xiaodu (wake-up word), activate the single-wake single-recognition mode" to wake the sound box.
The single-wake multi-recognition mode can be understood as follows: in each interaction, after speaking the voice containing the wake-up word once to wake the apparatus, the user can speak to it multiple times. For the smart sound box, the voice containing the wake-up word is, for example, "Xiaodu (wake-up word), activate the single-wake multi-recognition mode". If the user wants to check the weather the first time, listen to songs the second time, and check stocks the third time, the apparatus remains awake across all three interactions.
The geek mode can be understood as follows: in each interaction, the user can hold many consecutive conversations with the apparatus in a short time without speaking a voice containing the wake-up word each time. For the smart sound box, the voice containing the wake-up word in the geek mode is, for example, "Xiaodu (wake-up word), activate the geek mode". Again, if the user checks the weather, then listens to songs, then checks stocks, the apparatus remains awake across the three interactions. The difference between the geek mode and the single-wake multi-recognition mode is that the single-wake multi-recognition mode can collect several voices continuously, whereas the geek mode does not collect another voice until it has finished processing the current one.
In this embodiment, the recognition serial number indicates which voice, in order, the received voice to be recognized is after the apparatus wakes up. For example, a recognition serial number of 1 means the voice to be recognized is the first voice; 2, the second voice; 3, the third voice. The internal and external noise information can be understood as the ambient noise of the environment in which the apparatus is located. Taking the smart sound box as an example, when it is placed in a living room, bedroom, kitchen, and so on, it can first measure the ambient noise there; when a subsequent voice to be recognized is processed, the voice can be denoised according to this noise information, improving recognition accuracy.
In this embodiment, the direction information can be understood as sound-source position information: the apparatus can determine the sound-source position of the voice to be recognized through sound-source localization, improving recognition accuracy.
It should be noted that the parameters of the voice to be recognized are not limited to the current mode, the recognition serial number, the internal and external noise information, and the direction information; for example, they may also include a recognition identifier, which characterizes the number of times the current mode has been activated. Taking the single-wake multi-recognition mode as the current mode, a recognition identifier of 1 means the mode has been activated once; 2, twice; 3, three times.
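As a concrete, purely illustrative sketch, the parameter information described above could be grouped into one structure. All names below (the `Mode` enum, `ParameterInfo` fields) are assumptions of this sketch, not terms from the patent:

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    SINGLE_WAKE_SINGLE_RECOGNITION = 1
    SINGLE_WAKE_MULTI_RECOGNITION = 2
    GEEK = 3  # continuous multi-turn dialogue without repeating the wake-up word

@dataclass
class ParameterInfo:
    mode: Mode
    serial_number: int       # 1 = first voice after wake-up, 2 = second, ...
    noise_info: float        # internal/external ambient-noise estimate (e.g. a dB level)
    direction: float         # sound-source direction, e.g. azimuth in degrees
    recognition_id: int = 1  # optional: how many times the current mode was activated

params = ParameterInfo(Mode.SINGLE_WAKE_MULTI_RECOGNITION, serial_number=2,
                       noise_info=45.0, direction=90.0)
```

The `recognition_id` field stands in for the optional recognition identifier; the numeric representations of noise and direction are placeholders for whatever the device actually measures.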
S102, extracting the feature vector corresponding to the voice.
In this embodiment, the feature vector of the voice to be recognized is the basis of speech recognition; it is obtained by performing feature extraction on the voice. The feature extraction method may be any one of, but is not limited to: linear prediction coefficients (LPC), perceptual linear prediction (PLP), linear prediction cepstral coefficients (LPCC), or mel-frequency cepstral coefficients (MFCC).
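To make the MFCC option concrete, here is a minimal MFCC-style extractor in plain NumPy: framing, Hamming windowing, power spectrum, a triangular mel filterbank, log, and a DCT. It is a simplification for illustration only; the frame length, hop, FFT size, and filter counts are typical but arbitrary choices, not values from the patent:

```python
import numpy as np

def mfcc_like(signal, sr=16000, frame_len=400, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC-style features: one n_ceps-dim vector per frame."""
    # Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (512-point FFT -> 257 bins).
    power = np.abs(np.fft.rfft(frames, n=512)) ** 2 / 512
    # Triangular mel filterbank between 0 Hz and sr/2.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((512 + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, 512 // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # Keep the first n_ceps DCT-II coefficients as the per-frame feature vector.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_mel @ dct.T

# One second of a 440 Hz tone yields 98 frames of 13 coefficients each.
feats = mfcc_like(np.sin(2 * np.pi * 440 * np.arange(16000) / 16000))
```

Real systems would add pre-emphasis, energy normalization, and delta features; any of the other listed methods (LPC, PLP, LPCC) could be substituted at this step.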
S103, determining, according to the current mode and the recognition serial number, whether the voice is a non-first voice in the single-wake multi-recognition mode.
In this embodiment, the voice recognition apparatus may execute step S103 by first judging whether the current mode is the single-wake multi-recognition mode and, if it is, determining whether the voice is a non-first voice according to the recognition serial number.
It should be noted that if the current mode is the single-wake single-recognition mode or the geek mode, or the voice is the first voice in the single-wake multi-recognition mode, a speech recognition result is obtained directly according to the parameter information and the feature vector, and the corresponding instruction and/or resource is determined according to that result.
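The decision in step S103, together with the fallback for the other modes just described, amounts to a small gate function. Sketched below with illustrative mode names (strings chosen for this sketch, not from the patent):

```python
def is_non_first_in_multi_mode(mode: str, serial_number: int) -> bool:
    """Return True only when the device is in single-wake multi-recognition
    mode AND the voice is not the first one after wake-up (step S103)."""
    if mode != "single_wake_multi_recognition":
        return False          # single-wake single-recognition or geek mode
    return serial_number > 1  # serial number 1 = first voice after wake-up

# First voice in multi-recognition mode: recognized directly, no domain check.
assert is_non_first_in_multi_mode("single_wake_multi_recognition", 1) is False
# Second voice: must pass the acoustic + semantic music-field discrimination.
assert is_non_first_in_multi_mode("single_wake_multi_recognition", 2) is True
```

Voices for which this gate returns False take the direct path: recognition result first, then instruction/resource lookup.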
In this embodiment, the instruction and/or resource corresponding to a voice is set according to the actual situation. For example, if the voice to be recognized is "turn on the air conditioner", the smart sound box executes an air-conditioner start instruction; if the voice is "turn on the light", it executes a light-on instruction. If the voice is "I want to listen to songs of a certain singer", the resource the sound box provides is that singer's songs.
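A minimal sketch of such an instruction/resource lookup, using the patent's own examples; the table entries, instruction names, and the `resolve` helper are all hypothetical:

```python
# Hypothetical mapping from recognized text to a (instruction, resource) pair.
COMMANDS = {
    "turn on the air conditioner": ("start_air_conditioner", None),
    "turn on the light": ("turn_on_light", None),
}

def resolve(recognized_text: str):
    """Return (instruction, resource) for a speech recognition result."""
    if recognized_text in COMMANDS:
        return COMMANDS[recognized_text]
    if recognized_text.startswith("i want to listen to"):
        # Resource request: pass the remainder to a (hypothetical) music search.
        return ("play_music", recognized_text[len("i want to listen to"):].strip())
    return (None, None)  # unrecognized: neither instruction nor resource

assert resolve("turn on the light") == ("turn_on_light", None)
assert resolve("i want to listen to songs of a singer") == ("play_music", "songs of a singer")
```

A production system would of course use intent classification rather than exact string matching; the sketch only shows the instruction-versus-resource split.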
S104, if the voice belongs to non-primary voice in a single-awakening multi-recognition mode, acquiring a voice recognition result, an acoustic judgment result and a semantic judgment result according to the parameter information and the feature vector, and determining whether the voice belongs to the music field according to the acoustic judgment result and the semantic judgment result.
In this embodiment, when the current mode of the speech recognition apparatus is the single-wake multi-recognition mode, the received non-primary voice is discriminated: only when the fields to which the non-primary voice belongs acoustically and semantically are consistent is the non-primary voice considered to be speech from the user, which indicates that the user wants to obtain a service in that field again.
In this embodiment, according to the parameter information and the feature vector of the speech to be recognized, the speech recognition result, the acoustic determination result, and the semantic determination result may be obtained, and then, according to the acoustic determination result and the semantic determination result, it is determined whether the speech to be recognized belongs to the music field.
The voice recognition result is obtained by performing voice recognition on the voice to be recognized.
The acoustic determination result is used to represent the field to which the acoustic features of the speech to be recognized belong. For example, an acoustic determination result of 1 indicates that the speech to be recognized belongs to the music field; 2, the weather-query field; 3, the stock-query field; and 4, the smart-home device control field.
The semantic judgment result is used to represent the field to which the semantic features of the speech to be recognized belong. For example, a semantic judgment result of 1 indicates that the speech to be recognized belongs to the music field; 2, the weather-query field; 3, the stock-query field; 4, the smart-home device control field; and so on.
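The consistency check between the two results can be sketched in a few lines. The numeric domain codes follow the examples just given (1 = music, 2 = weather query, 3 = stock query, 4 = smart-home control), which the text presents as examples rather than fixed values:

```python
# Domain codes as in the examples above (illustrative, not mandated).
MUSIC, WEATHER, STOCK, SMART_HOME = 1, 2, 3, 4

def belongs_to_music_domain(acoustic_result, semantic_result):
    """The speech is treated as music-domain only when BOTH the acoustic
    and the semantic determination point at the music domain."""
    return acoustic_result == MUSIC and semantic_result == MUSIC
```

Any disagreement between the two results, in either direction, makes the speech fall outside the music field (step S1046 below).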
And S105, if the voice belongs to the music field, determining the instruction and/or the resource corresponding to the voice according to the voice recognition result.
Further, after step S105, the method may further include the steps of: executing the instructions, and/or providing the resources to a user of the smart sound box.
In this embodiment, the instructions and/or resources corresponding to the voice are set according to the actual situation. For example, if the voice to be recognized is "the song volume is too low" or "the song volume is too high", the instruction executed by the smart sound box is a volume adjustment instruction; if the voice to be recognized is "change the song", the instruction executed by the smart sound box is a song-switching instruction; and if the voice to be recognized is "I want to listen to a song by a certain singer", the resource provided by the smart sound box is a song by that singer.
According to the voice recognition method of the embodiment of the present invention, the voice to be recognized and the parameter information are obtained, the parameter information including the current mode, the recognition serial number of the voice, the internal and external noise information, and the orientation information; a feature vector corresponding to the voice is extracted; whether the voice belongs to a non-primary voice in a single-wake multi-recognition mode is determined according to the current mode and the recognition serial number; if the voice belongs to a non-primary voice in the single-wake multi-recognition mode, a voice recognition result, an acoustic judgment result and a semantic judgment result are obtained according to the parameter information and the feature vector, and whether the voice belongs to the music field is determined according to the acoustic judgment result and the semantic judgment result; and if the voice belongs to the music field, the instruction and/or resource corresponding to the voice is determined according to the voice recognition result. Because the internal and external noise information, the orientation information and the like are taken into account, the accuracy of voice recognition is improved, voices in the music field can be recognized automatically and accurately in the single-wake multi-recognition mode, and high-quality music services can subsequently be provided to users.
Fig. 2 is a flowchart illustrating another speech recognition method according to an embodiment of the present invention. As shown in fig. 2, based on the embodiment shown in fig. 1, a specific implementation manner of "obtaining a speech recognition result, an acoustic determination result, and a semantic determination result according to the parameter information and the feature vector, and determining whether the speech belongs to the music field according to the acoustic determination result and the semantic determination result" includes the following steps:
S1041, inputting the internal and external noise information, the orientation information and the feature vector into an acoustic recognition model to obtain the voice recognition result and the acoustic judgment result, and executing step S1042.
In this embodiment, the acoustic recognition model is obtained by training any one of a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN) and the like with a large number of training samples. It should be noted that, because information such as the internal and external noise information and the orientation information is taken into account when the acoustic recognition model is trained, the accuracy of speech recognition can be improved.
Each training sample consists of the feature vector of a known voice, the internal and external noise information of the known voice, the orientation information of the known voice, and the field to which the known voice belongs acoustically. For example, for the known voice "I want to listen to a cheerful song", the field to which the known voice belongs acoustically is the music field; for "what is the weather today", it is the weather-query field; and for "how is the stock market today", it is the stock-query field. The internal and external noise information of the known voice can be obtained by existing noise detection technology, and the orientation information of the known voice can be obtained by existing sound source localization technology. Specific model training methods are detailed in the related art and are not further described here.
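The way the extra inputs enter the model can be illustrated with a toy stand-in for the training step: instead of a DNN/CNN/RNN, a one-layer softmax classifier is trained by full-batch gradient descent on synthetic data, with the noise and orientation information simply concatenated onto the feature vector. All dimensions, labels and hyperparameters are invented for this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d_feat, n_domains = 300, 8, 4
X_feat = rng.standard_normal((n, d_feat))    # acoustic feature vectors
X_noise = rng.standard_normal((n, 2))        # internal/external noise info
X_orient = rng.standard_normal((n, 1))       # orientation info
X = np.hstack([X_feat, X_noise, X_orient])   # concatenated model input

# Synthetic labels produced by a hidden linear rule so the toy model can fit them
W_true = rng.standard_normal((X.shape[1], n_domains))
y = np.argmax(X @ W_true, axis=1)
onehot = np.eye(n_domains)[y]

W = np.zeros((X.shape[1], n_domains))
b = np.zeros(n_domains)
losses = []
for _ in range(300):  # plain full-batch gradient descent on cross-entropy
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    losses.append(-np.mean(np.log(p[np.arange(n), y] + 1e-12)))
    g = (p - onehot) / n
    W -= 0.5 * X.T @ g
    b -= 0.5 * g.sum(axis=0)

accuracy = np.mean(np.argmax(p, axis=1) == y)  # training accuracy
```

The point of the sketch is the input layout, not the model class: noise and orientation features sit in the same input vector as the acoustic features, so the classifier can learn to exploit them.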
S1042, determining whether the voice belongs to the music field acoustically according to the acoustic judgment result, if so, executing step S1043, and if not, executing step S1046.
S1043, if the voice belongs to the music field acoustically, inputting the voice recognition result, the internal and external noise information and the direction information into a semantic recognition model, obtaining the semantic judgment result, and executing the step S1044.
In this embodiment, after determining that the speech to be recognized belongs to the music field according to the acoustic determination result, the semantic determination result of the speech to be recognized is obtained, and when determining that the speech to be recognized belongs to the music field according to the semantic determination result, the speech to be recognized is considered to belong to the music field.
In this embodiment, the semantic recognition model is likewise obtained by training any one of a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN) and the like with a large number of training samples. Each training sample consists of the speech recognition result of a known voice, the internal and external noise information of the known voice, the orientation information of the known voice, and the field to which the known voice belongs semantically. For example, for the speech recognition result "I want to listen to a cheerful song", the field to which the known voice semantically belongs is the music field; for "what is the weather today", it is the weather-query field; and for "how is the stock market today", it is the stock-query field. The internal and external noise information of the known voice can be obtained by existing noise detection technology, and the orientation information of the known voice can be obtained by existing sound source localization technology. Specific model training methods are detailed in the related art and are not further described here.
S1044, determining whether the voice semantically belongs to the music field according to the semantic judgment result, if so, executing a step S1045, and if not, executing a step S1046.
S1045, if the voice semantically belongs to the music field, determining that the voice belongs to the music field.
S1046, if the voice does not belong to the music field acoustically or does not belong to the music field semantically, determining that the voice does not belong to the music field.
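The two-stage decision of steps S1041-S1046 can be sketched as follows. The two models are stand-in callables, and the numeric code 1 for the music field follows the examples given earlier; both are assumptions for illustration:

```python
MUSIC_DOMAIN = 1  # illustrative domain code, as in the earlier examples

def classify_music(noise_info, orientation, feature_vec,
                   acoustic_model, semantic_model):
    """Two-stage check: the acoustic model runs first (S1041); the
    semantic model is consulted only when the acoustic stage already
    points at the music field (S1042/S1043). Returns the recognized
    text and whether the voice belongs to the music field."""
    text, acoustic_domain = acoustic_model(noise_info, orientation, feature_vec)
    if acoustic_domain != MUSIC_DOMAIN:          # S1042 -> S1046
        return text, False
    semantic_domain = semantic_model(text, noise_info, orientation)  # S1043
    return text, semantic_domain == MUSIC_DOMAIN  # S1044 -> S1045/S1046
```

Short-circuiting on the acoustic result means the semantic model is never invoked for speech that already fails the acoustic check.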
According to the voice recognition method of the embodiment of the present invention, the voice to be recognized and the parameter information are obtained, the parameter information including the current mode, the recognition serial number of the voice, the internal and external noise information, and the orientation information; a feature vector corresponding to the voice is extracted; whether the voice belongs to a non-primary voice in a single-wake multi-recognition mode is determined according to the current mode and the recognition serial number; if the voice belongs to a non-primary voice in the single-wake multi-recognition mode, a voice recognition result, an acoustic judgment result and a semantic judgment result are obtained according to the parameter information and the feature vector, and whether the voice belongs to the music field is determined according to the acoustic judgment result and the semantic judgment result; and if the voice belongs to the music field, the instruction and/or resource corresponding to the voice is determined according to the voice recognition result. Because the internal and external noise information, the orientation information and the like are taken into account, the accuracy of voice recognition is improved, voices in the music field can be recognized automatically and accurately in the single-wake multi-recognition mode, and high-quality music services can subsequently be provided to users.
Fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention. As shown in fig. 3, the apparatus includes: an acquisition module 31, an extraction module 32, and a determination module 33.
An obtaining module 31, configured to obtain a voice to be recognized and parameter information; the parameter information includes: the current mode, the recognition serial number of the voice, the internal and external noise information and the direction information;
an extracting module 32, configured to extract a feature vector corresponding to the speech;
a determining module 33, configured to determine whether the voice belongs to a non-primary voice in a single wake-up multiple recognition mode according to the current mode and the recognition serial number;
the determining module 33 is further configured to, when the speech belongs to a non-primary speech in a single-wake multi-recognition mode, obtain a speech recognition result, an acoustic determination result, and a semantic determination result according to the parameter information and the feature vector, and determine whether the speech belongs to the music field according to the acoustic determination result and the semantic determination result;
the determining module 33 is further configured to determine, according to the speech recognition result, an instruction and/or a resource corresponding to the speech when the speech belongs to the music field.
Further, the determining module 33 is specifically configured to,
judging whether the current mode is a single-time awakening multi-time identification mode;
and if the current mode is a single-time awakening multi-time recognition mode, determining whether the voice is non-primary voice according to the recognition serial number.
Further, the determining module 33 is specifically configured to,
inputting the internal and external noise information, the azimuth information and the feature vector into an acoustic recognition model to obtain the voice recognition result and an acoustic judgment result;
determining whether the voice belongs to the music field acoustically according to the acoustic judgment result;
if the voice belongs to the music field acoustically, inputting the voice recognition result, the internal and external noise information and the orientation information into a semantic recognition model to obtain the semantic judgment result;
determining whether the voice belongs to the music field semantically according to the semantic judgment result;
and if the voice semantically belongs to the music field, determining that the voice belongs to the music field.
Further, the determining module 33 is specifically further configured to,
and if the voice does not belong to the music field acoustically or semantically, determining that the voice does not belong to the music field.
Further, the obtaining module 31 is further configured to obtain a speech recognition result according to the parameter information and the feature vector when the current mode is a single-time awakening single recognition mode or a guest mode, or the speech belongs to a first speech in a single-time awakening multiple recognition mode;
the determining module 33 is further configured to determine a corresponding instruction and/or resource according to the voice recognition result.
Further, the apparatus further comprises: and the execution module is used for executing the instruction and/or providing the resource for a user of the intelligent sound box.
It should be noted that the foregoing explanation of the embodiment of the speech recognition method is also applicable to the speech recognition apparatus of the embodiment, and is not repeated herein.
According to the voice recognition device of the embodiment of the present invention, the voice to be recognized and the parameter information are obtained, the parameter information including the current mode, the recognition serial number of the voice, the internal and external noise information, and the orientation information; a feature vector corresponding to the voice is extracted; whether the voice belongs to a non-primary voice in a single-wake multi-recognition mode is determined according to the current mode and the recognition serial number; if the voice belongs to a non-primary voice in the single-wake multi-recognition mode, a voice recognition result, an acoustic judgment result and a semantic judgment result are obtained according to the parameter information and the feature vector, and whether the voice belongs to the music field is determined according to the acoustic judgment result and the semantic judgment result; and if the voice belongs to the music field, the instruction and/or resource corresponding to the voice is determined according to the voice recognition result. Because the internal and external noise information, the orientation information and the like are taken into account, the accuracy of voice recognition is improved, voices in the music field can be recognized automatically and accurately in the single-wake multi-recognition mode, and high-quality music services can subsequently be provided to users.
Fig. 4 is a schematic structural diagram of another speech recognition apparatus according to an embodiment of the present invention. The speech recognition apparatus includes:
memory 1001, processor 1002, and computer programs stored on memory 1001 and executable on processor 1002.
The processor 1002, when executing the program, implements the speech recognition method provided in the above-described embodiments.
Further, the speech recognition apparatus further includes:
a communication interface 1003 for communicating between the memory 1001 and the processor 1002.
A memory 1001 for storing computer programs that may be run on the processor 1002.
Memory 1001 may include high-speed RAM memory and may also include non-volatile memory (e.g., at least one disk memory).
The processor 1002 is configured to implement the speech recognition method according to the foregoing embodiment when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, the communication interface 1003, the memory 1001, and the processor 1002 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on one chip, the memory 1001, the processor 1002, and the communication interface 1003 may complete communication with each other through an internal interface.
The processor 1002 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present invention.
The invention also provides a non-transitory computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements a speech recognition method as described above.
The invention also provides a computer program product; when instructions in the computer program product are executed by a processor, the speech recognition method as described above is implemented.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (14)

1. A speech recognition method, comprising:
acquiring voice to be recognized and parameter information; the parameter information includes: the current mode, the recognition serial number of the voice, the internal and external noise information and the direction information;
extracting a feature vector corresponding to the voice;
determining whether the voice belongs to a non-first voice in a single-wake multi-recognition mode according to the current mode and the recognition serial number, wherein the single-wake multi-recognition mode is an interaction mode in which, each time the user interacts with a voice recognition device, the user speaks a voice containing a wake-up word once to wake up the voice recognition device and can then speak voices to the voice recognition device a plurality of times;
if the voice belongs to non-primary voice in a single awakening multi-recognition mode, acquiring a voice recognition result, an acoustic judgment result and a semantic judgment result according to the parameter information and the feature vector, and determining whether the voice belongs to the music field according to the acoustic judgment result and the semantic judgment result;
and if the voice belongs to the music field, determining the instruction and/or the resource corresponding to the voice according to the voice recognition result.
2. The method according to claim 1, wherein the determining whether the speech belongs to a non-first speech in a single wake-up multiple recognition mode according to the current mode and the recognition sequence number comprises:
judging whether the current mode is a single-time awakening multi-time identification mode;
and if the current mode is a single-time awakening multi-time recognition mode, determining whether the voice is non-primary voice according to the recognition serial number.
3. The method according to claim 1, wherein the obtaining a speech recognition result, an acoustic determination result, and a semantic determination result according to the parameter information and the feature vector, and determining whether the speech belongs to the music field according to the acoustic determination result and the semantic determination result comprises:
inputting the internal and external noise information, the azimuth information and the feature vector into an acoustic recognition model to obtain the voice recognition result and an acoustic judgment result;
determining whether the voice belongs to the music field acoustically according to the acoustic judgment result;
if the voice belongs to the music field acoustically, inputting the voice recognition result, the internal and external noise information and the orientation information into a semantic recognition model to obtain the semantic judgment result;
determining whether the voice belongs to the music field semantically according to the semantic judgment result;
and if the voice semantically belongs to the music field, determining that the voice belongs to the music field.
4. The method according to claim 3, wherein the obtaining a speech recognition result, an acoustic determination result, and a semantic determination result according to the parameter information and the feature vector, and determining whether the speech belongs to the music field according to the acoustic determination result and the semantic determination result further comprises:
and if the voice does not belong to the music field acoustically or semantically, determining that the voice does not belong to the music field.
5. The method of claim 1, further comprising:
if the current mode is a single-wake single-recognition mode or a guest mode, or the voice belongs to the first voice in the single-wake multi-recognition mode, acquiring a voice recognition result according to the parameter information and the feature vector, wherein the single-wake single-recognition mode is an interaction mode in which, each time the user interacts with the voice recognition device, the user needs to first speak a voice containing the wake-up word to wake up the voice recognition device and then speak one interactive voice; and the guest mode is an interaction mode in which the user can carry out continuous, repeated conversations with the voice recognition device within a short time without having to speak a voice containing the wake-up word each time;
and determining corresponding instructions and/or resources according to the voice recognition result.
6. The method according to claim 1, wherein after determining the instruction and/or resource corresponding to the speech according to the speech recognition result, further comprising:
executing the instructions, and/or providing the resources to a user of the smart sound box.
7. A speech recognition apparatus, comprising:
the acquisition module is used for acquiring the voice to be recognized and the parameter information; the parameter information includes: the current mode, the recognition serial number of the voice, the internal and external noise information and the direction information;
the extraction module is used for extracting the feature vector corresponding to the voice;
a determining module, configured to determine whether the voice belongs to a non-primary voice in a single-wake multi-recognition mode according to the current mode and the recognition serial number, wherein the single-wake multi-recognition mode is an interaction mode in which, each time the user interacts with a voice recognition apparatus, the user speaks a voice containing a wake-up word once to wake up the voice recognition apparatus and can then speak voices to the voice recognition apparatus a plurality of times;
the determining module is further configured to, when the speech belongs to a non-primary speech in a single-wake multi-recognition mode, obtain a speech recognition result, an acoustic determination result, and a semantic determination result according to the parameter information and the feature vector, and determine whether the speech belongs to the music field according to the acoustic determination result and the semantic determination result;
and the determining module is further used for determining the instruction and/or the resource corresponding to the voice according to the voice recognition result when the voice belongs to the music field.
8. The apparatus of claim 7, wherein the means for determining is configured to,
judging whether the current mode is a single-time awakening multi-time identification mode;
and if the current mode is a single-time awakening multi-time recognition mode, determining whether the voice is non-primary voice according to the recognition serial number.
9. The apparatus of claim 7, wherein the determining module is configured to,
input the internal and external noise information, the orientation information, and the feature vector into an acoustic recognition model to obtain the voice recognition result and the acoustic judgment result;
determine, according to the acoustic judgment result, whether the voice belongs to the music field acoustically;
if the voice belongs to the music field acoustically, input the voice recognition result, the internal and external noise information, and the orientation information into a semantic recognition model to obtain the semantic judgment result;
determine, according to the semantic judgment result, whether the voice belongs to the music field semantically;
and if the voice belongs to the music field semantically, determine that the voice belongs to the music field.
10. The apparatus of claim 9, wherein the determining module is further configured to,
if the voice does not belong to the music field acoustically or does not belong to it semantically, determine that the voice does not belong to the music field.
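The acoustic-then-semantic cascade of claims 9 and 10 (the semantic model is consulted only when the acoustic check already points to the music field, and the voice counts as music only when both checks agree) can be sketched roughly as below. The model call signatures here are invented placeholders, not interfaces defined by the patent:

```python
def classify_music_field(features, noise_info, orientation,
                         acoustic_model, semantic_model):
    """Return (recognized_text, belongs_to_music_field).

    acoustic_model and semantic_model are caller-supplied callables
    standing in for the acoustic and semantic recognition models.
    """
    # Step 1: acoustic model yields the recognition result and an
    # acoustic music-field judgment (claim 9, first step).
    text, acoustic_is_music = acoustic_model(noise_info, orientation, features)
    if not acoustic_is_music:
        # Fails acoustically -> not music; semantic model is skipped (claim 10).
        return text, False
    # Step 2: semantic model is consulted only after the acoustic check passes.
    semantic_is_music = semantic_model(text, noise_info, orientation)
    # Music field only when both checks agree.
    return text, semantic_is_music
```

Short-circuiting on the acoustic result means the (typically heavier) semantic model never runs for voices that already fail the acoustic check.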
11. The apparatus of claim 7, wherein
the obtaining module is further configured to obtain a voice recognition result according to the parameter information and the feature vector when the current mode is a single-wake single-recognition mode or a geek mode, or when the voice belongs to the first voice in the single-wake multi-recognition mode, where the single-wake single-recognition mode is an interaction mode in which, each time the user interacts with the voice recognition apparatus, the user needs to speak a "voice containing a wake-up word" to wake up the apparatus before speaking one interactive voice; and the geek mode is an interaction mode in which the user can hold continuous, repeated dialogues with the voice recognition apparatus within a short time without speaking the "voice containing the wake-up word" each time;
and the determining module is further configured to determine a corresponding instruction and/or resource according to the voice recognition result.
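As a rough illustration of the dispatch described across claims 7 and 11: only a non-first voice in single-wake multi-recognition mode takes the acoustic-plus-semantic music-field path, while single-wake single-recognition mode, geek mode, and the first voice after a wake-up go straight from the recognition result to an instruction or resource. The mode names below are hypothetical labels, not terms from the patent:

```python
def choose_pipeline(current_mode: str, recognition_serial: int) -> str:
    """Select the processing path for an incoming voice.

    Mode names ("single_wake_single", "geek", "single_wake_multi")
    are illustrative placeholders.
    """
    if current_mode == "single_wake_multi" and recognition_serial > 1:
        # Non-first voice in single-wake multi-recognition mode:
        # run the music-field check of claims 9-10 (claim 7 path).
        return "music_field_check"
    # Single-wake single-recognition mode, geek mode, and the first
    # voice of single-wake multi-recognition mode recognize directly
    # (claim 11 path).
    return "direct_recognition"
```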
12. The apparatus of claim 7, further comprising: an execution module, configured to execute the instruction and/or provide the resource to a user of the smart speaker.
13. A voice recognition apparatus, comprising:
a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the voice recognition method according to any one of claims 1-6 when executing the program.
14. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the voice recognition method according to any one of claims 1-6.
CN201910293280.2A 2019-04-12 2019-04-12 Voice recognition method and device Active CN110060662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910293280.2A CN110060662B (en) 2019-04-12 2019-04-12 Voice recognition method and device

Publications (2)

Publication Number Publication Date
CN110060662A CN110060662A (en) 2019-07-26
CN110060662B true CN110060662B (en) 2021-02-23

Family

ID=67318947

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735394B (en) * 2020-12-16 2022-12-30 青岛海尔科技有限公司 Semantic parsing method and device for voice

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9922650B1 (en) * 2013-12-20 2018-03-20 Amazon Technologies, Inc. Intent-specific automatic speech recognition result generation
CN107464564B (en) * 2017-08-21 2023-05-26 腾讯科技(深圳)有限公司 Voice interaction method, device and equipment
CN109036411A (en) * 2018-09-05 2018-12-18 深圳市友杰智新科技有限公司 A kind of intelligent terminal interactive voice control method and device
CN109509470B (en) * 2018-12-11 2024-05-07 平安科技(深圳)有限公司 Voice interaction method and device, computer readable storage medium and terminal equipment
CN109545219A (en) * 2019-01-09 2019-03-29 北京新能源汽车股份有限公司 Vehicle-mounted voice exchange method, system, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant