CN108831477B - Voice recognition method, device, equipment and storage medium - Google Patents


Info
Publication number: CN108831477B
Authority: CN (China)
Legal status: Active
Application number: CN201810615353.0A
Other languages: Chinese (zh)
Other versions: CN108831477A (en)
Inventor: 许超
Current Assignee: Mobvoi Information Technology Co Ltd
Original Assignee: Mobvoi Information Technology Co Ltd
Application filed by Mobvoi Information Technology Co Ltd
Priority to CN201810615353.0A
Publication of CN108831477A
Application granted
Publication of CN108831477B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L15/26 Speech to text systems
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The embodiments of the invention disclose a voice recognition method, device, equipment and storage medium. The method has the following steps. First, a terminal device that has enabled its wake-up word function in advance determines, while in the sleep state, that voice information is about to be played, and acquires a word set corresponding to that voice information. Second, the terminal device performs wake-up word detection on received voice information according to the degree of similarity between the word set and a preset wake-up word. With this technical solution, audio that is about to be played can be masked selectively during wake-up recognition. This avoids false wake-ups, improves wake-up recognition and improves the user experience.

Description

Voice recognition method, device, equipment and storage medium
Technical Field
The embodiments of the invention relate to intelligent terminal technology, and in particular to a voice recognition method, device, equipment and storage medium.
Background
With the continuous progress of science and technology, voice control technology has become increasingly widespread. Most intelligent terminals now include a dialog system that supports voice interaction, which makes operating the terminal simpler and more convenient.
In the prior art, the dialog system is woken with a fixed wake-up word before every interaction. Voice interaction takes place only after the system has entered the wake-up state.
In the course of making the invention, the inventor found the following defect in the prior art. The user may play voice information on the intelligent terminal or on an associated intelligent terminal, for example an audiobook. When the played audio contains content similar to the wake-up word, false wake-ups occur easily. The user has no intention of waking the dialog system, yet content in the ambient audio that resembles the wake-up word is recognized as the wake-up word. The dialog system is woken by mistake and starts a voice interaction, which disturbs the user.
Disclosure of Invention
The invention provides a voice recognition method, device, equipment and storage medium. During wake-up recognition on an intelligent terminal, they selectively mask voice information played in the environment, which avoids false wake-ups and improves the user experience.
In a first aspect, an embodiment of the present invention provides a speech recognition method, including:
when a terminal device determines, while in the sleep state, that voice information is about to be played, acquiring a word set corresponding to the voice information to be played; and
performing, by the terminal device, wake-up word detection on received voice information according to the degree of similarity between the word set and a preset wake-up word.
In a second aspect, an embodiment of the present invention further provides a speech recognition apparatus, including:
a word set acquisition module, configured to acquire a word set corresponding to the voice information to be played when the terminal device determines, while in the sleep state, that voice information is about to be played; and
a wake-up word detection module, configured to perform wake-up word detection on voice information received by the terminal device according to the degree of similarity between the word set and a preset wake-up word.
In a third aspect, an embodiment of the present invention further provides an apparatus, including:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech recognition method provided by the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the speech recognition method provided by the embodiment of the present invention.
In the technical solution of the embodiments, the terminal device determines, while in the sleep state, that voice information is about to be played. It then acquires the word set corresponding to that voice information and performs wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word. Audio played in the environment can therefore be masked selectively during wake-up recognition, based on the voice information to be played. This avoids false wake-ups, improves wake-up recognition and improves the user experience.
Drawings
Fig. 1 is a flowchart of a speech recognition method according to the first embodiment of the present invention;
Fig. 2 is a flowchart of a speech recognition method according to the second embodiment of the present invention;
Fig. 3 is a flowchart of a speech recognition method according to the third embodiment of the present invention;
Fig. 4 is a flowchart of a speech recognition method according to the fourth embodiment of the present invention;
Fig. 5 is a block diagram of a speech recognition apparatus according to the fifth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an apparatus according to the sixth embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described herein merely illustrate the invention and do not limit it. For convenience of description, the drawings show only the parts related to the present invention rather than the entire structure.
Example one
Fig. 1 is a flowchart of a speech recognition method according to the first embodiment of the present invention. This embodiment is applicable to recognizing speech signals. The method may be executed by a speech recognition apparatus. The apparatus may be implemented in software and/or hardware and can generally be integrated in a terminal device. Terminal devices include, but are not limited to, computers. For example, the terminal device may be a smart watch, smartphone, smart band, smart speaker or smart TV. As shown in Fig. 1, the method includes the following steps:
Step 101: when the terminal device determines, while in the sleep state, that voice information is about to be played, it acquires a word set corresponding to the voice information to be played.
The terminal device is in the sleep state when the user is not using it. When the user needs it, the terminal device is woken up and switches from the sleep state to the working state.
The wake-up word is one or more words, either preset by the user or set by the system, for example "你好问问" ("Hello Wenwen"). The wake-up word must be spoken as one continuous phrase and cannot be split. For example, suppose the user says "你好" ("hello"), then some other words, and only then "问问". That input does not contain the wake-up word.
Optionally, the terminal device provides a wake-up word function. The user may manually enable this function before the terminal device enters the sleep state. The device can then perform wake-up word detection on received voice information using the preset wake-up word and wake itself up according to the detection result. The received voice information is the voice information picked up by the terminal device.

Specifically, the wake-up word function is enabled in advance. When the terminal device determines, while in the sleep state, that voice information is about to be played, it acquires the corresponding word set. It then performs wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word. If the wake-up word is recognized in the voice information, the terminal device is woken up; otherwise, it is not.
Optionally, the terminal device enables the wake-up word function automatically when it detects that it is entering the sleep state.
If the wake-up word function has not been enabled before the terminal device enters the sleep state, the device cannot perform wake-up word detection on received voice information with the preset wake-up word. It therefore cannot be woken according to a detection result.
Optionally, the sleep and wake-up states apply to a preset application of the terminal device rather than to the device itself. The preset application sleeps while the user is not using it. When the user needs it, it is woken and switches from the sleep state to the working state. For example, the dialog system of the terminal device sleeps while unused. When the user needs it, it is woken and enters the working state to carry out voice interaction with the user.
The terminal device may enable the wake-up word function before its dialog system enters the sleep state. It can then perform wake-up word detection on received voice information with the preset wake-up word and wake the dialog system according to the detection result.

Specifically, the wake-up word function is enabled in advance. When the terminal device determines, while in the sleep state, that voice information is about to be played, it acquires the corresponding word set. It then performs wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word. If the wake-up word is recognized in the voice information, the dialog system of the terminal device is woken up; otherwise, it is not.
The terminal device determines that voice information is about to be played when it detects that a designated playback application is about to play a designated audio file. A designated playback application is an application capable of playing audio files. Designated audio files may include music files and audiobook files.
The word set corresponding to the voice information to be played is the set of common words in that voice information. These common words can be obtained from the designated audio file and used to generate the word set.
Step 102: the terminal device performs wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word.
A preset word similarity algorithm calculates the degree of similarity between each word in the word set and the preset wake-up word. These similarities are compared with preset similarity thresholds to determine whether the word set contains similar words or identical words of the wake-up word. The preset thresholds comprise a similar-word threshold and an identical-word threshold, and the similar-word threshold is smaller than the identical-word threshold. A word is a similar word of the wake-up word if its similarity to the preset wake-up word is greater than the similar-word threshold and smaller than the identical-word threshold. A word is an identical word of the wake-up word if its similarity is greater than the identical-word threshold.
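The threshold rule above can be sketched as follows. This is a minimal illustration under stated assumptions. The patent does not name its "preset word similarity algorithm", so Python's `difflib.SequenceMatcher` ratio stands in for it. The threshold values `SIMILAR_THRESHOLD` and `IDENTICAL_THRESHOLD` are hypothetical; the text only requires the first to be smaller than the second.

```python
from difflib import SequenceMatcher

# Hypothetical values; the text only requires SIMILAR_THRESHOLD < IDENTICAL_THRESHOLD.
SIMILAR_THRESHOLD = 0.6
IDENTICAL_THRESHOLD = 0.95


def similarity(word: str, wake_word: str) -> float:
    """Stand-in word similarity in [0, 1] based on matching characters.

    Assumption: a real system would more likely compare phonetic forms
    (e.g. pinyin or phoneme strings) rather than raw text.
    """
    return SequenceMatcher(None, word, wake_word).ratio()


def classify_word_set(words: set[str], wake_word: str) -> tuple[set[str], set[str]]:
    """Split a word set into (similar words, identical words) of the wake-up word."""
    similar, identical = set(), set()
    for word in words:
        score = similarity(word, wake_word)
        if score > IDENTICAL_THRESHOLD:
            identical.add(word)
        elif SIMILAR_THRESHOLD < score < IDENTICAL_THRESHOLD:
            similar.add(word)
        # Other scores (including exact ties with a threshold) are left
        # unclassified, because the text does not say how ties are handled.
    return similar, identical
```

For example, with the English stand-in wake word `"hello wenwen"`, the word `"hello wenda"` scores about 0.78 and is classified as similar. `"hello wenwen"` itself scores 1.0 and is classified as identical.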
From the degree of similarity, the terminal device determines which of three cases applies:
- the word set contains similar words of the wake-up word but no identical words;
- the word set contains neither similar words nor identical words of the wake-up word;
- the word set contains identical words of the wake-up word.

It then performs wake-up word detection on the received voice information accordingly.
Specifically, suppose the terminal device determines from the degree of similarity that the word set contains similar words of the wake-up word but no identical words. It then decides whether the wake-up word is recognized in the voice information by comparing two matching scores:
- the score between the recognition result of the voice information and the wake-up word;
- the score between that recognition result and the similar word.

If the first score is greater than or equal to the second, the wake-up word is determined to be recognized in the voice information. If the first score is smaller, the wake-up word is determined not to be recognized. Neither the terminal device nor its preset application is woken up, which avoids a false wake-up.
When the terminal device determines from the degree of similarity that the word set contains neither similar words nor identical words of the wake-up word, it performs wake-up word detection directly with the preset wake-up word.
When the terminal device determines from the degree of similarity that the word set contains identical words of the wake-up word, it first verifies the user's identity from voiceprint features. This determines whether the received voice information comes from the user. Once the received voice information is confirmed to be the user's, wake-up word detection is performed directly with the preset wake-up word.
When the wake-up word is recognized in the voice information, the terminal device or its preset application is woken up. When it is not recognized, neither is woken up.
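The three cases can be read as a dispatch on the classification of the word set. The sketch below assumes that. The callables `compare_scores`, `verify_voiceprint` and `standard_detect` are hypothetical hooks standing in for the recognizer, not APIs from the source. The text does not say what happens when voiceprint verification fails. The sketch assumes the device stays asleep in that case.

```python
from typing import Callable


def wake_decision(
    has_similar: bool,
    has_identical: bool,
    compare_scores: Callable[[], bool],
    verify_voiceprint: Callable[[], bool],
    standard_detect: Callable[[], bool],
) -> bool:
    """Return True if the device (or its preset application) should wake up.

    has_similar / has_identical describe the word set of the audio about to
    be played. The three callables are hypothetical recognizer hooks.
    """
    if has_identical:
        # The played audio may literally contain the wake-up word, so the
        # speaker is verified by voiceprint first. Assumption: a failed
        # verification keeps the device asleep.
        return verify_voiceprint() and standard_detect()
    if has_similar:
        # Similar but no identical words: wake only if the wake-up word
        # scores at least as well as the similar word.
        return compare_scores()
    # Neither similar nor identical words: ordinary wake-up word detection.
    return standard_detect()
```

The identical-word case is checked first because it applies whether or not similar words are also present.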
With the voice recognition method of this embodiment, the terminal device determines, while in the sleep state, that voice information is about to be played. It acquires the corresponding word set and performs wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word. Audio played in the environment can therefore be masked selectively during wake-up recognition, based on the voice information to be played. This avoids false wake-ups, improves wake-up recognition and improves the user experience.
Example two
Fig. 2 is a flowchart of a speech recognition method according to the second embodiment of the present invention. This embodiment refines step 102 of the foregoing embodiment. Performing wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word then comprises three steps:
- when the terminal device determines from the degree of similarity that the word set contains similar words but no identical words of the wake-up word, it detects whether a voice signal corresponding to the wake-up word exists in the received voice information;
- when such a voice signal exists, it acquires a first matching score between the voice information and the wake-up word and a second matching score between the voice information and the similar word, and compares the two;
- when the first matching score is greater than or equal to the second matching score, it determines that the wake-up word is recognized in the voice information.
As shown in fig. 2, the method includes:
Step 201: when the terminal device determines, while in the sleep state, that voice information is about to be played, it acquires a word set corresponding to the voice information to be played.
Optionally, the terminal device may use either or both of two ways to determine, while in the sleep state, that voice information is about to be played:
- it periodically checks designated playback applications in the local system and/or associated devices, and concludes that voice information is about to be played when one of them is about to play a designated audio file;
- it concludes that voice information is about to be played when it receives an audio file playing prompt from the local system and/or an associated device.

The terminal device checks the designated playback applications in the local system according to a preset period. It determines that voice information is about to be played when it detects that one of them is about to play a designated audio file. A designated playback application is an application capable of playing audio files. Designated audio files may include music files and audiobook files.
Optionally, the terminal device periodically checks designated playback applications on associated devices in the same way. It determines that voice information is about to be played when it detects that one of them is about to play a designated audio file. An associated device may be another terminal device connected to the same server as the terminal device. Optionally, it may be another terminal device that shares the same user account. The user account records the user's username and password, group memberships, accessible network resources, and personal files and settings.
Optionally, the terminal device periodically checks designated playback applications in both the local system and associated devices. It determines that voice information is about to be played when it detects that one of them is about to play a designated audio file.
Optionally, the terminal device determines that voice information is about to be played when it receives an audio file playing prompt from the local system.
When a designated playback application in the local system is about to play a designated audio file, the local system sends an audio file playing prompt. From this prompt, the terminal device learns that the designated audio file is about to be played locally, that is, that voice information is about to be played.
Optionally, the terminal device determines that voice information is about to be played when it receives an audio file playing prompt from an associated device.
When a designated playback application on an associated device is about to play a designated audio file, that device sends an audio file playing prompt. From this prompt, the terminal device learns that the designated audio file is about to be played on the associated device, that is, that voice information is about to be played.
Optionally, the terminal device determines that voice information is about to be played when it receives audio file playing prompts from the local system and associated devices.
Optionally, the terminal device combines both mechanisms. It periodically checks designated playback applications in the local system and associated devices, and determines that voice information is about to be played when one of them is about to play a designated audio file. It reaches the same conclusion when it receives an audio file playing prompt from the local system or associated devices.
Optionally, acquiring the word set corresponding to the voice information to be played comprises two steps. First, the introduction information of the voice information is acquired. Second, the common words of the voice information are extracted from the introduction information and used to generate the word set.
The introduction information is preset information describing the content of the designated audio file that corresponds to the voice information to be played. A preset statistical algorithm extracts statistical features for each word in the introduction information. Based on these features, words whose occurrence frequency reaches a preset frequency threshold are selected and taken as common words of the voice information to be played. The word set is generated from all common words determined in this way.
Optionally, acquiring the word set corresponding to the voice information to be played comprises two other steps. First, the designated audio file that corresponds to the voice information is acquired. Second, the common words of the voice information are extracted from that audio file and used to generate the word set.
A preset statistical algorithm extracts statistical features for each word in the designated audio file. Words whose occurrence frequency reaches a preset frequency threshold are selected and taken as common words of the voice information to be played. The word set is generated from all common words determined in this way.
Step 202: when the terminal device determines from the degree of similarity that the word set contains similar words but no identical words of the wake-up word, it detects whether a voice signal corresponding to the wake-up word exists in the received voice information.
First, acoustic features are extracted from the voice information and input into a preset speech recognition model. The model recognizes the received voice information and produces a recognition result. The matching score between this recognition result and the wake-up word, the first matching score, is then calculated. The matching score may range from 0 to 10 points, and a higher score indicates a better match. The range of the matching score can be set as needed.
The first matching score is then compared with a preset first matching threshold to decide whether a voice signal corresponding to the wake-up word exists in the received voice information. If the first matching score is greater than or equal to the first matching threshold, the recognition result matches the wake-up word. A voice signal corresponding to the wake-up word is then detected in the received voice information. If the first matching score is smaller than the threshold, the recognition result does not match the wake-up word, and no such voice signal exists. For example, if the matching score ranges from 0 to 10 points, the first matching threshold may be 8 points.
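The presence check in step 202 can be sketched on the 0 to 10 scale with the example threshold of 8 points. The source computes the score with a speech recognition model. Here, as an assumption, the score is a stand-in text similarity between the recognized text and the wake-up word, scaled to 0 to 10. `match_score` and `wake_signal_present` are hypothetical names.

```python
from difflib import SequenceMatcher

FIRST_MATCH_THRESHOLD = 8.0  # example value from the text, on a 0-10 scale


def match_score(recognized: str, target: str) -> float:
    """Stand-in matching score in [0, 10].

    Assumption: the patent obtains this score from a speech recognition
    model; here the recognized text is simply compared with the target word.
    """
    return 10.0 * SequenceMatcher(None, recognized, target).ratio()


def wake_signal_present(recognized: str, wake_word: str,
                        threshold: float = FIRST_MATCH_THRESHOLD) -> tuple[bool, float]:
    """Return (signal present?, first matching score) for the received voice information."""
    first_score = match_score(recognized, wake_word)
    return first_score >= threshold, first_score
```

The first matching score is returned together with the decision because step 203 reuses it.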
Step 203: when a voice signal corresponding to the wake-up word is detected in the voice information, the terminal device acquires a first matching score between the voice information and the wake-up word and a second matching score between the voice information and the similar word, and compares the two.
The first matching score is the matching score between the recognition result of the voice information and the wake-up word. The second matching score is the matching score between that recognition result and the similar word.
Once a voice signal corresponding to the wake-up word has been detected in the received voice information, the first matching score is obtained. The preset speech recognition model calculates the second matching score. The two scores are then compared.
And step 204, when the first matching score is larger than or equal to the second matching score, determining that the awakening word is recognized in the voice message.
If the first matching score is greater than the second matching score, the recognition result of the voice information matches the wake-up word better than it matches the similar word, and the wake-up word is determined to be recognized in the voice information. If the two scores are equal, the recognition result matches the wake-up word and the similar word equally well, and the wake-up word is likewise determined to be recognized. If the first matching score is less than the second matching score, the recognition result matches the wake-up word less well than it matches the similar word, and the wake-up word is determined not to be recognized in the voice information.
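The decision rule of steps 202 to 204 can be sketched as follows, assuming both scores are already available on the 0 to 10 scale. The description refers to a single similar word; taking the maximum over several similar words is an assumption of this sketch, as are the function and parameter names.

```python
from typing import Sequence

FIRST_THRESHOLD = 8.0  # example first preset threshold from the description


def recognize_wake_word(first_score: float,
                        similar_scores: Sequence[float],
                        threshold: float = FIRST_THRESHOLD) -> bool:
    """Sketch of steps 202-204 when the word set contains similar words.

    first_score: match between the recognition result and the wake-up word.
    similar_scores: match between the recognition result and each similar word
    (using the maximum over several similar words is an assumption).
    """
    # Step 202: no wake-up-word signal detected, so there is nothing to compare.
    if first_score < threshold:
        return False
    # Steps 203-204: wake only if the wake-up word scores at least as well as
    # the best-matching similar word; a tie counts as recognition.
    best_similar = max(similar_scores, default=float("-inf"))
    return first_score >= best_similar


print(recognize_wake_word(8.5, [8.5]))  # True: tie goes to the wake-up word
print(recognize_wake_word(8.5, [9.1]))  # False: audio more likely the similar word
print(recognize_wake_word(7.0, [2.0]))  # False: below the first threshold
```

Putting the threshold check first mirrors the order of the steps: the similar-word comparison only runs once a wake-up-word signal has been detected.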
In other words, the wake-up word is determined to be recognized in the voice information whenever the recognition result matches the wake-up word better than, or as well as, it matches the similar word.
When the wake-up word is determined to be recognized in the voice information, the terminal device, or a preset application on the terminal device, is woken up; when the wake-up word is determined not to be recognized, the terminal device or the preset application is not woken up.
In the voice recognition method provided by this embodiment, when it is determined that the word set contains similar words of the wake-up word but no word identical to the wake-up word, and a voice signal corresponding to the wake-up word is detected in the voice information, the first matching score between the voice information and the wake-up word is compared with the second matching score between the voice information and the similar word. When the first matching score is greater than or equal to the second matching score, the wake-up word is determined to be recognized in the voice information. During wake-up recognition, audio played in the environment can thus be screened out in a targeted manner, based on the similar words of the wake-up word that occur in the voice information to be played, which avoids false wake-ups.
EXAMPLE three
Fig. 3 is a flowchart of a speech recognition method according to a third embodiment of the present invention. This embodiment refines step 102 of the foregoing embodiment, in which the terminal device performs wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word. Here, step 102 includes: when the terminal device determines, according to the degree of similarity, that the word set contains neither similar words of the wake-up word nor a word identical to the wake-up word, detecting whether a voice signal corresponding to the wake-up word exists in the received voice information; and when such a voice signal is detected in the voice information, determining that the wake-up word is recognized in the voice information.
As shown in fig. 3, the method includes:
Step 301: when the terminal device determines that voice information is to be played during sleep, it obtains the word set corresponding to the voice information to be played.
Step 302: when the terminal device determines, according to the degree of similarity, that the word set contains neither similar words of the wake-up word nor a word identical to the wake-up word, it detects whether a voice signal corresponding to the wake-up word exists in the received voice information.
In this case, detection proceeds as follows. Acoustic features are extracted from the voice information and input into a preset speech recognition model, which recognizes the received voice information and produces a corresponding recognition result. The matching score between this recognition result and the wake-up word, i.e., the first matching score, is then calculated. The matching score may, for example, range from 0 to 10 points, with a higher score indicating a closer match; the range can be set according to actual needs.
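The description obtains the first matching score from a preset speech recognition model applied to acoustic features. Purely to illustrate the 0 to 10 scale, the sketch below substitutes a text-similarity measure (`difflib.SequenceMatcher` from the Python standard library) applied to an already recognized transcript. It is a stand-in, not the model the description refers to, and the wake-up word used is hypothetical.

```python
from difflib import SequenceMatcher


def matching_score(recognized_text: str, target_word: str) -> float:
    """Return a 0-10 matching score between a transcript and a target word.

    Stand-in for the model-based score in the description: the similarity ratio
    (0.0 to 1.0) from SequenceMatcher is scaled onto the 0-10 range.
    """
    ratio = SequenceMatcher(None, recognized_text.lower(), target_word.lower()).ratio()
    return round(10.0 * ratio, 2)


WAKE_WORD = "hello robot"  # hypothetical wake-up word

print(matching_score("hello robot", WAKE_WORD))   # 10.0
print(matching_score("hello robots", WAKE_WORD))  # 9.57
```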
After the first matching score is obtained, whether the received voice information contains a voice signal corresponding to the wake-up word can be determined by comparing the first matching score with a preset first matching threshold. If the first matching score is greater than or equal to the first preset threshold, the recognition result of the voice information matches the wake-up word, i.e., a voice signal corresponding to the wake-up word is detected in the received voice information. If the first matching score is less than the first preset threshold, the recognition result does not match the wake-up word, i.e., no voice signal corresponding to the wake-up word is present in the received voice information. For example, the matching score may range from 0 to 10 points, and the first preset threshold may be 8 points.
Step 303: when a voice signal corresponding to the wake-up word is detected in the voice information, determine that the wake-up word is recognized in the voice information.
When a voice signal corresponding to the wake-up word is detected in the voice information, the wake-up word is determined to be recognized, and the terminal device, or a preset application on the terminal device, is woken up. When no such voice signal is detected, the wake-up word is determined not to be recognized, and the terminal device or the preset application is not woken up.
In the voice recognition method provided by this embodiment, when it is determined that the word set contains neither similar words of the wake-up word nor a word identical to it, the terminal device detects whether a voice signal corresponding to the wake-up word exists in the received voice information, and determines that the wake-up word is recognized when such a signal is detected. Wake-up recognition can therefore be performed directly on the wake-up word whenever the voice information to be played contains no similar or identical words of the wake-up word.
EXAMPLE four
Fig. 4 is a flowchart of a speech recognition method according to a fourth embodiment of the present invention. This embodiment refines step 102 of the foregoing embodiment, in which the terminal device performs wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word. Here, step 102 includes: when the terminal device determines, according to the degree of similarity, that the word set contains a word identical to the wake-up word, determining the voiceprint feature of the received voice information; determining whether the voiceprint feature matches a preset voiceprint feature; when it does, detecting whether a voice signal corresponding to the wake-up word exists in the voice information; and when such a voice signal is detected, determining that the wake-up word is recognized in the voice information.
As shown in fig. 4, the method includes:
Step 401: when the terminal device determines that voice information is to be played during sleep, it obtains the word set corresponding to the voice information to be played.
Step 402: when the terminal device determines, according to the degree of similarity, that the word set contains a word identical to the wake-up word, it determines the voiceprint feature of the received voice information.
The voiceprint feature may be determined by processing the received voice information and extracting its voiceprint feature.
Step 403: determine whether the voiceprint feature matches a preset voiceprint feature.
The preset voiceprint feature is a voiceprint feature, stored in advance, of the user of the device. It may be set directly by the user, or obtained by analyzing voice signals input by the user. Optionally, the preset voiceprint feature may comprise a plurality of voiceprint features.
Step 404: when the voiceprint feature matches the preset voiceprint feature, detect whether a voice signal corresponding to the wake-up word exists in the voice information.
If the voiceprint feature matches the preset voiceprint feature, the received voice information was input by the user of the terminal device, and the device detects whether a voice signal corresponding to the wake-up word exists in it. If the voiceprint feature does not match, the received voice information was not input by that user and may be interfering audio from the environment, so no further processing is performed.
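A minimal sketch of the voiceprint gate in steps 403 to 405 follows. It assumes voiceprints are fixed-length embedding vectors compared by cosine similarity against one or more enrolled vectors, with a hypothetical threshold of 0.8; the description specifies neither the feature type nor the matching rule.

```python
import math
from typing import Iterable, Sequence

VOICEPRINT_THRESHOLD = 0.8  # hypothetical cosine-similarity threshold
FIRST_THRESHOLD = 8.0       # example wake-up word threshold from the description


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("voiceprint vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def voiceprint_matches(features: Sequence[float],
                       enrolled: Iterable[Sequence[float]],
                       threshold: float = VOICEPRINT_THRESHOLD) -> bool:
    """Step 403: match against any of the preset (enrolled) voiceprints."""
    return any(cosine_similarity(features, ref) >= threshold for ref in enrolled)


def detect_with_voiceprint(features: Sequence[float],
                           enrolled: Iterable[Sequence[float]],
                           first_score: float,
                           threshold: float = FIRST_THRESHOLD) -> bool:
    """Steps 403-405 when the word set contains the wake-up word itself."""
    if not voiceprint_matches(features, enrolled):
        return False  # probably playback or other interfering audio: stop here
    return first_score >= threshold  # steps 404-405: ordinary wake-up word detection
```

Allowing several enrolled vectors reflects the optional plurality of preset voiceprint features mentioned in the text.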
Step 405: when a voice signal corresponding to the wake-up word is detected in the voice information, determine that the wake-up word is recognized in the voice information.
When a voice signal corresponding to the wake-up word is detected in the voice information, the wake-up word is determined to be recognized, and the terminal device, or a preset application on the terminal device, is woken up. When no such voice signal is detected, the wake-up word is determined not to be recognized, and the terminal device or the preset application is not woken up.
In the voice recognition method provided by this embodiment, when the terminal device determines that the word set contains a word identical to the wake-up word, it determines the voiceprint feature of the received voice information and checks whether that feature matches the preset voiceprint feature; only when it matches does the device detect whether a voice signal corresponding to the wake-up word exists in the voice information. When the voice information to be played contains the wake-up word itself, recognition can therefore rely on the voiceprint feature, and interfering audio in the environment is screened out.
EXAMPLE five
Fig. 5 is a block diagram of a speech recognition apparatus according to a fifth embodiment of the present invention. As shown in fig. 5, the apparatus includes:
a word set acquisition module 501 and a wake-up word detection module 502.
The word set acquisition module 501 is configured to obtain the word set corresponding to the voice information to be played when it is determined that the voice information is to be played while the terminal device is asleep. The wake-up word detection module 502 is configured for the terminal device to perform wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word.
The voice recognition apparatus provided by this embodiment obtains the word set corresponding to the voice information to be played when the terminal device determines that the voice information is to be played during sleep, and performs wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word. During wake-up recognition, audio played in the environment can thus be screened out in a targeted manner based on the voice information to be played, which avoids false wake-ups, improves wake-up recognition and improves the user experience.
On the basis of the foregoing embodiments, the word set acquisition module 501 may include:
an information periodic detection unit, configured for the terminal device to periodically check a set playing application in the local system and/or an associated device, and to determine that voice information is to be played when it detects a set audio file that the set playing application is about to play; and/or
an information receiving unit, configured to determine that voice information is to be played when the terminal device receives audio file playback prompt information sent by the local system and/or the associated device.
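The two ways of learning that audio is about to be played can be combined as in the sketch below. The `ScheduledAudio` record, the 60-second look-ahead window and the prompt format (a plain file name) are hypothetical. The function would be called on each periodic polling tick.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScheduledAudio:
    app: str           # the "set playing application" (hypothetical field)
    audio_file: str    # the set audio file it will play
    start_time: float  # scheduled start, in seconds since the epoch


def audio_to_be_played(schedule: Iterable[ScheduledAudio],
                       prompts: Iterable[str],
                       now: float,
                       lookahead_s: float = 60.0) -> List[str]:
    """Return audio files that are about to be played while the device sleeps.

    Route 1 (periodic detection): scheduled items starting within the look-ahead window.
    Route 2 (receiving prompts): files named in playback prompts sent by the
    local system or an associated device.
    """
    due = [item.audio_file for item in schedule
           if now <= item.start_time <= now + lookahead_s]
    for audio_file in prompts:
        if audio_file not in due:
            due.append(audio_file)
    return due


schedule = [ScheduledAudio("alarm", "morning_news.mp3", 1030.0),
            ScheduledAudio("music", "playlist.mp3", 5000.0)]
print(audio_to_be_played(schedule, ["podcast.mp3"], now=1000.0))
# ['morning_news.mp3', 'podcast.mp3']
```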
On the basis of the foregoing embodiments, the wake-up word detection module 502 may include:
a first signal detection unit, configured to detect whether a voice signal corresponding to the wake-up word exists in the received voice information when the terminal device determines, according to the degree of similarity, that the word set contains similar words of the wake-up word but no word identical to the wake-up word;
a matching score comparison unit, configured to obtain, when a voice signal corresponding to the wake-up word is detected in the voice information, a first matching score between the voice information and the wake-up word and a second matching score between the voice information and the similar word, and to compare the first matching score with the second matching score;
and a first recognition unit, configured to determine that the wake-up word is recognized in the voice information when the first matching score is greater than or equal to the second matching score.
On the basis of the foregoing embodiments, the wake-up word detection module 502 may include:
a second signal detection unit, configured to detect whether a voice signal corresponding to the wake-up word exists in the received voice information when the terminal device determines, according to the degree of similarity, that the word set contains neither similar words of the wake-up word nor a word identical to the wake-up word;
and a second recognition unit, configured to determine that the wake-up word is recognized in the voice information when a voice signal corresponding to the wake-up word is detected in the voice information.
On the basis of the foregoing embodiments, the wake-up word detection module 502 may include:
a voiceprint feature determination unit, configured to determine the voiceprint feature of the received voice information when the terminal device determines, according to the degree of similarity, that the word set contains a word identical to the wake-up word;
a voiceprint judging unit, configured to determine whether the voiceprint feature matches a preset voiceprint feature;
a third signal detection unit, configured to detect whether a voice signal corresponding to the wake-up word exists in the voice information when the voiceprint feature matches the preset voiceprint feature;
and a third recognition unit, configured to determine that the wake-up word is recognized in the voice information when a voice signal corresponding to the wake-up word is detected in the voice information.
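Which of the three sets of units applies depends only on how the word set relates to the wake-up word. The sketch below classifies the word set accordingly, assuming plain string similarity via `difflib.SequenceMatcher` and a hypothetical threshold of 0.75 for a "similar word". The description does not say how similarity is measured, and a real system might instead compare phonetic (for example, pinyin) representations.

```python
from difflib import SequenceMatcher
from enum import Enum
from typing import Iterable, List, Tuple

SIMILARITY_THRESHOLD = 0.75  # hypothetical cut-off for a "similar word"


class Relation(Enum):
    IDENTICAL = "identical"  # word set contains the wake-up word itself
    SIMILAR = "similar"      # similar words present, wake-up word itself absent
    NONE = "none"            # neither similar nor identical words


def classify_word_set(word_set: Iterable[str], wake_word: str,
                      threshold: float = SIMILARITY_THRESHOLD) -> Tuple[Relation, List[str]]:
    """Classify the word set relative to the wake-up word; return any similar words."""
    similar: List[str] = []
    for word in word_set:
        if word == wake_word:
            return Relation.IDENTICAL, []
        if SequenceMatcher(None, word, wake_word).ratio() >= threshold:
            similar.append(word)
    return (Relation.SIMILAR, similar) if similar else (Relation.NONE, [])


# Which units of the wake-up word detection module handle each case.
DETECTION_PATH = {
    Relation.SIMILAR: "first units: compare wake-up word score with similar-word scores",
    Relation.NONE: "second units: plain wake-up word detection",
    Relation.IDENTICAL: "third units: voiceprint check, then wake-up word detection",
}

relation, similar = classify_word_set(["hello robots", "weather"], "hello robot")
print(relation, similar)  # Relation.SIMILAR ['hello robots']
print(DETECTION_PATH[relation])
```

An identical word takes precedence over similar words, because the voiceprint path is the only one that can reject playback of the wake-up word itself.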
On the basis of the foregoing embodiments, the word set acquisition module 501 may include:
an introduction information acquisition unit, configured to obtain introduction information of the voice information to be played;
and a word set generation unit, configured to obtain common words of the voice information to be played from the introduction information and to generate the word set corresponding to the voice information to be played.
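One possible reading of "common words" is the most frequent content words in the introduction information (for example, a title or program description). The sketch below follows that reading, assuming English text, a hypothetical stop-word list and a top-N cut-off. Chinese text would first need word segmentation.

```python
import re
from collections import Counter
from typing import Set

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}  # hypothetical


def build_word_set(introduction: str, top_n: int = 20) -> Set[str]:
    """Return the top_n most frequent non-stop-words in the introduction text."""
    tokens = [t for t in re.findall(r"[a-z']+", introduction.lower()) if t not in STOPWORDS]
    # Counter.most_common orders equal counts by first occurrence.
    return {word for word, _ in Counter(tokens).most_common(top_n)}


intro = "Robot news: the robot talks about the weather. Robot of the week!"
print(sorted(build_word_set(intro, top_n=2)))  # ['news', 'robot']
```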
The voice recognition apparatus provided by the embodiments of the present invention can execute the voice recognition method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
EXAMPLE six
Fig. 6 is a schematic structural diagram of an apparatus according to a sixth embodiment of the present invention. Fig. 6 illustrates a block diagram of an exemplary device 612 suitable for use in implementing embodiments of the present invention. The device shown in fig. 6 is only an example and should not bring any limitation to the function and the scope of use of the embodiments of the present invention.
As shown in FIG. 6, device 612 is in the form of a general purpose computing device. Components of device 612 may include, but are not limited to: one or more processors or processing units 616, a system memory 628, and a bus 618 that couples various system components including the system memory 628 and the processing unit 616.
Bus 618 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Device 612 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by device 612 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 628 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 630 and/or cache memory 632. The device 612 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 634 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be connected to bus 618 by one or more data media interfaces. Memory 628 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 640 having a set (at least one) of program modules 642 may be stored, for example, in memory 628, such program modules 642 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. The program modules 642 generally perform the functions and/or methods of the described embodiments of the present invention.
Device 612 may also communicate with one or more external devices 614 (e.g., keyboard, pointing device, display 624, etc.), with one or more devices that enable a user to interact with device 612, and/or with any devices (e.g., network card, modem, etc.) that enable device 612 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 622. Also, the device 612 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through the network adapter 620. As shown, the network adapter 620 communicates with the other modules of the device 612 via the bus 618. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the device 612, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 616 executes programs stored in the system memory 628 to perform various functional applications and data processing, such as implementing a voice recognition method provided by an embodiment of the present invention.
That is: when the terminal device determines that voice information is to be played during sleep, it obtains the word set corresponding to the voice information to be played; and the terminal device performs wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word.
EXAMPLE seven
The seventh embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the speech recognition method provided by the embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A speech recognition method, comprising:
when a terminal device determines that voice information is to be played during sleep, obtaining a word set corresponding to the voice information to be played;
performing, by the terminal device, wake-up word detection on received voice information according to a degree of similarity between the word set and a preset wake-up word;
wherein performing, by the terminal device, wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word comprises:
when the terminal device determines, according to the degree of similarity, that the word set contains a similar word of the wake-up word and does not contain a word identical to the wake-up word, detecting whether a voice signal corresponding to the wake-up word exists in the received voice information;
when a voice signal corresponding to the wake-up word exists in the voice information, obtaining a first matching score between the voice information and the wake-up word and a second matching score between the voice information and the similar word, and comparing the first matching score with the second matching score; and
determining that the wake-up word is recognized in the voice information when the first matching score is greater than or equal to the second matching score.
2. The method of claim 1, wherein the terminal device determining that voice information is to be played during sleep comprises:
the terminal device periodically checking a set playing application in a local system and/or an associated device, and determining that voice information is to be played when a set audio file to be played by the set playing application is detected; and/or
the terminal device determining that voice information is to be played when it receives audio file playback prompt information sent by the local system and/or the associated device.
3. The method of claim 1 or 2, wherein performing, by the terminal device, wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word comprises:
when the terminal device determines, according to the degree of similarity, that the word set contains neither a similar word of the wake-up word nor a word identical to the wake-up word, detecting whether a voice signal corresponding to the wake-up word exists in the received voice information;
and when a voice signal corresponding to the wake-up word is detected in the voice information, determining that the wake-up word is recognized in the voice information.
4. The method of claim 1 or 2, wherein performing, by the terminal device, wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word comprises:
when the terminal device determines, according to the degree of similarity, that the word set contains a word identical to the wake-up word, determining a voiceprint feature of the received voice information;
determining whether the voiceprint feature matches a preset voiceprint feature;
when the voiceprint feature matches the preset voiceprint feature, detecting whether a voice signal corresponding to the wake-up word exists in the voice information;
and when a voice signal corresponding to the wake-up word is detected in the voice information, determining that the wake-up word is recognized in the voice information.
5. The method of claim 1, wherein obtaining the word set corresponding to the voice information to be played comprises:
obtaining introduction information of the voice information to be played;
and obtaining common words of the voice information to be played according to the introduction information, and generating the word set corresponding to the voice information to be played.
6. A speech recognition apparatus, comprising:
the terminal equipment comprises a word set acquisition module, a word acquisition module and a word processing module, wherein the word set acquisition module is used for acquiring a word set corresponding to voice information to be played when the terminal equipment determines that the voice information to be played is to be played in a sleeping process;
the awakening word detection module is used for the terminal equipment to perform awakening word detection on the received voice information according to the similarity degree between the word set and a preset awakening word;
the awakening word detection module comprises:
a unit for the terminal equipment to detect whether a voice signal corresponding to the awakening word exists in the received voice information when the terminal equipment determines, according to the similarity degree, that the word set contains words similar to the awakening word but no word identical to it;
a matching score comparison unit, used for acquiring a first matching score between the voice information and the awakening word and a second matching score between the voice information and the similar word when it is detected that the voice signal corresponding to the awakening word exists in the voice information, and comparing the first matching score with the second matching score;
and a first identification unit, used for determining that the awakening word is recognized in the voice information when the first matching score is greater than or equal to the second matching score.
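The branch handled by these units can be sketched in Python as below. Several pieces are assumptions. `difflib.SequenceMatcher.ratio()` stands in for the unspecified similarity degree, and `SIMILAR_THRESHOLD` is a hypothetical cutoff. `score` is a hypothetical acoustic scorer that returns the matching score of the received audio against a word, and "hey lumi" is a made-up awakening word. Where the word set holds several similar words, the sketch requires the first matching score to be at least every second matching score, which goes beyond the claim's singular wording.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Callable, Iterable

SIMILAR_THRESHOLD = 0.6  # hypothetical cutoff for treating a word as "similar"


def similarity(a: str, b: str) -> float:
    """Character-level stand-in for the similarity degree (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()


def detect_with_similar_words(
    word_set: Iterable[str],
    wake_word: str,
    signal_present: bool,
    score: Callable[[str], float],
) -> bool | None:
    """Return True/False for the 'similar but not identical' branch, None otherwise."""
    words = set(word_set)
    similar = [
        w for w in words
        if w != wake_word and similarity(w, wake_word) >= SIMILAR_THRESHOLD
    ]
    # This branch applies only when similar words exist and no identical word does.
    if wake_word in words or not similar:
        return None
    # No wake-word signal detected in the received audio: nothing to recognize.
    if not signal_present:
        return False
    first = score(wake_word)  # first matching score
    # Recognize the awakening word only if it scores at least as well as every similar word.
    return all(first >= score(w) for w in similar)


if __name__ == "__main__":
    scores = {"hey lumi": 0.9, "hey lucy": 0.7}
    print(detect_with_similar_words({"hey lucy", "weather"}, "hey lumi", True, lambda w: scores[w]))  # True
```

Here "hey lucy" counts as similar (ratio 0.75) and "weather" does not, so only the wake word and "hey lucy" are scored.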
7. The apparatus of claim 6, wherein the word set acquisition module comprises:
an information periodic detection unit, used for the terminal equipment to periodically check designated playback applications in the local system and/or associated devices, and to determine that the voice information to be played is about to be played when a designated playback application is detected to be about to play a designated audio file; and/or
an information receiving unit, used for determining that the voice information to be played is about to be played when the terminal equipment receives audio file playback prompt information sent by the local system and/or associated devices.
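The two trigger paths of claim 7 can be sketched in Python as follows. `PlayerStatus`, `PlaybackWatcher`, and their fields are hypothetical names. The sketch assumes that the caller invokes `poll` on a timer and routes playback prompts from the local system or associated devices to `on_play_prompt`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PlayerStatus:
    """Snapshot of one playback app on the local system or an associated device."""
    app_name: str
    next_file: Optional[str]  # audio file queued to play next, if any


class PlaybackWatcher:
    def __init__(self, designated_apps: Iterable[str], designated_files: Iterable[str]):
        self.designated_apps = set(designated_apps)
        self.designated_files = set(designated_files)
        self.pending_file: Optional[str] = None

    def poll(self, statuses: Iterable[PlayerStatus]) -> bool:
        """Periodic path: True if a designated app is about to play a designated file."""
        for s in statuses:
            if s.app_name in self.designated_apps and s.next_file in self.designated_files:
                self.pending_file = s.next_file
                return True
        return False

    def on_play_prompt(self, file_name: str) -> bool:
        """Push path: the local system or an associated device announced playback."""
        self.pending_file = file_name
        return True


if __name__ == "__main__":
    watcher = PlaybackWatcher({"music_app"}, {"story.mp3"})
    print(watcher.poll([PlayerStatus("music_app", "story.mp3")]))  # True
```

When either path returns True, the terminal would go on to acquire the word set for `pending_file` before the audio starts playing.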
8. An electronic device, characterized in that the device comprises:
one or more processors;
a storage device for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the speech recognition method of any one of claims 1-5.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the speech recognition method according to any one of claims 1 to 5.
CN201810615353.0A 2018-06-14 2018-06-14 Voice recognition method, device, equipment and storage medium Active CN108831477B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810615353.0A CN108831477B (en) | 2018-06-14 | 2018-06-14 | Voice recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810615353.0A CN108831477B (en) | 2018-06-14 | 2018-06-14 | Voice recognition method, device, equipment and storage medium

Publications (2)

Publication Number | Publication Date
CN108831477A (en) | 2018-11-16
CN108831477B (en) | 2021-07-09

Family

ID=64141911

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN201810615353.0A CN108831477B (en) | Active | 2018-06-14 | 2018-06-14 | Voice recognition method, device, equipment and storage medium

Country Status (1)

Country | Link
CN (1) | CN108831477B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112740321A (en) * | 2018-11-20 | 2021-04-30 | 深圳市欢太科技有限公司 | Method and device for waking up equipment, storage medium and electronic equipment
CN111354357A (en) * | 2018-12-24 | 2020-06-30 | 中移(杭州)信息技术有限公司 | Audio resource playing method and device, electronic equipment and storage medium
CN109448725A (en) * | 2019-01-11 | 2019-03-08 | 百度在线网络技术(北京)有限公司 | A kind of interactive voice equipment awakening method, device, equipment and storage medium
CN112185425A (en) * | 2019-07-05 | 2021-01-05 | 阿里巴巴集团控股有限公司 | Audio signal processing method, device, equipment and storage medium
CN112702469B (en) * | 2019-10-23 | 2022-07-22 | 阿里巴巴集团控股有限公司 | Voice interaction method and device, audio and video processing method and voice broadcasting method
CN110827792B (en) * | 2019-11-15 | 2022-06-03 | 广州视源电子科技股份有限公司 | Voice broadcasting method and device
CN112562685A (en) * | 2020-12-10 | 2021-03-26 | 上海雷盎云智能技术有限公司 | Voice interaction method and device for service robot
CN112885353B (en) * | 2021-01-26 | 2023-03-14 | 维沃移动通信有限公司 | Voice wake-up method and device and electronic equipment

Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
JP2014181019A (en) * | 2013-03-21 | 2014-09-29 | Asmo Co Ltd | Vehicle equipment control unit
CN104572009A (en) * | 2015-01-28 | 2015-04-29 | 合肥联宝信息技术有限公司 | External environment adaptive audio control method and device
CN106847283A (en) * | 2017-02-28 | 2017-06-13 | 广东美的制冷设备有限公司 | Intelligent electrical appliance control and device
CN107256707A (en) * | 2017-05-24 | 2017-10-17 | 深圳市冠旭电子股份有限公司 | A kind of audio recognition method, system and terminal device
CN107360327A (en) * | 2017-07-19 | 2017-11-17 | 腾讯科技(深圳)有限公司 | Audio recognition method, device and storage medium

Family Cites Families (8)

Publication number | Priority date | Publication date | Assignee | Title
US7073386B2 (en) * | 2004-06-14 | 2006-07-11 | General Electric Company | Multi-bore pressure sensing probe
CN106981290B (en) * | 2012-11-27 | 2020-06-30 | 威盛电子股份有限公司 | Voice control device and voice control method
CN105575395A (en) * | 2014-10-14 | 2016-05-11 | 中兴通讯股份有限公司 | Voice wake-up method and apparatus, terminal, and processing method thereof
CN105206271A (en) * | 2015-08-25 | 2015-12-30 | 北京宇音天下科技有限公司 | Intelligent equipment voice wake-up method and system for realizing method
CN106653031A (en) * | 2016-10-17 | 2017-05-10 | 海信集团有限公司 | Voice wake-up method and voice interaction device
CN106448663B (en) * | 2016-10-17 | 2020-10-23 | 海信集团有限公司 | Voice awakening method and voice interaction device
CN106910496A (en) * | 2017-02-28 | 2017-06-30 | 广东美的制冷设备有限公司 | Intelligent electrical appliance control and device
CN107622770B (en) * | 2017-09-30 | 2021-03-16 | 百度在线网络技术(北京)有限公司 | Voice wake-up method and device

Non-Patent Citations (2)

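Title
"Electrostimulation mapping of comprehension of auditory and visual words"; Franck-Emmanuel Roux; ScienceDirect; 2015-12-31; whole document *
"A Preliminary Exploration of Voice Interaction Design Principles for Mobile Intelligent Terminals" (移动智能终端的语音交互设计原则初探); Gao Feng (高峰); Compilation of Special Topic Materials, Industrial Design Industry Research Center (工业设计产业研究中心专题资料汇编); 2016-11-01; whole document *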

Also Published As

Publication number | Publication date
CN108831477A (en) | 2018-11-16

Similar Documents

Publication | Title
CN108831477B (en) | Voice recognition method, device, equipment and storage medium
CN110069608B (en) | Voice interaction method, device, equipment and computer storage medium
JP6857699B2 (en) | Wake-up methods, equipment, equipment, storage media, and programs for voice dialogue equipment
CN108520743B (en) | Voice control method of intelligent device, intelligent device and computer readable medium
US11568876B2 (en) | Method and device for user registration, and electronic device
CN107808670B (en) | Voice data processing method, device, equipment and storage medium
CN107886944B (en) | Voice recognition method, device, equipment and storage medium
CN110265040B (en) | Voiceprint model training method and device, storage medium and electronic equipment
CN108133707B (en) | Content sharing method and system
US10529340B2 (en) | Voiceprint registration method, server and storage medium
CN109215646B (en) | Voice interaction processing method and device, computer equipment and storage medium
CN110047481B (en) | Method and apparatus for speech recognition
CN107516526B (en) | Sound source tracking and positioning method, device, equipment and computer readable storage medium
EP3444811B1 (en) | Speech recognition method and device
CN108055617B (en) | Microphone awakening method and device, terminal equipment and storage medium
CN110706707B (en) | Method, apparatus, device and computer-readable storage medium for voice interaction
EP3989217A1 (en) | Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
CN107146605B (en) | Voice recognition method and device and electronic equipment
CN111312222A (en) | Awakening and voice recognition model training method and device
CN106228047B (en) | A kind of application icon processing method and terminal device
CN111243604A (en) | Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system
CN111400463A (en) | Dialog response method, apparatus, device and medium
CN114155860A (en) | Abstract recording method and device, computer equipment and storage medium
US20240096347A1 (en) | Method and apparatus for determining speech similarity, and program product
CN113889091A (en) | Voice recognition method and device, computer readable storage medium and electronic equipment

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant