CN108831477A - Speech recognition method, apparatus, device and storage medium - Google Patents

Speech recognition method, apparatus, device and storage medium

Info

Publication number
CN108831477A
Authority
CN
China
Prior art keywords
word
voice information
terminal
wake
played
Prior art date
Application number
CN201810615353.0A
Other languages
Chinese (zh)
Inventor
许超
Original Assignee
出门问问信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 出门问问信息科技有限公司
Priority to CN201810615353.0A
Publication of CN108831477A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

Embodiments of the invention disclose a speech recognition method, apparatus, device and storage medium. The method includes: when a terminal device, while asleep, determines that voice information to be played is about to be played, it obtains a word set corresponding to that voice information, the terminal device having enabled its wake-up word function in advance; the terminal device then performs wake-up word detection on received voice information according to the degree of similarity between the word set and a preset wake-up word. With this technical solution, wake-up recognition can be optimized: voice information played in the environment is selectively masked according to the voice information to be played, which avoids false wake-ups and improves the user experience.

Description

Speech recognition method, apparatus, device and storage medium

Technical field

Embodiments of the present invention relate to intelligent terminal technology, and in particular to a speech recognition method, apparatus, device and storage medium.

Background

As technology advances, voice control is becoming widespread. Most intelligent terminals now include a dialogue system capable of voice interaction. Interacting with the terminal by voice through this dialogue system makes the terminal simpler and more convenient to operate.

In prior-art implementations, before each interaction the user wakes the dialogue system with a fixed wake-up word, and voice interaction starts once the system has entered the awake state.

In implementing the present invention, the inventor found the following defect in the prior art. A user may play voice information through the intelligent terminal or an associated intelligent terminal, for example an audiobook. When the played voice information contains content similar to the wake-up word, false wake-ups occur easily. Even though the user has no intention of waking the dialogue system, content similar to the wake-up word in the voice information played in the environment is recognized as the wake-up word; the dialogue system is falsely woken and starts a voice interaction, which disturbs the user.

Summary of the invention

The present invention provides a speech recognition method, apparatus, device and storage medium, so that when an intelligent terminal performs wake-up recognition, voice information played in the environment is selectively masked, false wake-ups are avoided, and the user experience is improved.

In a first aspect, an embodiment of the present invention provides a speech recognition method, including:

when a terminal device, while asleep, determines that voice information to be played is about to be played, obtaining a word set corresponding to the voice information to be played;

performing, by the terminal device, wake-up word detection on received voice information according to the degree of similarity between the word set and a preset wake-up word.

In a second aspect, an embodiment of the present invention further provides a speech recognition apparatus, including:

a word set obtaining module, configured to obtain a word set corresponding to voice information to be played when the terminal device, while asleep, determines that the voice information to be played is about to be played;

a wake-up word detection module, configured to perform, for the terminal device, wake-up word detection on received voice information according to the degree of similarity between the word set and a preset wake-up word.

In a third aspect, an embodiment of the present invention further provides a device, including:

one or more processors;

a storage apparatus for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech recognition method provided by the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the speech recognition method provided by the embodiments of the present invention.

In the technical solution of the embodiments of the present invention, when the terminal device, while asleep, determines that voice information to be played is about to be played, it obtains the word set corresponding to that voice information and performs wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word. During wake-up recognition, voice information played in the environment can thus be selectively masked according to the voice information to be played, which avoids false wake-ups, optimizes wake-up recognition and improves the user experience.

Brief description of the drawings

Fig. 1 is a flowchart of a speech recognition method provided by Embodiment 1 of the present invention;

Fig. 2 is a flowchart of a speech recognition method provided by Embodiment 2 of the present invention;

Fig. 3 is a flowchart of a speech recognition method provided by Embodiment 3 of the present invention;

Fig. 4 is a flowchart of a speech recognition method provided by Embodiment 4 of the present invention;

Fig. 5 is a structural block diagram of a speech recognition apparatus provided by Embodiment 5 of the present invention;

Fig. 6 is a structural schematic diagram of a device provided by Embodiment 6 of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention and do not limit it. Note also that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment 1

Fig. 1 is a flowchart of a speech recognition method provided by Embodiment 1 of the present invention. This embodiment is applicable to recognizing voice signals. The method may be executed by a speech recognition apparatus, which is implemented in software and/or hardware and can generally be integrated in a terminal device. Terminal devices include, but are not limited to, computers. For example, the terminal device may be a smart watch, smartphone, smart band, smart speaker or smart television. As shown in Fig. 1, the method includes the following steps:

Step 101: when the terminal device, while asleep, determines that voice information to be played is about to be played, it obtains the word set corresponding to the voice information to be played.

To save power and extend its usage time, the terminal device stays in a sleep state while the user is not using it. When the user needs the terminal device, it is woken up and goes from the sleep state to the working state.

The wake-up word is one or more words preset by the user or set by default, for example "Hello, Wenwen". The wake-up word cannot be split; it must be spoken as one continuous phrase. For example, if the user says "Hello, classmate Xiao Ming, Wenwen", the user's speech does not contain the wake-up word.
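The contiguity requirement can be illustrated with a short sketch. It assumes the recognizer has already produced a text transcript and that case, spaces and punctuation are ignored when matching. The function names and the English phrases are hypothetical and are not taken from the patent.

```python
import re


def normalize(text: str) -> str:
    """Lower-case the text and drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", text.lower())


def contains_wake_word(transcript: str, wake_word: str) -> bool:
    """Return True only if the wake-up word occurs as one unbroken run."""
    return normalize(wake_word) in normalize(transcript)


# "hello assistant" is a hypothetical wake-up word used for illustration.
print(contains_wake_word("Hello, assistant!", "hello assistant"))
print(contains_wake_word("Hello, my dear assistant", "hello assistant"))
```

The second call reports no match because other words sit between the two halves of the wake-up word, which mirrors the split-phrase example in the text.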

Optionally, the terminal device provides a wake-up word function. If the user manually enables this function before the terminal device goes to sleep, the terminal device can perform wake-up word detection on received voice information using the preset wake-up word and be woken up according to the detection result. The received voice information is the voice information acquired by the terminal device. Specifically, with the wake-up word function enabled in advance, when the terminal device determines during sleep that voice information to be played is about to be played, it obtains the word set corresponding to that voice information and then performs wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word. If the wake-up word is recognized in the voice information, the terminal device is woken up; if it is not recognized, the terminal device is not woken up.

Optionally, the terminal device enables the wake-up word function automatically when it detects that it is entering sleep.

If the wake-up word function was not enabled before the terminal device went to sleep, the terminal device cannot perform wake-up word detection on received voice information with the preset wake-up word, and so cannot be woken up according to a detection result.

Optionally, when the user is not using a preset application of the terminal device, that application is in a sleep state; when the user needs the preset application, it is woken up and goes from the sleep state to the working state. For example, when the user is not using the dialogue system of the terminal device, the dialogue system is asleep; when the user needs the dialogue system, it is woken up, goes from the sleep state to the working state, and carries out voice interaction with the user.

If the terminal device enables the wake-up word function before its dialogue system goes to sleep, it can perform wake-up word detection on received voice information using the preset wake-up word and wake the dialogue system according to the detection result. Specifically, with the wake-up word function enabled in advance, when the terminal device determines during sleep that voice information to be played is about to be played, it obtains the word set corresponding to that voice information and then performs wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word. If the wake-up word is recognized in the voice information, the dialogue system of the terminal device is woken up; if it is not recognized, the dialogue system is not woken up.

When the terminal device detects that a designated playback application is about to play a designated audio file, it determines that voice information to be played is about to be played. A designated playback application is an application capable of playing audio files. Designated audio files may include music files and audiobook files.

The word set corresponding to the voice information to be played is the set of common words in that voice information. The common words can be obtained from the designated audio file and used to generate the word set corresponding to the voice information to be played.

Step 102: the terminal device performs wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word.

The degree of similarity between each word in the word set and the preset wake-up word is calculated with a preset word-similarity algorithm. Each similarity is then compared with preset similarity thresholds to determine whether the word set contains a similar word or an identical word of the wake-up word. Specifically, the preset similarity thresholds comprise a similar-word threshold and an identical-word threshold, the former being smaller than the latter. A word whose similarity to the wake-up word is greater than the similar-word threshold and less than the identical-word threshold is a similar word of the wake-up word; a word whose similarity is greater than the identical-word threshold is an identical word of the wake-up word.
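The two-threshold classification can be sketched as follows. The patent does not name the word-similarity algorithm, so this sketch substitutes `difflib.SequenceMatcher` from the Python standard library as a stand-in. The threshold values 0.6 and 0.95, the function names and the example words are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple

SIMILAR_WORD_THRESHOLD = 0.6     # assumed value, not from the patent
IDENTICAL_WORD_THRESHOLD = 0.95  # assumed value, not from the patent


def similarity(word: str, wake_word: str) -> float:
    """Stand-in similarity in [0, 1]; the patent leaves the algorithm open."""
    return SequenceMatcher(None, word, wake_word).ratio()


def classify(word: str, wake_word: str) -> str:
    """Label a word as 'identical', 'similar' or 'unrelated'."""
    score = similarity(word, wake_word)
    if score > IDENTICAL_WORD_THRESHOLD:
        return "identical"
    # The text does not say what happens exactly at the identical-word
    # threshold; this sketch assumes such a word counts as similar.
    if score > SIMILAR_WORD_THRESHOLD:
        return "similar"
    return "unrelated"


def split_word_set(words: Iterable[str], wake_word: str) -> Tuple[List[str], List[str]]:
    """Return (similar_words, identical_words) found in the word set."""
    similar, identical = [], []
    for word in words:
        label = classify(word, wake_word)
        if label == "similar":
            similar.append(word)
        elif label == "identical":
            identical.append(word)
    return similar, identical


print(split_word_set(["hello assistance", "music", "hello assistant"],
                     "hello assistant"))
```

Any similarity measure that returns a comparable score could replace `similarity` without changing the classification logic, for example a phonetic distance, which may suit spoken wake-up words better than character overlap.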

The terminal device performs wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word. That degree of similarity falls into one of three cases: the word set contains a similar word of the wake-up word but no identical word; the word set contains neither a similar word nor an identical word of the wake-up word; or the word set contains an identical word of the wake-up word.

Specifically, when the terminal device determines that the word set contains a similar word of the wake-up word but no identical word, it decides whether the wake-up word is recognized in the voice information by comparing two matching scores: the score between the recognition result of the voice information and the wake-up word, and the score between that recognition result and the similar word. If the score against the wake-up word is greater than or equal to the score against the similar word, the wake-up word is determined to be recognized in the voice information. If the score against the wake-up word is lower than the score against the similar word, the wake-up word is determined not to be recognized, and neither the terminal device nor its preset application is woken up, which avoids a false wake-up.

When the terminal device determines from the degree of similarity that the word set contains neither a similar word nor an identical word of the wake-up word, it performs wake-up word detection directly with the preset wake-up word.

When the terminal device determines from the degree of similarity that the word set contains an identical word of the wake-up word, it must first verify the user's identity from voiceprint features to confirm whether the received voice information comes from the user. Once the received voice information is confirmed to be the user's, wake-up word detection is performed directly with the preset wake-up word.

When the wake-up word is determined to be recognized in the voice information, the terminal device or its preset application is woken up; when the wake-up word is determined not to be recognized, the terminal device or its preset application is not woken up.
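The three branches of step 102 can be summarized in one dispatch function. This is a minimal sketch under stated assumptions: `score` is a hypothetical callback that returns the matching score between the recognition result and a given phrase, `speaker_is_user` is a hypothetical voiceprint check, and ordinary detection is assumed to mean comparing the score with a threshold (8 on a 0 to 10 scale, the example value used in Embodiment 2).

```python
from typing import Callable, Sequence


def wake_word_detected(
    score: Callable[[str], float],
    wake_word: str,
    similar_words: Sequence[str],
    identical_words: Sequence[str],
    speaker_is_user: Callable[[], bool],
    threshold: float = 8.0,
) -> bool:
    """Decide whether the received voice information contains the wake-up word."""
    wake_score = score(wake_word)
    if identical_words:
        # The playback contains the wake-up word itself, so the speaker's
        # voiceprint is verified before ordinary detection is applied.
        return speaker_is_user() and wake_score >= threshold
    if similar_words:
        # The playback contains look-alikes: the wake-up word must match at
        # least as well as every similar word.
        best_similar = max(score(word) for word in similar_words)
        return wake_score >= threshold and wake_score >= best_similar
    # Nothing confusable is about to be played: ordinary detection.
    return wake_score >= threshold


# Hypothetical scores for one utterance that sounds more like the similar word.
scores = {"hello assistant": 8.5, "hello assistance": 9.0}
print(wake_word_detected(scores.get, "hello assistant",
                         ["hello assistance"], [], lambda: True))
```

With these example scores the utterance is rejected while the similar word is in the word set, but the same utterance would be accepted if the word set contained nothing confusable.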

In the speech recognition method provided by this embodiment, when the terminal device, while asleep, determines that voice information to be played is about to be played, it obtains the word set corresponding to that voice information and performs wake-up word detection on received voice information according to the degree of similarity between the word set and the preset wake-up word. During wake-up recognition, voice information played in the environment can thus be selectively masked according to the voice information to be played, which avoids false wake-ups, optimizes wake-up recognition and improves the user experience.

Embodiment 2

Fig. 2 is a flowchart of a speech recognition method provided by Embodiment 2 of the present invention. Building on the above embodiments, this embodiment refines step 102. Performing wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word includes: when the terminal device determines from the degree of similarity that the word set contains a similar word of the wake-up word but no identical word, detecting whether the received voice information contains a voice signal corresponding to the wake-up word; when such a voice signal is detected, obtaining a first matching score between that voice signal and the wake-up word and a second matching score between that voice signal and the similar word, and comparing the two scores; and when the first matching score is greater than or equal to the second matching score, determining that the wake-up word is recognized in the voice information.

As shown in Fig. 2, the method includes:

Step 201: when the terminal device, while asleep, determines that voice information to be played is about to be played, it obtains the word set corresponding to the voice information to be played.

Optionally, the terminal device determining during sleep that voice information to be played is about to be played includes: the terminal device periodically checks the designated playback applications in the local system and/or in an associated device and, on detecting that a designated playback application is about to play a designated audio file, determines that voice information to be played is about to be played; and/or the terminal device, on receiving an audio file playback prompt sent by the local system and/or an associated device, determines that voice information to be played is about to be played.

The terminal device checks the designated playback applications in the local system at a preset interval and, on detecting that one of them is about to play a designated audio file, determines that voice information to be played is about to be played. A designated playback application is an application capable of playing audio files. Designated audio files may include music files and audiobook files.

Optionally, the terminal device periodically checks the designated playback applications in an associated device and, on detecting that one of them is about to play a designated audio file, determines that voice information to be played is about to be played. An associated device may be another terminal device connected to the same server as the terminal device. Optionally, an associated device may be another terminal device that shares a user account with the terminal device. A user account records the user's user name and password, the groups the user belongs to, the network resources the user can access, and the user's personal files and settings.

Optionally, the terminal device periodically checks the designated playback applications both in the local system and in the associated device and, on detecting that one of them is about to play a designated audio file, determines that voice information to be played is about to be played.

Optionally, on receiving an audio file playback prompt sent by the local system, the terminal device determines that voice information to be played is about to be played.

When a designated playback application in the local system is about to play a designated audio file, it sends an audio file playback prompt. From the prompt received from the local system, the terminal device can determine that a designated playback application in the local system is about to play a designated audio file, that is, that voice information to be played is about to be played.

Optionally, on receiving an audio file playback prompt sent by an associated device, the terminal device determines that voice information to be played is about to be played.

When a designated playback application in the associated device is about to play a designated audio file, it sends an audio file playback prompt. From the prompt received from the associated device, the terminal device can determine that a designated playback application in the associated device is about to play a designated audio file, that is, that voice information to be played is about to be played.

Optionally, on receiving audio file playback prompts sent by both the local system and the associated device, the terminal device determines that voice information to be played is about to be played.

Optionally, the two mechanisms run side by side: the terminal device periodically checks the designated playback applications in the local system and in the associated device, determining that voice information to be played is about to be played when one of them is about to play a designated audio file, and it also makes the same determination on receiving an audio file playback prompt sent by the local system or the associated device.
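The two detection routes described above, periodic checking and playback prompts, can be combined in one small monitor. This is a sketch only: the class name, the probe callbacks and the prompt handler are hypothetical, and a real device would call `poll` from a timer at the preset interval.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PlaybackMonitor:
    """Tracks whether voice information is about to be played."""

    # Hypothetical probes, one per local or associated playback application;
    # each returns True when that application is about to play an audio file.
    probes: List[Callable[[], bool]] = field(default_factory=list)
    pending_prompt: bool = False

    def on_play_prompt(self) -> None:
        """Record a playback prompt sent by the local system or an associated device."""
        self.pending_prompt = True

    def poll(self) -> bool:
        """Run one detection cycle and report whether playback is imminent."""
        imminent = self.pending_prompt or any(probe() for probe in self.probes)
        self.pending_prompt = False  # a prompt is consumed by the cycle that sees it
        return imminent


monitor = PlaybackMonitor(probes=[lambda: False])
print(monitor.poll())      # no probe fires and no prompt is pending
monitor.on_play_prompt()
print(monitor.poll())      # the pending prompt makes playback imminent
```

Keeping both routes behind one `poll` result means the later steps, obtaining the word set and adjusting wake-up word detection, do not need to know which route reported the upcoming playback.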

Optionally, obtaining the word set corresponding to the voice information to be played includes: obtaining introduction information for the voice information to be played; and obtaining the common words of the voice information to be played from the introduction information to generate the corresponding word set.

The introduction information is preset information describing the content of the designated audio file that is about to be played. After the introduction information is obtained, the common words of the voice information to be played are derived from it and the corresponding word set is generated. Specifically, a preset statistical algorithm extracts the statistical features of each word in the introduction information; words whose number of occurrences reaches a preset count threshold are then selected according to those features and taken as the common words of the voice information to be played. The word set corresponding to the voice information to be played is generated from all the common words so determined.

Optionally, obtaining the word set corresponding to the voice information to be played includes: obtaining the designated audio file that is about to be played; and obtaining the common words of the voice information to be played from the designated audio file to generate the corresponding word set.

A preset statistical algorithm extracts the statistical features of each word in the designated audio file; words whose number of occurrences reaches a preset count threshold are then selected according to those features and taken as the common words of the voice information to be played. The word set corresponding to the voice information to be played is generated from all the common words so determined.
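The frequency-based selection of common words can be sketched with a plain word count. The sketch assumes the content is available as text, either the introduction information or a transcript of the audio file, and that words are separated by spaces or punctuation. Chinese text would first need a word segmenter, which is not shown. The function name and the default threshold of 2 are hypothetical.

```python
import re
from collections import Counter
from typing import Set


def build_word_set(text: str, min_count: int = 2) -> Set[str]:
    """Return the words that occur at least `min_count` times in `text`."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    return {word for word, n in counts.items() if n >= min_count}


intro = "The fox and the hound and the hare"
print(sorted(build_word_set(intro)))
```

In this example "the" occurs three times and "and" twice, so both reach the threshold, while the words that occur once are left out of the word set.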

Step 202: when the terminal device determines from the degree of similarity that the word set contains a similar word of the wake-up word but no identical word, it detects whether the received voice information contains a voice signal corresponding to the wake-up word.

Specifically, the acoustic features of the voice information are extracted and fed into a preset speech recognition model. The model recognizes the received voice information and produces the corresponding recognition result, and the matching score between that recognition result and the wake-up word, namely the first matching score, is calculated. The matching score may range, for example, from 0 to 10; the higher the score, the better the match. The range can be set according to actual needs.

Once the first matching score is obtained, it is compared with a first preset threshold to determine whether the received voice information contains a voice signal corresponding to the wake-up word. If the first matching score is greater than or equal to the first preset threshold, the recognition result matches the wake-up word, that is, a voice signal corresponding to the wake-up word is detected in the received voice information. If the first matching score is less than the first preset threshold, the recognition result does not match the wake-up word, that is, no voice signal corresponding to the wake-up word is detected in the received voice information. For example, the matching score may range from 0 to 10 with a first preset threshold of 8.

Step 203 when voice signal corresponding in the presence of wake-up word, obtains voice messaging and calls out in detecting voice messaging Wake up the first matching score of word and voice messaging with the second of similar word match score, and compare the first matching score with The size of second matching score.

Wherein, voice messaging and the first matching score for waking up word be the corresponding recognition result of voice messaging and wake up word it Between matching score.Second matching score of voice messaging and similar word is the corresponding recognition result of voice messaging and similar word Matching score between language.

When detecting voice signal corresponding in the presence of wake-up word in the voice messaging received, the corresponding knowledge of voice messaging is obtained Matching score between other result and wake-up word, i.e., the first matching score, and voice is calculated by preset speech recognition modeling Matching score between the corresponding recognition result of information and similar word, i.e., the second matching score.It obtains voice messaging and wakes up First matching score of word and after voice messaging matches score with the second of similar word compares the first matching score and the The size of two matching scores.

Step 204: when the first matching score is greater than or equal to the second matching score, determine that the wake-up word is recognized in the voice information.

Here, if the first matching score is greater than the second matching score, the recognition result corresponding to the voice information matches the wake-up word better than it matches the similar word, so the wake-up word is determined to be recognized in the voice information. If the first matching score is equal to the second matching score, the recognition result matches the wake-up word about as well as it matches the similar word, so the wake-up word is again determined to be recognized in the voice information. If the first matching score is less than the second matching score, the recognition result matches the wake-up word worse than it matches the similar word, so the wake-up word is determined not to be recognized in the voice information.

That is, the wake-up word is determined to be recognized in the voice information when the recognition result matches the wake-up word better than, or about as well as, it matches the similar word.

When it is determined that the wake-up word is recognized in the voice information, the terminal device or a preset application of the terminal device is woken up; when it is determined that the wake-up word is not recognized in the voice information, the terminal device or the preset application is not woken up.
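The comparison in step 204 can be sketched as follows. This is only an illustrative sketch: the function names and the `wake` callback are hypothetical and do not appear in the embodiment, and the two scores are assumed to have already been produced by the preset speech recognition model.

```python
from typing import Callable


def wake_word_recognized(first_score: float, second_score: float) -> bool:
    """Step 204: recognized when the first score is >= the second score."""
    return first_score >= second_score


def handle_scores(first_score: float, second_score: float,
                  wake: Callable[[], None]) -> bool:
    """Wake the device (or its preset application) only on recognition.

    `wake` is a hypothetical callback standing in for whatever wakes the
    terminal device or its preset application.
    """
    if wake_word_recognized(first_score, second_score):
        wake()
        return True
    # First score lower than second score: treated as the similar word,
    # so nothing is woken up.
    return False
```

A tie counts as recognition, matching the "greater than or equal to" condition stated in step 204.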

In the speech recognition method provided by this embodiment, when the terminal device determines that the word set contains a similar word of the wake-up word but no identical word, and a voice signal corresponding to the wake-up word is detected in the voice information, the first matching score between that voice signal and the wake-up word is compared with the second matching score between that voice signal and the similar word. When the first matching score is greater than or equal to the second matching score, the wake-up word is determined to be recognized in the voice information. In this way, during wake-up recognition, voice information played in the environment can be shielded in a targeted manner when the voice information to be played contains a similar word of the wake-up word, which avoids false wake-ups.

Embodiment three

Fig. 3 is a flowchart of a speech recognition method provided by embodiment three of the present invention. On the basis of the above embodiments, this embodiment refines step 102. The terminal device performing wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word includes: when the terminal device determines, according to the degree of similarity, that the word set contains neither a similar word nor an identical word of the wake-up word, detecting whether a voice signal corresponding to the wake-up word exists in the received voice information; and, when such a voice signal is detected in the voice information, determining that the wake-up word is recognized in the voice information.

As shown in figure 3, this method includes:

Step 301: when the terminal device, during sleep, determines that voice information to be played is about to be played, it obtains the word set corresponding to the voice information to be played.

Step 302: when the terminal device determines, according to the degree of similarity, that the word set contains neither a similar word nor an identical word of the wake-up word, it detects whether a voice signal corresponding to the wake-up word exists in the received voice information.

Specifically, the acoustic features of the voice information are extracted and input into a preset speech recognition model. The model recognizes the received voice information to obtain the corresponding recognition result, and the matching score between that recognition result and the wake-up word, i.e. the first matching score, is calculated. The matching score may range, for example, from 0 to 10 points; the higher the score, the better the match. The range can be set according to actual needs.

After the first matching score is obtained, whether a voice signal corresponding to the wake-up word exists in the received voice information can be determined from the first matching score and a first preset threshold. When the first matching score is greater than or equal to the first preset threshold, the recognition result corresponding to the voice information matches the wake-up word, i.e. a voice signal corresponding to the wake-up word is detected in the received voice information. When the first matching score is less than the first preset threshold, the recognition result does not match the wake-up word, i.e. no voice signal corresponding to the wake-up word is detected in the received voice information. For example, the matching score may range from 0 to 10 points and the first preset threshold may be 8 points.
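The threshold test described above can be sketched as follows, using the example values from the text (a score range of 0 to 10 points and a first preset threshold of 8 points). The constant and function names are hypothetical, and the score itself is assumed to come from the preset speech recognition model, which is not modeled here.

```python
SCORE_MIN = 0.0                # example lower bound of the matching score
SCORE_MAX = 10.0               # example upper bound of the matching score
FIRST_PRESET_THRESHOLD = 8.0   # example first preset threshold


def wake_word_signal_present(first_score: float,
                             threshold: float = FIRST_PRESET_THRESHOLD) -> bool:
    """Return True when the first matching score reaches the threshold."""
    if not SCORE_MIN <= first_score <= SCORE_MAX:
        raise ValueError(f"matching score {first_score} is outside "
                         f"[{SCORE_MIN}, {SCORE_MAX}]")
    return first_score >= threshold
```

With these example values, a score of exactly 8 points counts as a detection, and anything below 8 points does not.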

Step 303: when a voice signal corresponding to the wake-up word is detected in the voice information, determine that the wake-up word is recognized in the voice information.

Here, when a voice signal corresponding to the wake-up word is detected in the voice information, the wake-up word is determined to be recognized in the voice information, and the terminal device or a preset application of the terminal device is woken up. When no voice signal corresponding to the wake-up word is detected in the voice information, the wake-up word is determined not to be recognized in the voice information, and the terminal device or the preset application is not woken up.

In the speech recognition method provided by this embodiment, when the terminal device determines that the word set contains neither a similar word nor an identical word of the wake-up word, it detects whether a voice signal corresponding to the wake-up word exists in the received voice information, and, when such a voice signal is detected, determines that the wake-up word is recognized in the voice information. In this way, when the voice information to be played contains neither a similar word nor an identical word of the wake-up word, wake-up recognition is performed directly against the wake-up word.

Embodiment four

Fig. 4 is a flowchart of a speech recognition method provided by embodiment four of the present invention. On the basis of the above embodiments, this embodiment refines step 102. The terminal device performing wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word includes: when the terminal device determines, according to the degree of similarity, that the word set contains an identical word of the wake-up word, determining the voiceprint feature corresponding to the voice information from the received voice information; judging whether the voiceprint feature matches a preset voiceprint feature; when the voiceprint feature matches the preset voiceprint feature, detecting whether a voice signal corresponding to the wake-up word exists in the voice information; and, when such a voice signal is detected in the voice information, determining that the wake-up word is recognized in the voice information.

As shown in figure 4, this method includes:

Step 401: when the terminal device, during sleep, determines that voice information to be played is about to be played, it obtains the word set corresponding to the voice information to be played.

Step 402: when the terminal device determines, according to the degree of similarity, that the word set contains an identical word of the wake-up word, it determines the voiceprint feature corresponding to the voice information from the received voice information.

Here, the voiceprint feature corresponding to the voice information may be determined by processing the received voice information and then extracting its voiceprint feature.

Step 403: judge whether the voiceprint feature matches the preset voiceprint feature.

Here, the preset voiceprint feature is the voiceprint feature, set in advance, of a user of the device. It may be set directly by the user, or obtained by analyzing a voice signal input by the user. Optionally, the preset voiceprint feature may include multiple voiceprint features.

Step 404: when the voiceprint feature matches the preset voiceprint feature, detect whether a voice signal corresponding to the wake-up word exists in the voice information.

Here, if the voiceprint feature matches the preset voiceprint feature, the received voice information was input by a user of the terminal device, so the voice information is checked for a voice signal corresponding to the wake-up word. If the voiceprint feature does not match the preset voiceprint feature, the received voice information was not input by a user of the terminal device and may be interfering voice information in the environment, so it is not processed further.

Step 405: when a voice signal corresponding to the wake-up word is detected in the voice information, determine that the wake-up word is recognized in the voice information.

Here, when a voice signal corresponding to the wake-up word is detected in the voice information, the wake-up word is determined to be recognized in the voice information, and the terminal device or a preset application of the terminal device is woken up. When no voice signal corresponding to the wake-up word is detected in the voice information, the wake-up word is determined not to be recognized in the voice information, and the terminal device or the preset application is not woken up.
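Steps 403 to 405 can be sketched as a voiceprint gate placed in front of wake-up word detection. The embodiment does not say how voiceprint features are represented or compared, so everything below is an assumption made for illustration: features are plain numeric vectors, matching uses cosine similarity with a hypothetical cut-off of 0.8, and `detect_wake_word` is a hypothetical callable standing in for the detection of step 404.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def voiceprint_matches(feature: Sequence[float],
                       preset_features: Sequence[Sequence[float]],
                       min_similarity: float = 0.8) -> bool:
    """Step 403: match against any of the (possibly multiple) preset features."""
    return any(cosine_similarity(feature, preset) >= min_similarity
               for preset in preset_features)


def recognize_with_voiceprint(feature: Sequence[float],
                              preset_features: Sequence[Sequence[float]],
                              detect_wake_word: Callable[[], bool]) -> bool:
    """Steps 403-405: run wake-up word detection only for a known speaker."""
    if not voiceprint_matches(feature, preset_features):
        # Treated as interfering voice information; no further processing.
        return False
    return detect_wake_word()
```

Passing the detector as a callable keeps the sketch faithful to the text: when the voiceprint does not match, wake-up word detection is never run.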

In the speech recognition method provided by this embodiment, when the terminal device determines that the word set contains an identical word of the wake-up word, it determines the voiceprint feature corresponding to the voice information from the received voice information and judges whether that voiceprint feature matches the preset voiceprint feature; when the two match, it detects whether a voice signal corresponding to the wake-up word exists in the voice information. In this way, when the voice information to be played contains an identical word of the wake-up word, wake-up recognition is performed according to the voiceprint feature, and interfering voice information in the environment is shielded.

Embodiment five

Fig. 5 is a structural block diagram of a speech recognition device provided by embodiment five of the present invention. As shown in Fig. 5, the device includes:

A word set acquisition module 501 and a wake-up word detection module 502.

The word set acquisition module 501 is configured to obtain, when the terminal device during sleep determines that voice information to be played is about to be played, the word set corresponding to the voice information to be played. The wake-up word detection module 502 is configured for the terminal device to perform wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word.

In the speech recognition device provided by this embodiment, when the terminal device during sleep determines that voice information to be played is about to be played, the word set corresponding to the voice information to be played is obtained, and wake-up word detection is performed on the received voice information according to the degree of similarity between the word set and the preset wake-up word. In this way, during wake-up recognition, voice information played in the environment can be shielded in a targeted manner according to the voice information to be played, which avoids false wake-ups, optimizes wake-up recognition, and improves the user experience.

On the basis of the above embodiments, the word set acquisition module 501 may include:

An information periodic detection unit, configured for the terminal device to periodically detect preset playback-type application programs in the local system and/or in an associated device, and to determine that voice information to be played is about to be played when it detects that a preset audio file is about to be played in such an application program; and/or

An information receiving unit, configured for the terminal device to determine that voice information to be played is about to be played when it receives audio file playback prompt information sent by the local system and/or an associated device.

On the basis of the above embodiments, the wake-up word detection module 502 may include:

A first signal detection unit, configured for the terminal device to detect whether a voice signal corresponding to the wake-up word exists in the received voice information when it determines, according to the degree of similarity, that the word set contains a similar word of the wake-up word but no identical word;

A matching score comparison unit, configured to obtain, when a voice signal corresponding to the wake-up word is detected in the voice information, the first matching score between the voice information and the wake-up word and the second matching score between the voice information and the similar word, and to compare the magnitudes of the first and second matching scores;

A first recognition unit, configured to determine that the wake-up word is recognized in the voice information when the first matching score is greater than or equal to the second matching score.

On the basis of the above embodiments, the wake-up word detection module 502 may include:

A second signal detection unit, configured for the terminal device to detect whether a voice signal corresponding to the wake-up word exists in the received voice information when it determines, according to the degree of similarity, that the word set contains neither a similar word nor an identical word of the wake-up word;

A second recognition unit, configured to determine that the wake-up word is recognized in the voice information when a voice signal corresponding to the wake-up word is detected in the voice information.

On the basis of the above embodiments, the wake-up word detection module 502 may include:

A voiceprint feature determination unit, configured for the terminal device to determine the voiceprint feature corresponding to the voice information from the received voice information when it determines, according to the degree of similarity, that the word set contains an identical word of the wake-up word;

A voiceprint judging unit, configured to judge whether the voiceprint feature matches the preset voiceprint feature;

A third signal detection unit, configured to detect whether a voice signal corresponding to the wake-up word exists in the voice information when the voiceprint feature matches the preset voiceprint feature;

A third recognition unit, configured to determine that the wake-up word is recognized in the voice information when a voice signal corresponding to the wake-up word is detected in the voice information.

On the basis of the above embodiments, the word set acquisition module 501 may include:

A recommended information acquiring unit, configured to obtain the recommended information of the voice information to be played;

A word set generation unit, configured to obtain the commonly used words of the voice information to be played according to the recommended information, and to generate the word set corresponding to the voice information to be played.
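A rough sketch of what the word set generation unit might do is given below. The text does not say what the recommended information contains or how commonly used words are derived from it, so this sketch makes two assumptions: the recommended information is a short piece of text describing the audio, and "commonly used" means most frequent in that text. The function name and parameters are hypothetical.

```python
import re
from collections import Counter
from typing import Set


def build_word_set(recommended_info: str, top_n: int = 5,
                   min_length: int = 2) -> Set[str]:
    """Return the most frequent words of the given description text.

    Words are lower-cased, and words shorter than `min_length` are ignored.
    """
    words = re.findall(r"[a-z']+", recommended_info.lower())
    counts = Counter(word for word in words if len(word) >= min_length)
    return {word for word, _ in counts.most_common(top_n)}
```

The resulting set is what would then be compared against the preset wake-up word. A real system would more likely need word segmentation suited to the language of the audio, which this sketch does not attempt.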

The speech recognition device provided by the embodiments of the present invention can execute the speech recognition method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.

Embodiment six

Fig. 6 is a schematic structural diagram of a device provided by embodiment six of the present invention. Fig. 6 shows a block diagram of an exemplary device 612 suitable for implementing embodiments of the present invention. The device shown in Fig. 6 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.

As shown in Fig. 6, device 612 takes the form of a general-purpose computing device. The components of device 612 may include, but are not limited to: one or more processors or processing units 616, a system memory 628, and a bus 618 connecting the different system components (including system memory 628 and processing unit 616).

Bus 618 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Device 612 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by device 612, including volatile and non-volatile media, and removable and non-removable media.

System memory 628 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 630 and/or cache memory 632. Device 612 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 634 can be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 6, commonly called a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical medium), can also be provided. In these cases, each drive can be connected to bus 618 through one or more data media interfaces. Memory 628 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 640 having a set of (at least one) program modules 642 may be stored, for example, in memory 628. Such program modules 642 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment. Program modules 642 generally perform the functions and/or methods of the embodiments described in the present invention.

Device 612 may also communicate with one or more external devices 614 (such as a keyboard, a pointing device, a display 624, etc.), with one or more devices that enable a user to interact with device 612, and/or with any device (such as a network card, a modem, etc.) that enables device 612 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interface 622. In addition, device 612 can communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through network adapter 620. As shown in the figure, network adapter 620 communicates with the other modules of device 612 through bus 618. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with device 612, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

Processing unit 616 executes various functional applications and data processing by running programs stored in system memory 628, for example implementing the speech recognition method provided by the embodiments of the present invention.

That is: when the terminal device, during sleep, determines that voice information to be played is about to be played, it obtains the word set corresponding to the voice information to be played; the terminal device then performs wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word.

Embodiment seven

Embodiment seven of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the speech recognition method provided by the embodiments of the present invention.

The computer storage medium of the embodiments of the present invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.

The program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.

The computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them; it may include further equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims (10)

1. A speech recognition method, characterized by comprising:
when a terminal device, during sleep, determines that voice information to be played is about to be played, obtaining a word set corresponding to the voice information to be played;
performing, by the terminal device, wake-up word detection on received voice information according to the degree of similarity between the word set and a preset wake-up word.
2. The method according to claim 1, characterized in that the terminal device determining, during sleep, that voice information to be played is about to be played comprises:
the terminal device periodically detecting preset playback-type application programs in the local system and/or in an associated device, and, when it detects that a preset audio file is about to be played in the preset playback-type application program, determining that voice information to be played is about to be played; and/or
the terminal device, on receiving audio file playback prompt information sent by the local system and/or an associated device, determining that voice information to be played is about to be played.
3. The method according to claim 1 or 2, characterized in that the terminal device performing wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word comprises:
when the terminal device determines, according to the degree of similarity, that the word set contains a similar word of the wake-up word and does not contain an identical word of the wake-up word, detecting whether a voice signal corresponding to the wake-up word exists in the received voice information;
when a voice signal corresponding to the wake-up word is detected in the voice information, obtaining a first matching score between the voice information and the wake-up word and a second matching score between the voice information and the similar word, and comparing the magnitudes of the first matching score and the second matching score;
when the first matching score is greater than or equal to the second matching score, determining that the wake-up word is recognized in the voice information.
4. The method according to claim 1 or 2, characterized in that the terminal device performing wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word comprises:
when the terminal device determines, according to the degree of similarity, that the word set contains neither a similar word of the wake-up word nor an identical word of the wake-up word, detecting whether a voice signal corresponding to the wake-up word exists in the received voice information;
when a voice signal corresponding to the wake-up word is detected in the voice information, determining that the wake-up word is recognized in the voice information.
5. The method according to claim 1 or 2, characterized in that the terminal device performing wake-up word detection on the received voice information according to the degree of similarity between the word set and the preset wake-up word comprises:
when the terminal device determines, according to the degree of similarity, that the word set contains an identical word of the wake-up word, determining the voiceprint feature corresponding to the voice information from the received voice information;
judging whether the voiceprint feature matches a preset voiceprint feature;
when the voiceprint feature matches the preset voiceprint feature, detecting whether a voice signal corresponding to the wake-up word exists in the voice information;
when a voice signal corresponding to the wake-up word is detected in the voice information, determining that the wake-up word is recognized in the voice information.
6. The method according to claim 1, characterized in that obtaining the word set corresponding to the voice information to be played comprises:
obtaining the recommended information of the voice information to be played;
obtaining the commonly used words of the voice information to be played according to the recommended information, and generating the word set corresponding to the voice information to be played.
7. A speech recognition device, characterized by comprising:
a word set acquisition module, configured to obtain, when a terminal device during sleep determines that voice information to be played is about to be played, a word set corresponding to the voice information to be played;
a wake-up word detection module, configured for the terminal device to perform wake-up word detection on received voice information according to the degree of similarity between the word set and a preset wake-up word.
8. The device according to claim 7, characterized in that the word set acquisition module comprises:
an information periodic detection unit, configured for the terminal device to periodically detect preset playback-type application programs in the local system and/or in an associated device, and to determine that voice information to be played is about to be played when it detects that a preset audio file is about to be played in the preset playback-type application program; and/or
an information receiving unit, configured for the terminal device to determine that voice information to be played is about to be played when it receives audio file playback prompt information sent by the local system and/or an associated device.
9. A device, characterized in that the device comprises:
one or more processors;
a storage apparatus for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech recognition method according to any one of claims 1 to 6.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the speech recognition method according to any one of claims 1 to 6.
CN201810615353.0A 2018-06-14 2018-06-14 A kind of audio recognition method, device, equipment and storage medium CN108831477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810615353.0A CN108831477A (en) 2018-06-14 2018-06-14 A kind of audio recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810615353.0A CN108831477A (en) 2018-06-14 2018-06-14 A kind of audio recognition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN108831477A true CN108831477A (en) 2018-11-16

Family

ID=64141911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810615353.0A CN108831477A (en) 2018-06-14 2018-06-14 A kind of audio recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108831477A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020102991A1 (en) * 2018-11-20 2020-05-28 深圳市欢太科技有限公司 Method and apparatus for waking up device, storage medium and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050274178A1 (en) * 2004-06-14 2005-12-15 General Electric Company Multi-bore pressure sensing probe
CN105206271A (en) * 2015-08-25 2015-12-30 北京宇音天下科技有限公司 Intelligent equipment voice wake-up method and system for realizing method
CN105575395A (en) * 2014-10-14 2016-05-11 中兴通讯股份有限公司 Voice wake-up method and apparatus, terminal, and processing method thereof
CN106448663A (en) * 2016-10-17 2017-02-22 海信集团有限公司 Voice wakeup method and voice interaction device
CN106653031A (en) * 2016-10-17 2017-05-10 海信集团有限公司 Voice wake-up method and voice interaction device
CN106910496A (en) * 2017-02-28 2017-06-30 广东美的制冷设备有限公司 Intelligent electrical appliance control and device
CN107622770A (en) * 2017-09-30 2018-01-23 百度在线网络技术(北京)有限公司 voice awakening method and device

Similar Documents

Publication Publication Date Title
US10249304B2 (en) Method and system for using conversational biometrics and speaker identification/verification to filter voice streams
EP3321928B1 (en) Operation of a virtual assistant on an electronic device
AU2014200407B2 (en) Method for Voice Activation of a Software Agent from Standby Mode
US10679619B2 (en) Method of providing voice command and electronic device supporting the same
CN106448678B (en) Method and apparatus for executing voice command in electronic device
US8968103B2 (en) Systems and methods for digital multimedia capture using haptic control, cloud voice changer, and protecting digital multimedia privacy
US9336773B2 (en) System and method for standardized speech recognition infrastructure
US8032364B1 (en) Distortion measurement for noise suppression system
RU2373584C2 (en) Method and device for increasing speech intelligibility using several sensors
US10699702B2 (en) System and method for personalization of acoustic models for automatic speech recognition
US9940929B2 (en) Extending the period of voice recognition
US6775651B1 (en) Method of transcribing text from computer voice mail
EP2987312B1 (en) System and method for acoustic echo cancellation
US8983846B2 (en) Information processing apparatus, information processing method, and program for providing feedback on a user request
US20140244273A1 (en) Voice-controlled communication connections
CN103971680B (en) Method and apparatus for speech recognition
US9412371B2 (en) Visualization interface of continuous waveform multi-speaker identification
US7162421B1 (en) Dynamic barge-in in a speech-responsive system
US10045140B2 (en) Utilizing digital microphones for low power keyword detection and noise suppression
US9953643B2 (en) Selective transmission of voice data
CN103811003B Speech recognition method and electronic device
JP2017516167A (en) Perform actions related to an individual's presence
US9424838B2 (en) Pattern processing system specific to a user group
JP5053285B2 (en) Determining audio device quality
CN103280216B Improving the robustness of context-dependent speech recognizers to environmental changes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination