CN109272991A - Method, apparatus, device, and computer-readable storage medium for voice interaction - Google Patents
Method, apparatus, device, and computer-readable storage medium for voice interaction
- Publication number
- CN109272991A CN109272991A CN201811148245.3A CN201811148245A CN109272991A CN 109272991 A CN109272991 A CN 109272991A CN 201811148245 A CN201811148245 A CN 201811148245A CN 109272991 A CN109272991 A CN 109272991A
- Authority
- CN
- China
- Prior art keywords
- user
- voice
- command
- electronic equipment
- voice command
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L15/063 — Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/22 — Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26 — Speech recognition; speech-to-text systems
- G10L17/22 — Speaker identification or verification techniques; interactive procedures; man-machine interfaces
- G10L2015/223 — Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
- Telephone Function (AREA)
Abstract
Embodiments of the disclosure provide a method, apparatus, device, and computer-readable storage medium for voice interaction. A voice interaction method executed at an electronic device includes: in response to receiving a first voice command from a user, identifying the identity of the user based on the first voice command. The method further includes configuring, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command. The method further includes determining, based on the matching threshold, whether the first voice command matches the predetermined activation command. In addition, the method includes, in response to determining that the first voice command matches the predetermined activation command, causing the electronic device to enter an activated state in which it can conduct voice interaction with the user. In this way, embodiments of the disclosure can both improve the wake-up rate for registered users and effectively reduce the false wake-up rate in noisy scenarios.
Description
Technical field
The disclosure relates generally to the field of speech recognition, and more particularly to a voice interaction method, apparatus, device, and computer-readable medium.
Background
With the development of speech recognition technology, intelligent voice devices have become widely used in people's daily life, work, and even production processes. Examples of intelligent voice devices include smartphones, smart speakers, wearable devices, and the like, which allow people to interact by voice. To save power and reduce misrecognition, an intelligent voice device in standby mode usually needs to first detect a specific activation command (for example, a wake-up word) issued by the user before entering an activated state in which it can conduct voice interaction with the user. This process is also referred to as "voice wake-up". Voice wake-up can be implemented at low power consumption by detecting a predetermined wake-up word. When the device detects that the user has said the wake-up word, the intelligent voice device is activated so that it can conduct normal voice interaction with the user.
The performance of voice wake-up is mainly measured by the wake-up rate and the false wake-up rate. The wake-up rate refers to the proportion of received voice commands containing the wake-up word that are successfully detected, while the false wake-up rate refers to the proportion of voice commands not containing the wake-up word that are mistakenly judged to contain it. It is generally desirable to improve the wake-up rate of a voice device while reducing its false wake-up rate, so as to improve the user experience. However, in traditional schemes, improving the wake-up rate inevitably raises the false wake-up rate as well.
Summary of the invention
According to example embodiments of the present disclosure, a scheme for voice interaction is provided.
In a first aspect of the disclosure, a voice interaction method executed at an electronic device is provided. The method includes: in response to receiving a first voice command from a user, identifying the identity of the user based on the first voice command. The method further includes configuring, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command. The method further includes determining, based on the matching threshold, whether the first voice command matches the predetermined activation command. In addition, the method includes, in response to determining that the first voice command matches the predetermined activation command, causing the electronic device to enter an activated state in which it can conduct voice interaction with the user.
In a second aspect of the disclosure, an apparatus for voice interaction is provided. The apparatus includes: an identity identification module configured to, in response to receiving a first voice command from a user, identify the identity of the user based on the first voice command; a threshold configuration module configured to configure, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command; a match determination module configured to determine, based on the matching threshold, whether the first voice command matches the predetermined activation command; and an activation module configured to, in response to determining that the first voice command matches the predetermined activation command, cause the electronic device to enter an activated state in which it can conduct voice interaction with the user.
In a third aspect of the disclosure, an electronic device is provided, including one or more processors and a storage device. The storage device stores one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the method according to the first aspect of the disclosure.
In a fourth aspect of the disclosure, a computer-readable medium is provided, on which a computer program is stored. The computer program, when executed by a processor, implements the method according to the first aspect of the disclosure.
It should be appreciated that the content described in this Summary is not intended to limit key or essential features of embodiments of the disclosure, nor to limit the scope of the disclosure. Other features of the disclosure will become readily understood from the description below.
Brief description of the drawings
The above and other features, advantages, and aspects of the embodiments of the disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 shows a schematic diagram of an example environment in which embodiments of the disclosure can be implemented;
Fig. 2 shows a flowchart of a voice interaction method according to an embodiment of the disclosure;
Fig. 3 shows a flowchart of a method for identifying a user's identity according to an implementation of the disclosure;
Fig. 4 shows a flowchart of a method for configuring a matching threshold based on user identity according to an implementation of the disclosure;
Fig. 5 shows a schematic block diagram of an apparatus for voice interaction according to an embodiment of the disclosure; and
Fig. 6 shows a block diagram of a computing device that can implement multiple embodiments of the disclosure.
Detailed description
Embodiments of the disclosure are described more fully below with reference to the accompanying drawings. Although certain embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the disclosure.
In describing embodiments of the disclosure, the term "include" and its variants should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The terms "one embodiment" or "an embodiment" should be understood as "at least one embodiment". The terms "first", "second", and the like may refer to different or identical objects. Other explicit and implicit definitions may also be included below.
As mentioned above, for the purposes of saving power and reducing misrecognition, an intelligent voice device in standby mode usually needs to first detect a specific wake-up word issued by the user before entering an activated state in which it can conduct voice interaction with the user. When the wake-up word is detected, the intelligent voice device can be activated and then conduct normal voice interaction with the user.
To implement voice wake-up, some traditional schemes record a training audio data set for a predetermined wake-up word and then use that data set to train an acoustic model for the pronunciation of the wake-up word. The acoustic model can be used to determine a pronunciation similarity score between an input voice command and the predetermined wake-up word. If the similarity score exceeds a predetermined matching threshold, the wake-up is judged successful (that is, the wake-up word is detected). If the similarity score is below the predetermined matching threshold, the wake-up is judged to have failed (that is, the wake-up word is not detected).
In these schemes, two approaches are generally used to improve the wake-up rate. One is to collect as much training audio data as possible for acoustic model training, so as to improve the coverage of the acoustic model. The other is to lower the matching threshold used to judge whether a wake-up succeeds, so that more similarity scores exceed the threshold and are judged as successful wake-ups. The first approach significantly increases the training cost of the acoustic model, while the second inevitably raises the false wake-up rate while improving the wake-up rate. In addition, such matching schemes based on pronunciation similarity cannot distinguish well among human voices, animal sounds, ambient sounds, and machine-synthesized sounds, and therefore tend to cause a higher false wake-up rate in noisy environments.
According to embodiments of the disclosure, a voice interaction scheme is proposed. The scheme extracts voiceprint information from the user's voice command and identifies the user's identity based on the extracted voiceprint information. The scheme further configures, according to the identified user identity, a matching threshold for matching the voice command against a predetermined activation command, where the matching threshold for a registered user is set lower than the matching threshold for an unregistered user. In this way, embodiments of the disclosure can both improve the wake-up rate for registered users and effectively reduce the false wake-up rate in noisy scenarios.
Embodiments of the disclosure are described in detail below with reference to the drawings.
Fig. 1 shows a schematic diagram of an example environment 100 in which embodiments of the disclosure can be implemented. The environment 100 generally includes an electronic device 110 and a user 120. Examples of the electronic device 110 may include, but are not limited to, smartphones, smart speakers, wearable devices, and other devices capable of voice interaction with a user. It should be appreciated that the structure and functions of the environment 100 are described for exemplary purposes only and do not imply any limitation on the scope of the disclosure. Embodiments of the disclosure may also be applied in environments with different structures and/or functions.
As shown in Fig. 1, the electronic device 110 may include, for example, a voice capturing apparatus 111 and a voice processing apparatus 112. Examples of the voice capturing apparatus 111 may include, but are not limited to, various microphones or microphone arrays. The voice capturing apparatus 111 can capture a voice command from the user 120 and pass the captured data to the voice processing apparatus 112 for processing. For example, when the electronic device 110 is in standby mode (also referred to as the "inactive state"), the voice processing apparatus 112 can determine whether a voice command from the user 120 matches a specific activation command. The predetermined activation command described herein may be a command including a predetermined wake-up word, or the predetermined wake-up word itself. Examples of wake-up words include "Siri", "Xiaodu, hello", and the like. When the voice processing apparatus 112 determines that the voice command from the user 120 matches the specific activation command, the electronic device 110 can be woken up to enter the activated state. When the electronic device 110 is in the activated state, in response to receiving a subsequent voice command from the user via the voice capturing apparatus 111, the voice processing apparatus 112 can recognize the voice command and perform a corresponding operation based on the recognition result, such as querying information, playing music, and the like.
The process executed at the electronic device 110 is described in detail below with reference to Fig. 2. Fig. 2 shows a flowchart of an example method 200 executed at the electronic device 110 according to an embodiment of the disclosure. For example, the method 200 can be executed by the voice processing apparatus 112 in the electronic device 110. Each action of the method 200 is described in detail below with reference to Fig. 1. It should be understood that the method 200 may also include additional actions not shown and/or may omit actions shown. The scope of the disclosure is not limited in this respect.
At block 210, in response to receiving a first voice command from the user 120, the voice processing apparatus 112 identifies the identity of the user 120 based on the first voice command. The first voice command may be, for example, a voice command containing the predetermined wake-up word, by which the user 120 expects to activate the electronic device 110 in order to conduct voice interaction with it. In some embodiments, the voice processing apparatus 112 can determine whether the electronic device 110 is in the inactive state, in which the electronic device 110 cannot conduct voice interaction with the user 120. When the voice processing apparatus 112 determines that the electronic device 110 is in the inactive state and receives the first voice command from the voice capturing apparatus 111, the voice processing apparatus 112 can identify the identity of the user 120 based on the first voice command.
Additionally or alternatively, in some embodiments, the voice processing apparatus 112 can identify the identity of the user 120 based on voiceprint information in the first voice command. As an example, Fig. 3 shows a flowchart of an example method 300 for identifying a user's identity according to an implementation of the disclosure. For example, the method 300 can serve as an example implementation of block 210.
At block 310, the voice processing apparatus 112 extracts first voiceprint information from the first voice command. The first voiceprint information may include, for example, a sound wave spectrum extracted from the first voice command, which is specific to the user 120. Studies have shown that a person's voiceprint is not only distinctive but also stable: after adulthood, a person's voiceprint generally remains relatively stable for a long time, and no matter how deliberately another person imitates this person's voice and tone, the two voiceprints remain different. Therefore, a voiceprint can be used to identify the identity of a speaker. In some embodiments, the voice processing apparatus 112 can use any known or future-developed technique to extract, from the first voice command, first voiceprint information capable of identifying the identity of the user 120.
At block 320, the voice processing apparatus 112 obtains second voiceprint information of a registered user of the electronic device 110. The registered user described herein may be a legitimate user registered in advance with the electronic device 110. In some embodiments, the second voiceprint information of the registered user can be pre-stored in a storage device coupled with the electronic device 110, from which the voice processing apparatus 112 can obtain it. Alternatively, in some embodiments, voice information of the registered user can be pre-stored in the storage device coupled with the electronic device 110; the voice processing apparatus 112 can obtain the voice information from the storage device and extract the second voiceprint information from it (for example, similarly to the extraction of the first voiceprint information).
At block 330, the voice processing apparatus 112 determines a voiceprint similarity between the first voiceprint information of the user 120 and the second voiceprint information of the registered user. Then, at block 340, the voice processing apparatus 112 can compare the determined voiceprint similarity with a predetermined threshold. When the voiceprint similarity exceeds the predetermined threshold, at block 350, the voice processing apparatus 112 can identify the user 120 as the registered user.
In some embodiments, the electronic device 110 may have multiple registered users. For example, the voiceprint information of multiple users can be pre-stored at the electronic device 110 (for example, in a storage device coupled with the electronic device 110). In this case, the voice processing apparatus 112 can execute the method 300 for the voiceprint information of each of the multiple registered users. When the voice processing apparatus 112 determines that the first voiceprint information extracted from the first voice command matches the voiceprint information of any one of the multiple registered users (for example, the voiceprint similarity exceeds the predetermined threshold), the voice processing apparatus 112 can identify the user 120 as a registered user.
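The comparison in blocks 330-350, extended to multiple registered users, can be sketched as follows. The sketch assumes voiceprints are represented as fixed-length embedding vectors compared by cosine similarity; the patent leaves both the voiceprint representation and the similarity measure open, so these are illustrative choices:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length voiceprint vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify_user(query_print, enrolled_prints, threshold=0.8):
    # Compare the extracted voiceprint against each registered user's
    # stored print (blocks 330-340); return the first match (block 350).
    for user_id, ref_print in enrolled_prints.items():
        if cosine_similarity(query_print, ref_print) > threshold:
            return user_id  # identified as this registered user
    return None  # treated as an unregistered user
```

The predetermined threshold (0.8 here) is an assumed value; in practice it would be tuned on enrollment data.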
Returning to Fig. 2, the method 200 proceeds to block 220, where the voice processing apparatus 112 configures, based on the identified identity of the user 120, a matching threshold for matching the first voice command against the predetermined activation command. The matching threshold will be used to determine whether the first voice command matches the predetermined activation command. As discussed above, the value of the matching threshold determines the sensitivity of voice wake-up: when the matching threshold is lower, more voice commands will be judged as matching the predetermined activation command, thereby increasing the wake-up rate.
Fig. 4 shows a flowchart of an example method 400 for configuring the matching threshold based on user identity according to an implementation of the disclosure. For example, the method 400 can serve as an example implementation of block 220. At block 410, the voice processing apparatus 112 determines whether the user 120 is identified as a registered user. If the user 120 is identified as a registered user, at block 420 the voice processing apparatus 112 can configure the matching threshold as a first threshold. If the user 120 is not identified as a registered user, at block 430 the voice processing apparatus 112 can configure the matching threshold as a second threshold higher than the first threshold. In some embodiments, the first threshold and the second threshold, which serve as the matching thresholds for registered and unregistered users respectively, can be predetermined. That is, the matching threshold for a registered user is set lower than the matching threshold for an unregistered user. In this way, embodiments of the disclosure can effectively improve the wake-up rate for registered users. At the same time, since the matching threshold for unregistered users is higher, the false wake-up rate in noisy scenarios can be effectively reduced. This is because the voiceprint information of noise usually differs markedly from that of a person, so noise will not be identified as coming from a registered user.
Returning to Fig. 2, the method 200 proceeds to block 230, where the voice processing apparatus 112 determines, based on the configured matching threshold, whether the first voice command matches the predetermined activation command. In some embodiments, the voice processing apparatus 112 can determine a similarity between the first voice command and the predetermined activation command; when the similarity exceeds the configured matching threshold, the voice processing apparatus 112 can determine that the first voice command matches the predetermined activation command.
The voice processing apparatus 112 can determine the similarity between the first voice command and the predetermined activation command based on any known or future-developed technique, and determine whether the first voice command matches the predetermined activation command by comparing that similarity with the configured matching threshold. Several possible examples are listed below for illustration only. It should be appreciated that these examples do not limit the scope of the disclosure; embodiments of the disclosure are applicable to various situations other than the following examples.
In some embodiments, the voice processing apparatus 112 can determine the similarity between the first voice command and the predetermined activation command based on a comparison of acoustic features. For example, the voice processing apparatus 112 can extract a first acoustic feature from the first voice command. The "acoustic feature" described herein may include any one or any combination of syllables, pronunciation frequency, sound intensity, loudness, pitch, signal-to-noise ratio, harmonics-to-noise ratio, entrainment (lock-in), shimmer, cepstral coefficients, and the like. The extracted first acoustic feature can be expressed, for example, in the form of a feature vector, and the voice processing apparatus 112 can extract the first acoustic feature from the first voice command based on any known or future-developed technique. Similarly, the voice processing apparatus 112 can obtain a corresponding acoustic feature (also referred to as a "second acoustic feature") of the predetermined activation command. In some embodiments, the voice processing apparatus 112 can similarly extract the second acoustic feature from a pre-stored predetermined activation command. Alternatively, the second acoustic feature of the predetermined activation command can be extracted in advance and stored at the electronic device 110, so that the voice processing apparatus 112 can obtain it directly. For example, the second acoustic feature can be stored in different forms such as a feature vector, a template, or an acoustic model. In some embodiments, the voice processing apparatus 112 can determine the similarity between the first voice command and the predetermined activation command by comparing the first acoustic feature and the second acoustic feature.
Alternatively, in some embodiments, the voice processing apparatus 112 can obtain an acoustic model trained in advance for the predetermined activation command (for example, the predetermined wake-up word). The acoustic model can model basic acoustic units in the predetermined activation command, such as words, syllables, and phonemes, so as to describe their statistical properties. The voice processing apparatus 112 can input the first acoustic feature extracted from the first voice command into the acoustic model trained in advance for the predetermined activation command to obtain a score from the acoustic model. The score can reflect, for example, the pronunciation similarity between the first voice command and the predetermined activation command.
Alternatively, in further embodiments, the voice processing apparatus 112 can obtain an end-to-end recognition model trained in advance for the predetermined activation command. That is, when an acoustic feature extracted from a voice command is input to the recognition model, the recognition model can directly output a result indicating whether the voice command matches the predetermined activation command. Generally, a discrimination network is provided in such a recognition model. For example, the recognition model can compute, from the input acoustic feature, a confidence that the voice command matches the predetermined activation command, and the discrimination network can compare the confidence with a set confidence threshold to determine whether the voice command matches the predetermined activation command. In some embodiments, for example, the voice processing apparatus 112 can configure the recognition model based on the determined matching threshold, so that the discrimination network therein determines, based on that matching threshold, whether the voice command matches the predetermined activation command.
Additionally or alternatively, in further embodiments, the voice processing apparatus 112 can determine whether the first voice command matches the predetermined activation command using any other technique or approach, such as, but not limited to, a speech recognition technique combining both an acoustic model and a language model, or a garbage-word-based speech recognition technique. In such cases, the configured matching threshold determines the match success rate: a lower matching threshold corresponds to a higher match success rate, and a higher matching threshold corresponds to a lower one. Because the matching threshold for a registered user was set lower than the matching threshold for an unregistered user at block 220, a voice command from a registered user has a higher match success rate with the predetermined activation command, while a voice command from an unregistered user has a lower match success rate.
At block 240, when the voice processing apparatus 112 determines that the first voice command matches the predetermined activation command, the voice processing apparatus 112 can cause the electronic device 110 to enter the activated state. The electronic device 110 in the activated state can conduct voice interaction with the user 120, for example responding to subsequent voice commands from the user 120.
Additionally or alternatively, when the electronic device 110 has entered the activated state and does not receive a second voice command from the user 120 within a threshold time interval, the electronic device 110 returns to the inactivated state. That is, if the user 120 wishes to interact with the electronic device 110 by voice again, the user 120 needs to issue the predetermined activation command (for example, speak the predetermined wake-up word) so that the electronic device 110 re-enters the activated state.
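The activation timeout behavior described above amounts to a small state machine. The sketch below is a minimal illustration under assumed names; the class, attributes, and timeout value are not from the patent.

```python
class Device:
    """Minimal sketch of the activation-timeout behavior: the device
    deactivates if no further command arrives within the threshold
    time interval. Times are plain floats (seconds)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.active = False
        self._last_command = 0.0

    def on_activation_command(self, now: float) -> None:
        # A matching activation command puts the device in the activated state.
        self.active = True
        self._last_command = now

    def tick(self, now: float) -> None:
        # Return to the inactivated state once the interval elapses.
        if self.active and now - self._last_command > self.timeout_s:
            self.active = False

d = Device(timeout_s=5.0)
d.on_activation_command(now=0.0)
d.tick(now=3.0)
assert d.active          # still within the interval
d.tick(now=6.0)
assert not d.active      # timed out; wake-up word needed again
```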
As can be seen from the above description, the voice interaction scheme according to embodiments of the present disclosure can extract voiceprint information from a user's voice command and identify the user's identity based on the extracted voiceprint information. The scheme further configures, according to the identified user identity, the matching threshold used to match the voice command against the predetermined activation command, where the matching threshold for a registered user is set lower than the matching threshold for an unregistered user. In this way, embodiments of the present disclosure can both improve the wake-up rate of the electronic device for registered users and effectively reduce the false wake-up rate in noisy scenarios.
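The overall scheme summarized above can be sketched end to end: identify the speaker by comparing voiceprints, pick a per-identity matching threshold, then compare the command's match confidence against that threshold. All function names, thresholds, and vectors below are illustrative assumptions, not values or APIs from the patent.

```python
# Hypothetical per-identity thresholds: lower for registered users
# (easier wake-up), higher for everyone else (fewer false wake-ups).
REGISTERED_THRESHOLD = 0.60
UNREGISTERED_THRESHOLD = 0.80

def cosine_similarity(a, b):
    """One common voiceprint-similarity measure (an assumption here)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def should_activate(voiceprint, registered_voiceprint, match_confidence,
                    voiceprint_threshold=0.9):
    # Step 1: identify the user from the voiceprint.
    is_registered = (
        cosine_similarity(voiceprint, registered_voiceprint) > voiceprint_threshold
    )
    # Step 2: configure the matching threshold based on the identity.
    threshold = REGISTERED_THRESHOLD if is_registered else UNREGISTERED_THRESHOLD
    # Step 3: determine whether the command matches the activation command.
    return match_confidence >= threshold

registered = [0.2, 0.5, 0.8]
assert should_activate([0.21, 0.49, 0.81], registered, match_confidence=0.7)
assert not should_activate([0.9, 0.1, 0.1], registered, match_confidence=0.7)
```

With the same match confidence (0.7), the registered speaker wakes the device while the unknown speaker does not, which is exactly the asymmetry the scheme aims for.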
Fig. 5 shows a schematic block diagram of an apparatus 500 for voice interaction according to an embodiment of the present disclosure. For example, the voice processing apparatus 112 shown in Fig. 1 may be implemented using the apparatus 500. As shown in Fig. 5, the apparatus 500 may include an identity identification module 510, configured to identify, in response to receiving a first voice command from a user, the identity of the user based on the first voice command. The apparatus 500 may also include a threshold configuration module 520, configured to configure, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command. The apparatus 500 may also include a match determination module 530, configured to determine, based on the matching threshold, whether the first voice command matches the predetermined activation command. The apparatus 500 may also include an activation module 540, configured to cause, in response to determining that the first voice command matches the predetermined activation command, the electronic device to enter an activated state in which the electronic device can interact with the user by voice.
In some embodiments, the identity identification module 510 includes: a state determination unit configured to determine whether the electronic device is in an inactivated state, in which the electronic device cannot interact with the user by voice; and a first identity identification unit configured to identify, in response to the electronic device being in the inactivated state and receiving the first voice command, the identity of the user based on the first voice command.
In some embodiments, the identity identification module 510 includes: a first voiceprint acquisition unit configured to extract first voiceprint information of the user from the first voice command; a second voiceprint acquisition unit configured to obtain second voiceprint information of a registered user of the electronic device; a voiceprint similarity determination unit configured to determine a voiceprint similarity between the first voiceprint information and the second voiceprint information; and a second identity identification unit configured to identify the user as the registered user in response to the voiceprint similarity exceeding a predetermined threshold.
In some embodiments, the second voiceprint acquisition unit is configured to obtain the second voiceprint information from a storage device coupled to the electronic device.
In some embodiments, the threshold configuration module 520 includes: a first threshold configuration unit configured to set the matching threshold to a first threshold in response to the user being identified as the registered user; and a second threshold configuration unit configured to set the matching threshold to a second threshold in response to the user not being identified as the registered user, where the first threshold is lower than the second threshold.
In some embodiments, the match determination module 530 includes: a similarity determination unit configured to determine a similarity between the first voice command and the predetermined activation command; and a match determination unit configured to determine, in response to the similarity exceeding the matching threshold, that the first voice command matches the predetermined activation command.
In some embodiments, the similarity determination unit is further configured to: extract a first acoustic feature from the first voice command; extract a second acoustic feature from the predetermined activation command; and determine the similarity between the first voice command and the predetermined activation command by comparing the first acoustic feature with the second acoustic feature.
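One possible way to turn an acoustic-feature comparison into a similarity score is sketched below. This is a simplified assumption for illustration; real systems typically extract MFCC or filter-bank features and use a trained model rather than a raw distance.

```python
def euclidean_similarity(feat_a, feat_b):
    """Compare two acoustic feature vectors by Euclidean distance and
    map the distance into a (0, 1] similarity score (an assumed mapping)."""
    dist = sum((x - y) ** 2 for x, y in zip(feat_a, feat_b)) ** 0.5
    return 1.0 / (1.0 + dist)

# Hypothetical feature vectors for a voice command and the activation command.
cmd_features = [1.0, 2.0, 3.0]
activation_features = [1.0, 2.1, 2.9]
similarity = euclidean_similarity(cmd_features, activation_features)
assert 0.0 < similarity <= 1.0  # identical vectors would give exactly 1.0
```

The resulting score can then be compared against the configured matching threshold as described in the text.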
In some embodiments, the match determination module 530 includes: a model configuration unit configured to configure, using the matching threshold, a recognition model for recognizing the predetermined activation command, so that the recognition model determines, based on the matching threshold, whether a voice command matches the predetermined activation command; and a model application unit configured to determine, using the configured recognition model, whether the first voice command matches the predetermined activation command.
In some embodiments, the apparatus 500 further includes a deactivation module configured to cause, in response to the electronic device being in the activated state and not receiving a second voice command from the user within a threshold time interval, the electronic device to enter an inactivated state in which the electronic device cannot interact with the user by voice.
Fig. 6 shows a schematic block diagram of an example device 600 that can be used to implement embodiments of the present disclosure. The device 600 can be used to implement the electronic device 110 shown in Fig. 1. As shown, the device 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 602 or loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Multiple components in the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or a mouse; an output unit 607, such as various types of displays or loudspeakers; a storage unit 608, such as a magnetic disk or an optical disc; and a communication unit 609, such as a network card, a modem, or a wireless communication transceiver. The communication unit 609 allows the device 600 to exchange information/data with other devices over computer networks such as the Internet and/or various telecommunication networks.
The processing unit 601 performs the methods and processing described above, such as the methods 200, 300, and/or 400. For example, in some embodiments, the methods 200, 300, and/or 400 can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the CPU 601, one or more steps of the methods 200, 300, and/or 400 described above can be performed. Alternatively, in other embodiments, the CPU 601 can be configured in any other appropriate manner (for example, by means of firmware) to perform the methods 200, 300, and/or 400.
The functions described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so forth.
Program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code can execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium can be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the discussion above, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
Claims (20)
1. A voice interaction method performed at an electronic device, comprising:
in response to receiving a first voice command from a user, identifying an identity of the user based on the first voice command;
configuring, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command;
determining, based on the matching threshold, whether the first voice command matches the predetermined activation command; and
in response to determining that the first voice command matches the predetermined activation command, causing the electronic device to enter an activated state in which the electronic device can interact with the user by voice.
2. The method according to claim 1, wherein identifying the identity of the user comprises:
determining whether the electronic device is in an inactivated state in which the electronic device cannot interact with the user by voice; and
in response to the electronic device being in the inactivated state and receiving the first voice command, identifying the identity of the user based on the first voice command.
3. The method according to claim 1, wherein identifying the identity of the user comprises:
extracting first voiceprint information of the user from the first voice command;
obtaining second voiceprint information of a registered user of the electronic device;
determining a voiceprint similarity between the first voiceprint information and the second voiceprint information; and
in response to the voiceprint similarity exceeding a predetermined threshold, identifying the user as the registered user.
4. The method according to claim 3, wherein obtaining the second voiceprint information comprises:
obtaining the second voiceprint information from a storage device coupled to the electronic device.
5. The method according to claim 3, wherein configuring the matching threshold comprises:
in response to the user being identified as the registered user, setting the matching threshold to a first threshold; and
in response to the user not being identified as the registered user, setting the matching threshold to a second threshold, wherein the first threshold is lower than the second threshold.
6. The method according to claim 1, wherein determining whether the first voice command matches the predetermined activation command comprises:
determining a similarity between the first voice command and the predetermined activation command; and
in response to the similarity exceeding the matching threshold, determining that the first voice command matches the predetermined activation command.
7. The method according to claim 6, wherein determining the similarity comprises:
extracting a first acoustic feature from the first voice command;
extracting a second acoustic feature from the predetermined activation command; and
determining the similarity between the first voice command and the predetermined activation command by comparing the first acoustic feature with the second acoustic feature.
8. The method according to claim 1, wherein determining whether the first voice command matches the predetermined activation command comprises:
configuring, using the matching threshold, a recognition model for recognizing the predetermined activation command, so that the recognition model determines, based on the matching threshold, whether a voice command matches the predetermined activation command; and
determining, using the configured recognition model, whether the first voice command matches the predetermined activation command.
9. The method according to claim 1, further comprising:
in response to the electronic device being in the activated state and not receiving a second voice command from the user within a threshold time interval, causing the electronic device to enter an inactivated state in which the electronic device cannot interact with the user by voice.
10. An apparatus implemented at an electronic device, comprising:
an identity identification module configured to identify, in response to receiving a first voice command from a user, an identity of the user based on the first voice command;
a threshold configuration module configured to configure, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command;
a match determination module configured to determine, based on the matching threshold, whether the first voice command matches the predetermined activation command; and
an activation module configured to cause, in response to determining that the first voice command matches the predetermined activation command, the electronic device to enter an activated state in which the electronic device can interact with the user by voice.
11. The apparatus according to claim 10, wherein the identity identification module comprises:
a state determination unit configured to determine whether the electronic device is in an inactivated state in which the electronic device cannot interact with the user by voice; and
a first identity identification unit configured to identify, in response to the electronic device being in the inactivated state and receiving the first voice command, the identity of the user based on the first voice command.
12. The apparatus according to claim 10, wherein the identity identification module comprises:
a first voiceprint acquisition unit configured to extract first voiceprint information of the user from the first voice command;
a second voiceprint acquisition unit configured to obtain second voiceprint information of a registered user of the electronic device;
a voiceprint similarity determination unit configured to determine a voiceprint similarity between the first voiceprint information and the second voiceprint information; and
a second identity identification unit configured to identify the user as the registered user in response to the voiceprint similarity exceeding a predetermined threshold.
13. The apparatus according to claim 12, wherein the second voiceprint acquisition unit is further configured to:
obtain the second voiceprint information from a storage device coupled to the electronic device.
14. The apparatus according to claim 12, wherein the threshold configuration module comprises:
a first threshold configuration unit configured to set the matching threshold to a first threshold in response to the user being identified as the registered user; and
a second threshold configuration unit configured to set the matching threshold to a second threshold in response to the user not being identified as the registered user, wherein the first threshold is lower than the second threshold.
15. The apparatus according to claim 10, wherein the match determination module comprises:
a similarity determination unit configured to determine a similarity between the first voice command and the predetermined activation command; and
a match determination unit configured to determine, in response to the similarity exceeding the matching threshold, that the first voice command matches the predetermined activation command.
16. The apparatus according to claim 15, wherein the similarity determination unit is further configured to:
extract a first acoustic feature from the first voice command;
extract a second acoustic feature from the predetermined activation command; and
determine the similarity between the first voice command and the predetermined activation command by comparing the first acoustic feature with the second acoustic feature.
17. The apparatus according to claim 10, wherein the match determination module comprises:
a model configuration unit configured to configure, using the matching threshold, a recognition model for recognizing the predetermined activation command, so that the recognition model determines, based on the matching threshold, whether a voice command matches the predetermined activation command; and
a model application unit configured to determine, using the configured recognition model, whether the first voice command matches the predetermined activation command.
18. The apparatus according to claim 10, further comprising:
a deactivation module configured to cause, in response to the electronic device being in the activated state and not receiving a second voice command from the user within a threshold time interval, the electronic device to enter an inactivated state in which the electronic device cannot interact with the user by voice.
19. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 9.
20. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811148245.3A CN109272991B (en) | 2018-09-29 | 2018-09-29 | Voice interaction method, device, equipment and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109272991A true CN109272991A (en) | 2019-01-25 |
CN109272991B CN109272991B (en) | 2021-11-02 |
Family
ID=65194800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811148245.3A Active CN109272991B (en) | 2018-09-29 | 2018-09-29 | Voice interaction method, device, equipment and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109272991B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977317A (en) * | 2019-04-03 | 2019-07-05 | 恒生电子股份有限公司 | Data query method and device |
CN110335315A (en) * | 2019-06-27 | 2019-10-15 | Oppo广东移动通信有限公司 | A kind of image processing method and device, computer readable storage medium |
CN110364178A (en) * | 2019-07-22 | 2019-10-22 | 出门问问(苏州)信息科技有限公司 | Voice processing method and device, storage medium and electronic equipment |
CN111833874A (en) * | 2020-07-10 | 2020-10-27 | 上海茂声智能科技有限公司 | Man-machine interaction method, system, equipment and storage medium based on identifier |
CN112951243A (en) * | 2021-02-07 | 2021-06-11 | 深圳市汇顶科技股份有限公司 | Voice awakening method, device, chip, electronic equipment and storage medium |
US11513767B2 (en) | 2020-04-13 | 2022-11-29 | Yandex Europe Ag | Method and system for recognizing a reproduced utterance |
US11915711B2 (en) | 2021-07-20 | 2024-02-27 | Direct Cursus Technology L.L.C | Method and system for augmenting audio signals |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101441869A (en) * | 2007-11-21 | 2009-05-27 | 联想(北京)有限公司 | Method and terminal for speech recognition of terminal user identification |
CN103838991A (en) * | 2014-02-20 | 2014-06-04 | 联想(北京)有限公司 | Information processing method and electronic device |
CN104021790A (en) * | 2013-02-28 | 2014-09-03 | 联想(北京)有限公司 | Sound control unlocking method and electronic device |
CN106295672A (en) * | 2015-06-12 | 2017-01-04 | 中国移动(深圳)有限公司 | A kind of face identification method and device |
CN106531172A (en) * | 2016-11-23 | 2017-03-22 | 湖北大学 | Speaker voice playback identification method and system based on environmental noise change detection |
CN107799120A (en) * | 2017-11-10 | 2018-03-13 | 北京康力优蓝机器人科技有限公司 | Service robot identifies awakening method and device |
CN107895578A (en) * | 2017-11-15 | 2018-04-10 | 百度在线网络技术(北京)有限公司 | Voice interactive method and device |
US20180240463A1 (en) * | 2017-02-22 | 2018-08-23 | Plantronics, Inc. | Enhanced Voiceprint Authentication |
CN108537917A (en) * | 2018-02-07 | 2018-09-14 | 青岛海尔智能家电科技有限公司 | Identification success rate improvement method and intelligent door lock, doorway machine and server |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977317A (en) * | 2019-04-03 | 2019-07-05 | 恒生电子股份有限公司 | Data query method and device |
CN110335315A (en) * | 2019-06-27 | 2019-10-15 | Oppo广东移动通信有限公司 | A kind of image processing method and device, computer readable storage medium |
CN110335315B (en) * | 2019-06-27 | 2021-11-02 | Oppo广东移动通信有限公司 | Image processing method and device and computer readable storage medium |
CN110364178A (en) * | 2019-07-22 | 2019-10-22 | 出门问问(苏州)信息科技有限公司 | Voice processing method and device, storage medium and electronic equipment |
CN110364178B (en) * | 2019-07-22 | 2021-09-10 | 出门问问(苏州)信息科技有限公司 | Voice processing method and device, storage medium and electronic equipment |
US11513767B2 (en) | 2020-04-13 | 2022-11-29 | Yandex Europe Ag | Method and system for recognizing a reproduced utterance |
CN111833874A (en) * | 2020-07-10 | 2020-10-27 | 上海茂声智能科技有限公司 | Man-machine interaction method, system, equipment and storage medium based on identifier |
CN111833874B (en) * | 2020-07-10 | 2023-12-05 | 上海茂声智能科技有限公司 | Man-machine interaction method, system, equipment and storage medium based on identifier |
CN112951243A (en) * | 2021-02-07 | 2021-06-11 | 深圳市汇顶科技股份有限公司 | Voice awakening method, device, chip, electronic equipment and storage medium |
US11915711B2 (en) | 2021-07-20 | 2024-02-27 | Direct Cursus Technology L.L.C | Method and system for augmenting audio signals |
Also Published As
Publication number | Publication date |
---|---|
CN109272991B (en) | 2021-11-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109272991A (en) | Method, apparatus, equipment and the computer readable storage medium of interactive voice | |
US9940935B2 (en) | Method and device for voiceprint recognition | |
WO2021159688A1 (en) | Voiceprint recognition method and apparatus, and storage medium and electronic apparatus | |
BR102018070673A2 (en) | GENERATE DIALOGUE BASED ON VERIFICATION SCORES | |
WO2016150001A1 (en) | Speech recognition method, device and computer storage medium | |
CN108711429B (en) | Electronic device and device control method | |
KR20160098771A (en) | Operating Method for Voice function and electronic device supporting the same | |
CN109564759A (en) | Speaker Identification | |
CN108766441A (en) | A kind of sound control method and device based on offline Application on Voiceprint Recognition and speech recognition | |
WO2014114116A1 (en) | Method and system for voiceprint recognition | |
CN104143326A (en) | Voice command recognition method and device | |
CN104021790A (en) | Sound control unlocking method and electronic device | |
KR20190018282A (en) | Method for performing personalized speech recognition and user terminal and server performing the same | |
CN101540170B (en) | Voiceprint recognition method based on biomimetic pattern recognition | |
US11862153B1 (en) | System for recognizing and responding to environmental noises | |
KR102563817B1 (en) | Method for processing user voice input and electronic device supporting the same | |
US20230386506A1 (en) | Self-supervised speech representations for fake audio detection | |
CN109637542A (en) | A kind of outer paging system of voice | |
WO2021169711A1 (en) | Instruction execution method and apparatus, storage medium, and electronic device | |
CN101350196A (en) | On-chip system for confirming role related talker identification and confirming method thereof | |
WO2020073839A1 (en) | Voice wake-up method, apparatus and system, and electronic device | |
KR20150035312A (en) | Method for unlocking user equipment based on voice, user equipment releasing lock based on voice and computer readable medium having computer program recorded therefor | |
WO2020102991A1 (en) | Method and apparatus for waking up device, storage medium and electronic device | |
TW202029181A (en) | Method and apparatus for specific user to wake up by speech recognition | |
CN110083392B (en) | Audio awakening pre-recording method, storage medium, terminal and Bluetooth headset thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | | Effective date of registration: 20211013. Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing. Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd. Address before: 100080 No.10, Shangdi 10th Street, Haidian District, Beijing. Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.
GR01 | Patent grant | |