CN109272991A - Voice interaction method, apparatus, device, and computer-readable storage medium - Google Patents

Voice interaction method, apparatus, device, and computer-readable storage medium Download PDF

Info

Publication number
CN109272991A
CN109272991A CN201811148245.3A
Authority
CN
China
Prior art keywords
user
voice
command
electronic equipment
voice command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811148245.3A
Other languages
Chinese (zh)
Other versions
CN109272991B (en)
Inventor
贺学焱
赵科
欧阳能钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811148245.3A priority Critical patent/CN109272991B/en
Publication of CN109272991A publication Critical patent/CN109272991A/en
Application granted granted Critical
Publication of CN109272991B publication Critical patent/CN109272991B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephone Function (AREA)

Abstract

Embodiments of the present disclosure provide a method, apparatus, device, and computer-readable storage medium for voice interaction. A voice interaction method performed at an electronic device includes: in response to receiving a first voice command from a user, identifying an identity of the user based on the first voice command. The method further includes configuring, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command. The method further includes determining, based on the matching threshold, whether the first voice command matches the predetermined activation command. In addition, the method includes, in response to determining that the first voice command matches the predetermined activation command, causing the electronic device to enter an activated state in which the electronic device can conduct voice interaction with the user. In this way, embodiments of the present disclosure can improve the wake-up rate for registered users of the electronic device while effectively reducing the false wake-up rate in noisy scenarios.

Description

Voice interaction method, apparatus, device, and computer-readable storage medium
Technical field
The present disclosure relates generally to the field of speech recognition, and more particularly to a voice interaction method, apparatus, device, and computer-readable medium.
Background
With the development of speech recognition technology, intelligent voice devices have become widely used in people's daily life, work, and even production processes. Examples of intelligent voice devices include smartphones, smart speakers, wearable devices, and the like, which allow people to interact by voice. For the purposes of saving power and reducing misrecognition, an intelligent voice device in a standby state usually needs to first detect a specific activation command (for example, a wake-up word) issued by the user before entering an activated state in which it can conduct voice interaction with the user. This process is also referred to as "voice wake-up". Voice wake-up can be implemented at low power consumption by detecting a predetermined wake-up word. When it is detected that the user has spoken the wake-up word, the intelligent voice device is activated so that it can conduct normal voice interaction with the user.
The performance of voice wake-up is mainly characterized by the wake-up rate and the false wake-up rate. The wake-up rate refers to the proportion of received voice commands that contain the wake-up word and are successfully detected, while the false wake-up rate refers to the proportion of voice commands that do not contain the wake-up word but are mistakenly judged to contain it. It is generally desirable to increase the wake-up rate of a voice device and reduce its false wake-up rate so as to improve the user experience. In traditional schemes, however, increasing the wake-up rate inevitably also increases the false wake-up rate.
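As a minimal illustration of these two metrics (not part of the patent text), the Python sketch below counts detections over a labeled set of voice commands; the variable names and example data are placeholders chosen for illustration.

```python
def wake_up_metrics(results):
    """Compute wake-up rate and false wake-up rate.

    `results` is a list of (contains_wake_word, detected) boolean pairs,
    one pair per received voice command.
    """
    with_word = [detected for contains, detected in results if contains]
    without_word = [detected for contains, detected in results if not contains]

    # Wake-up rate: fraction of commands containing the wake-up word
    # that were successfully detected.
    wake_up_rate = sum(with_word) / len(with_word) if with_word else 0.0
    # False wake-up rate: fraction of commands without the wake-up word
    # that were mistakenly judged to contain it.
    false_wake_up_rate = sum(without_word) / len(without_word) if without_word else 0.0
    return wake_up_rate, false_wake_up_rate


if __name__ == "__main__":
    # (contains_wake_word, detected) for five example commands.
    samples = [(True, True), (True, False), (False, False), (False, True), (True, True)]
    print(wake_up_metrics(samples))  # (0.666..., 0.5)
```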
Summary of the invention
According to example embodiments of the present disclosure, a scheme for voice interaction is provided.
In a first aspect of the present disclosure, a voice interaction method performed at an electronic device is provided. The method includes, in response to receiving a first voice command from a user, identifying an identity of the user based on the first voice command. The method further includes configuring, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command. The method further includes determining, based on the matching threshold, whether the first voice command matches the predetermined activation command. In addition, the method includes, in response to determining that the first voice command matches the predetermined activation command, causing the electronic device to enter an activated state in which the electronic device can conduct voice interaction with the user.
In a second aspect of the present disclosure, an apparatus for voice interaction is provided. The apparatus includes: an identity identification module configured to, in response to receiving a first voice command from a user, identify an identity of the user based on the first voice command; a threshold configuration module configured to configure, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command; a match determination module configured to determine, based on the matching threshold, whether the first voice command matches the predetermined activation command; and an activation module configured to, in response to determining that the first voice command matches the predetermined activation command, cause the electronic device to enter an activated state in which the electronic device can conduct voice interaction with the user.
In a third aspect of the present disclosure, an electronic device is provided, including one or more processors and a storage device. The storage device is used for storing one or more programs. The one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored. The computer program, when executed by a processor, implements the method according to the first aspect of the present disclosure.
It should be understood that the content described in this Summary is not intended to limit key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief description of the drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
Fig. 2 shows a flowchart of a voice interaction method according to an embodiment of the present disclosure;
Fig. 3 shows a flowchart of a method for identifying a user identity according to an implementation of the present disclosure;
Fig. 4 shows a flowchart of a method for configuring a matching threshold based on a user identity according to an implementation of the present disclosure;
Fig. 5 shows a schematic block diagram of an apparatus for voice interaction according to an embodiment of the present disclosure; and
Fig. 6 shows a block diagram of a computing device capable of implementing multiple embodiments of the present disclosure.
Detailed description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be more thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of the embodiments of the present disclosure, the term "include" and its variants should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The terms "one embodiment" or "an embodiment" should be understood as "at least one embodiment". The terms "first", "second", and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As mentioned above, for the purposes of saving power and reducing misrecognition, an intelligent voice device in a standby state usually needs to first detect a specific wake-up word issued by the user before entering an activated state in which it can conduct voice interaction with the user. When it is detected that the user has spoken the wake-up word, the intelligent voice device can be activated so as to conduct normal voice interaction with the user.
To implement voice wake-up, some traditional schemes record a training audio dataset for a predetermined wake-up word and then use the training audio dataset to train an acoustic model for the pronunciation of the predetermined wake-up word. The acoustic model can be used to determine a pronunciation similarity score between an input voice command and the predetermined wake-up word. If the similarity score exceeds a predetermined matching threshold, the wake-up is judged to be successful (that is, the wake-up word is detected). If the similarity score is lower than the predetermined matching threshold, the wake-up is judged to have failed (that is, the wake-up word is not detected).
In these schemes, two approaches are generally used to improve the wake-up rate: one is to collect as much training audio data as possible for acoustic model training, so as to improve the coverage of the acoustic model; the other is to lower the matching threshold used for judging whether the wake-up is successful, so that more similarity scores exceed the matching threshold and are judged as successful wake-ups. The first approach significantly increases the training cost of the acoustic model, while the second inevitably increases the false wake-up rate while improving the wake-up rate by lowering the matching threshold. In addition, such matching schemes based on pronunciation similarity cannot distinguish well among human voices, animal sounds, ambient sounds, and machine-synthesized sounds, and are therefore prone to a high false wake-up rate in noisy environments.
According to embodiments of the present disclosure, a voice interaction scheme is proposed. The scheme extracts voiceprint information from a voice command of a user and identifies the identity of the user based on the extracted voiceprint information. The scheme further configures, according to the identified user identity, a matching threshold for matching the voice command against a predetermined activation command, where the matching threshold for a registered user is set lower than the matching threshold for an unregistered user. In this way, embodiments of the present disclosure can improve the wake-up rate for registered users of the electronic device while effectively reducing the false wake-up rate in noisy scenarios.
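The following Python sketch illustrates, at a high level, the decision flow just described. It is not the patent's implementation; the function names, the numeric thresholds, and the placeholder identification and scoring routines are assumptions chosen for illustration.

```python
REGISTERED_THRESHOLD = 0.45    # lower threshold for registered users (assumed value)
UNREGISTERED_THRESHOLD = 0.70  # higher threshold for unregistered users (assumed value)


def should_wake(voice_command, registered_voiceprints, wake_word_scorer, voiceprint_matcher):
    """Decide whether the device should enter the activated state.

    `wake_word_scorer(command)` returns a similarity score between the command
    and the predetermined activation command; `voiceprint_matcher` compares the
    command's voiceprint against the registered voiceprints. Both are
    placeholders for whatever models the device actually uses.
    """
    # Step 1: identify the speaker from the voiceprint in the command.
    is_registered = voiceprint_matcher(voice_command, registered_voiceprints)

    # Step 2: configure the matching threshold based on the identified identity.
    threshold = REGISTERED_THRESHOLD if is_registered else UNREGISTERED_THRESHOLD

    # Step 3: match the command against the predetermined activation command.
    similarity = wake_word_scorer(voice_command)

    # Step 4: wake up only if the similarity exceeds the configured threshold.
    return similarity > threshold
```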
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. The environment 100 may generally include an electronic device 110 and a user 120. Examples of the electronic device 110 may include, but are not limited to, a smartphone, a smart speaker, a wearable device, and the like that can conduct voice interaction with a user. It should be understood that the structure and function of the environment 100 are described for exemplary purposes only and do not imply any limitation on the scope of the present disclosure. Embodiments of the present disclosure may also be applied in environments with different structures and/or functions.
As shown in Fig. 1, the electronic device 110 may include, for example, a speech capturing apparatus 111 and a voice processing apparatus 112. Examples of the speech capturing apparatus 111 may include, but are not limited to, various microphones or microphone arrays. The speech capturing apparatus 111 can capture a voice command from the user 120 and pass the captured data to the voice processing apparatus 112 for processing. For example, when the electronic device 110 is in a standby state (also referred to as a "non-activated state"), the voice processing apparatus 112 can determine whether the voice command from the user 120 matches a specific activation command. The predetermined activation command described herein may be a command including a predetermined wake-up word, or the predetermined wake-up word itself. Examples of wake-up words include "Siri", "Hello Xiaodu", and the like. When the voice processing apparatus 112 determines that the voice command from the user 120 matches the specific activation command, the electronic device 110 can be woken up to enter the activated state. When the electronic device 110 is in the activated state, in response to receiving a subsequent voice command of the user from the speech capturing apparatus 111, the voice processing apparatus 112 can recognize the voice command and perform a corresponding operation, such as information query or music playback, based on the recognition result.
The process performed at the electronic device 110 is described in detail below with reference to Fig. 2. Fig. 2 shows a flowchart of an example method 200 performed at the electronic device 110 according to an embodiment of the present disclosure. For example, the method 200 may be performed by the voice processing apparatus 112 in the electronic device 110. Each action of the method 200 is described in detail below with reference to Fig. 1. It should be understood that the method 200 may also include additional actions not shown and/or may omit actions that are shown. The scope of the present disclosure is not limited in this respect.
At block 210, in response to receiving a first voice command from the user 120, the voice processing apparatus 112 identifies the identity of the user 120 based on the first voice command. The first voice command may be, for example, a voice command containing a predetermined wake-up word, by which the user 120 expects to activate the electronic device 110 so as to conduct voice interaction with the electronic device 110. In some embodiments, the voice processing apparatus 112 may determine whether the electronic device 110 is in a non-activated state, in which the electronic device 110 cannot conduct voice interaction with the user 120. When the voice processing apparatus 112 determines that the electronic device 110 is in the non-activated state and receives the first voice command from the speech capturing apparatus 111, the voice processing apparatus 112 may identify the identity of the user 120 based on the first voice command.
Additionally or alternatively, in some embodiments, the voice processing apparatus 112 may identify the identity of the user 120 based on voiceprint information in the first voice command. As an example, Fig. 3 shows a flowchart of an example method 300 for identifying a user identity according to an implementation of the present disclosure. For example, the method 300 may be used as an example implementation of block 210.
At block 310, the voice processing apparatus 112 extracts first voiceprint information from the first voice command. The first voiceprint information may include, for example, a sound wave spectrum extracted from the first voice command, which carries information specific to the user 120. Studies have shown that a person's voiceprint is not only specific but also stable: after a person reaches adulthood, the voiceprint generally remains relatively stable for a long time, and no matter how deliberately another person imitates that person's voice and tone, the voiceprints of the two always differ. Therefore, a voiceprint can be used to identify the identity of a speaker. In some embodiments, the voice processing apparatus 112 may use any known technique, or any technique to be developed in the future, to extract from the first voice command the first voiceprint information that can identify the identity of the user 120.
At block 320, the voice processing apparatus 112 obtains second voiceprint information of a registered user of the electronic device 110. The registered user described herein may be a legitimate user registered with the electronic device 110 in advance. In some embodiments, the second voiceprint information of the registered user may be pre-stored in a storage device coupled with the electronic device 110, so that the voice processing apparatus 112 can obtain the second voiceprint information of the registered user from the storage device. Alternatively, in some embodiments, voice information of the registered user may be pre-stored in the storage device coupled with the electronic device 110. The voice processing apparatus 112 may obtain the voice information of the registered user from the storage device and extract the second voiceprint information from it (for example, in a manner similar to the extraction of the first voiceprint information).
At block 330, the voice processing apparatus 112 determines a voiceprint similarity between the first voiceprint information of the user 120 and the second voiceprint information of the registered user. Then, at block 340, the voice processing apparatus 112 may compare the determined voiceprint similarity with a predetermined threshold. When the voiceprint similarity exceeds the predetermined threshold, at block 350, the voice processing apparatus 112 may identify the user 120 as the registered user.
In some embodiments, the electronic device 110 may have multiple registered users. For example, voiceprint information of the multiple users may be pre-stored at the electronic device 110 (for example, in a storage device coupled with the electronic device 110). In this case, the voice processing apparatus 112 may perform the method 300 for the voiceprint information of each of the multiple registered users. When the voice processing apparatus 112 determines that the first voiceprint information extracted from the first voice command matches the voiceprint information of any one of the multiple registered users (for example, the voiceprint similarity exceeds the predetermined threshold), the voice processing apparatus 112 may identify the user 120 as a registered user.
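A minimal sketch of blocks 330 to 350, extended to multiple registered users, might look as follows. It assumes voiceprints are already available as fixed-length embedding vectors and compares them with cosine similarity; the embedding representation and the 0.8 threshold are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

VOICEPRINT_SIMILARITY_THRESHOLD = 0.8  # assumed predetermined threshold


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two voiceprint embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def is_registered_user(first_voiceprint: np.ndarray, registered_voiceprints: dict) -> bool:
    """Return True if the voiceprint extracted from the first voice command
    matches the stored voiceprint of any registered user."""
    for second_voiceprint in registered_voiceprints.values():
        similarity = cosine_similarity(first_voiceprint, second_voiceprint)
        # Block 340/350: compare the similarity with the predetermined threshold.
        if similarity > VOICEPRINT_SIMILARITY_THRESHOLD:
            return True
    return False
```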
Returning to Fig. 2, the method 200 proceeds to block 220, where the voice processing apparatus 112 configures, based on the identified identity of the user 120, a matching threshold for matching the first voice command against the predetermined activation command. The matching threshold will be used to determine whether the first voice command matches the predetermined activation command. As discussed above, the level of the matching threshold determines the sensitivity of voice wake-up: when the matching threshold is lower, more voice commands will be judged as matching the predetermined activation command, resulting in a higher wake-up rate.
Fig. 4 shows a flowchart of an example method 400 for configuring the matching threshold based on the user identity according to an implementation of the present disclosure. For example, the method 400 may be used as an example implementation of block 220. At block 410, the voice processing apparatus 112 determines whether the user 120 has been identified as a registered user. If the user 120 is identified as a registered user, at block 420 the voice processing apparatus 112 may configure the matching threshold as a first threshold. If the user 120 is not identified as a registered user, at block 430 the voice processing apparatus 112 may configure the matching threshold as a second threshold higher than the first threshold. In some embodiments, the first threshold and the second threshold may be predetermined matching thresholds for registered users and unregistered users, respectively. That is, the matching threshold for a registered user is set lower than the matching threshold for an unregistered user. In this way, embodiments of the present disclosure can effectively improve the wake-up rate for registered users. At the same time, since the matching threshold for unregistered users is higher, the false wake-up rate in noisy scenarios can be effectively reduced. This is because the voiceprint information of noise usually differs significantly from that of a person and therefore will not be identified as coming from a registered user.
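Blocks 410 to 430 amount to a simple branch. A sketch, with assumed example threshold values rather than values from the patent, is:

```python
def configure_matching_threshold(is_registered_user: bool,
                                 first_threshold: float = 0.45,
                                 second_threshold: float = 0.70) -> float:
    """Blocks 410-430: pick the matching threshold based on the identified identity.

    The numeric defaults are illustrative assumptions only.
    """
    # Block 420: registered user -> lower threshold (higher wake-up rate).
    # Block 430: unregistered user -> higher threshold (lower false wake-up rate).
    return first_threshold if is_registered_user else second_threshold
```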
Returning to Fig. 2, the method 200 proceeds to block 230, where the voice processing apparatus 112 determines, based on the configured matching threshold, whether the first voice command matches the predetermined activation command. In some embodiments, the voice processing apparatus 112 may determine a similarity between the first voice command and the predetermined activation command. When the similarity exceeds the configured matching threshold, the voice processing apparatus 112 may determine that the first voice command matches the predetermined activation command.
The voice processing apparatus 112 may determine the similarity between the first voice command and the predetermined activation command based on any known technique or any technique to be developed in the future, and determine whether the first voice command matches the predetermined activation command by comparing the similarity with the determined matching threshold. Several possible examples are listed below for illustration purposes only. It should be understood, however, that these examples do not constitute a limitation on the scope of the present disclosure, and embodiments of the present disclosure are applicable to various situations other than the following examples.
In some embodiments, the voice processing apparatus 112 may determine the similarity between the first voice command and the predetermined activation command based on a comparison of acoustic features. For example, the voice processing apparatus 112 may extract a first acoustic feature from the first voice command. The "acoustic feature" described herein may include any one of, or any combination of, syllables, pronunciation frequency, sound intensity, loudness, pitch, signal-to-noise ratio, harmonic-to-noise ratio, jitter, shimmer, cepstral coefficients, and the like. For example, the extracted first acoustic feature may be expressed in the form of a feature vector. In addition, the voice processing apparatus 112 may extract the first acoustic feature from the first voice command based on any known technique or any technique to be developed in the future. Similarly, the voice processing apparatus 112 may obtain a corresponding acoustic feature (also referred to as a "second acoustic feature") of the predetermined activation command. In some embodiments, the voice processing apparatus 112 may similarly extract the second acoustic feature from a pre-stored predetermined activation command. Alternatively, the second acoustic feature of the predetermined activation command may have been extracted in advance and stored at the electronic device 110, so that the voice processing apparatus 112 can obtain the second acoustic feature directly. For example, the second acoustic feature may be stored in various forms such as a feature vector, a template, or an acoustic model. In some embodiments, the voice processing apparatus 112 may determine the similarity between the first voice command and the predetermined activation command by comparing the first acoustic feature and the second acoustic feature.
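As one concrete, deliberately simplified way to compare acoustic features, the sketch below extracts MFCC-based feature vectors from two recordings and compares them with cosine similarity. The use of librosa, time-averaged MFCCs, and the placeholder file paths are all illustrative assumptions; the patent does not prescribe a particular feature or library.

```python
import librosa
import numpy as np


def extract_acoustic_feature(audio_path: str, sr: int = 16000) -> np.ndarray:
    """Extract a simple acoustic feature vector (time-averaged MFCCs) from audio."""
    y, sr = librosa.load(audio_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
    return mfcc.mean(axis=1)  # collapse over time into a 13-dimensional vector


def command_similarity(command_path: str, activation_command_path: str) -> float:
    """Similarity between a voice command and the pre-stored activation command."""
    first = extract_acoustic_feature(command_path)               # first acoustic feature
    second = extract_acoustic_feature(activation_command_path)   # second acoustic feature
    return float(np.dot(first, second) / (np.linalg.norm(first) * np.linalg.norm(second)))


# Example usage (paths and threshold are placeholders):
# score = command_similarity("first_command.wav", "wake_word_template.wav")
# matched = score > configured_matching_threshold
```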
Alternatively, in some embodiments, the voice processing apparatus 112 may obtain an acoustic model trained in advance for the predetermined activation command (for example, the predetermined wake-up word). The acoustic model may model basic acoustic units such as words, syllables, and phonemes in the predetermined activation command so as to describe their statistical properties. The voice processing apparatus 112 may input the first acoustic feature extracted from the first voice command into the acoustic model trained in advance for the predetermined activation command to obtain a score from the acoustic model. The score may reflect, for example, the pronunciation similarity between the first voice command and the predetermined activation command.
Alternatively, in other embodiments, the voice processing apparatus 112 may obtain an end-to-end recognition model trained in advance for the predetermined activation command. That is, when an acoustic feature extracted from a voice command is input into the recognition model, the recognition model can directly output a result indicating whether the voice command matches the predetermined activation command. Generally, such a recognition model is provided with a discrimination network. For example, the recognition model may compute, from the input acoustic feature, a confidence that the voice command matches the predetermined activation command, and the discrimination network may compare the confidence with a set confidence threshold to determine whether the voice command matches the predetermined activation command. In some embodiments, the voice processing apparatus 112 may, for example, configure the recognition model based on the determined matching threshold, so that the discrimination network therein determines, based on the matching threshold, whether the voice command matches the predetermined activation command.
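The following sketch shows how the discrimination step of such an end-to-end model could be parameterized by the configured matching threshold. The model itself is left abstract (any keyword-spotting network that outputs a match confidence would do); the class and method names are assumptions for illustration rather than the patent's implementation.

```python
from typing import Callable

import numpy as np


class WakeWordRecognizer:
    """End-to-end recognizer: acoustic features in, match / no-match out."""

    def __init__(self, confidence_model: Callable[[np.ndarray], float],
                 matching_threshold: float):
        # `confidence_model` maps an acoustic feature array to a confidence in [0, 1].
        self._confidence_model = confidence_model
        self._matching_threshold = matching_threshold

    def configure_threshold(self, matching_threshold: float) -> None:
        """Reconfigure the discrimination threshold, e.g. per identified user."""
        self._matching_threshold = matching_threshold

    def matches(self, acoustic_features: np.ndarray) -> bool:
        """Discrimination step: compare the confidence with the set threshold."""
        confidence = self._confidence_model(acoustic_features)
        return confidence > self._matching_threshold
```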
Additionally or alternatively, in still other embodiments, the voice processing apparatus 112 may determine whether the first voice command matches the predetermined activation command using any other technique or manner, such as, but not limited to, a speech recognition technique combining both an acoustic model and a language model, or a speech recognition technique based on garbage words. In this case, the configured matching threshold will determine the level of the match success rate. That is, a lower matching threshold may correspond to a higher match success rate, and a higher matching threshold may correspond to a lower match success rate. Since the matching threshold for a registered user is set at block 220 to be lower than the matching threshold for an unregistered user, a voice command from a registered user has a higher match success rate with the predetermined activation command, while a voice command from an unregistered user has a lower match success rate with the predetermined activation command.
At block 240, when the voice processing apparatus 112 determines that the first voice command matches the predetermined activation command, the voice processing apparatus 112 may cause the electronic device 110 to enter the activated state. In the activated state, the electronic device 110 can conduct voice interaction with the user 120, for example, responding to subsequent voice commands from the user 120.
Additionally or alternatively, when the electronic device 110 has entered the activated state and no second voice command from the user 120 is received within a threshold time interval, the electronic device 110 will return to the non-activated state. That is, if the user 120 wishes to conduct voice interaction with the electronic device 110 again, the user 120 needs to issue the predetermined activation command (for example, speak the predetermined wake-up word) so that the electronic device 110 re-enters the activated state.
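A minimal sketch of this timeout behaviour is shown below, using a simple monotonic clock and an assumed 10-second interval (the patent does not specify a value).

```python
import time

THRESHOLD_TIME_INTERVAL = 10.0  # seconds; an assumed value, not from the patent


class ActivationStateMachine:
    """Track the device's activated / non-activated state with a timeout."""

    def __init__(self):
        self.activated = False
        self._last_command_time = 0.0

    def on_wake_up(self) -> None:
        """Enter the activated state after a successful match (block 240)."""
        self.activated = True
        self._last_command_time = time.monotonic()

    def on_voice_command(self) -> None:
        """Any subsequent voice command keeps the device activated."""
        if self.activated:
            self._last_command_time = time.monotonic()

    def tick(self) -> None:
        """Return to the non-activated state if no second command arrives in time."""
        if self.activated and time.monotonic() - self._last_command_time > THRESHOLD_TIME_INTERVAL:
            self.activated = False
```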
From the above description, it can be seen that the voice interaction scheme according to embodiments of the present disclosure can extract voiceprint information from a voice command of a user and identify the identity of the user based on the extracted voiceprint information. The scheme further configures, according to the identified user identity, a matching threshold for matching the voice command against a predetermined activation command, where the matching threshold for a registered user is set lower than the matching threshold for an unregistered user. In this way, embodiments of the present disclosure can improve the wake-up rate for registered users of the electronic device while effectively reducing the false wake-up rate in noisy scenarios.
Fig. 5 shows a schematic block diagram of an apparatus 500 for voice interaction according to an embodiment of the present disclosure. For example, the voice processing apparatus 112 shown in Fig. 1 may be implemented by the apparatus 500. As shown in Fig. 5, the apparatus 500 may include an identity identification module 510 configured to, in response to receiving a first voice command from a user, identify the identity of the user based on the first voice command. The apparatus 500 may also include a threshold configuration module 520 configured to configure, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command. The apparatus 500 may also include a match determination module 530 configured to determine, based on the matching threshold, whether the first voice command matches the predetermined activation command. The apparatus 500 may also include an activation module 540 configured to, in response to determining that the first voice command matches the predetermined activation command, cause the electronic device to enter an activated state in which the electronic device can conduct voice interaction with the user.
In some embodiments, the identity identification module 510 includes: a state determination unit configured to determine whether the electronic device is in a non-activated state, in which the electronic device cannot conduct voice interaction with the user; and a first identity identification unit configured to, in response to the electronic device being in the non-activated state and receiving the first voice command, identify the identity of the user based on the first voice command.
In some embodiments, the identity identification module 510 includes: a first voiceprint acquisition unit configured to extract first voiceprint information of the user from the first voice command; a second voiceprint acquisition unit configured to obtain second voiceprint information of a registered user of the electronic device; a voiceprint similarity determination unit configured to determine a voiceprint similarity between the first voiceprint information and the second voiceprint information; and a second identity identification unit configured to, in response to the voiceprint similarity exceeding a predetermined threshold, identify the user as the registered user.
In some embodiments, the second voiceprint acquisition unit is configured to obtain the second voiceprint information from a storage device coupled with the electronic device.
In some embodiments, the threshold configuration module 520 includes: a first threshold configuration unit configured to, in response to the user being identified as the registered user, configure the matching threshold as a first threshold; and a second threshold configuration unit configured to, in response to the user not being identified as the registered user, configure the matching threshold as a second threshold, where the first threshold is lower than the second threshold.
In some embodiments, the match determination module 530 includes: a similarity determination unit configured to determine a similarity between the first voice command and the predetermined activation command; and a match determination unit configured to, in response to the similarity exceeding the matching threshold, determine that the first voice command matches the predetermined activation command.
In some embodiments, the similarity determination unit is further configured to: extract a first acoustic feature from the first voice command; extract a second acoustic feature from the predetermined activation command; and determine the similarity between the first voice command and the predetermined activation command by comparing the first acoustic feature and the second acoustic feature.
In some embodiments, the match determination module 530 includes: a model configuration unit configured to configure, using the matching threshold, a recognition model for recognizing the predetermined activation command, so that the recognition model determines, based on the matching threshold, whether a voice command matches the predetermined activation command; and a model application unit configured to determine, using the configured recognition model, whether the first voice command matches the predetermined activation command.
In some embodiments, the apparatus 500 further includes a deactivation module configured to, in response to the electronic device being in the activated state and not receiving a second voice command from the user within a threshold time interval, cause the electronic device to enter a non-activated state, in which the electronic device cannot conduct voice interaction with the user.
Fig. 6 shows a schematic block diagram of an example device 600 that can be used to implement embodiments of the present disclosure. The device 600 can be used to implement the electronic device 110 shown in Fig. 1. As shown, the device 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 602 or loaded from a storage unit 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Multiple components in the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or a mouse; an output unit 607, such as various types of displays and speakers; a storage unit 608, such as a magnetic disk or an optical disc; and a communication unit 609, such as a network card, a modem, or a wireless communication transceiver. The communication unit 609 allows the device 600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The processing unit 601 performs the various methods and processes described above, such as the methods 200, 300, and/or 400. For example, in some embodiments, the methods 200, 300, and/or 400 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the CPU 601, one or more steps of the methods 200, 300, and/or 400 described above may be performed. Alternatively, in other embodiments, the CPU 601 may be configured to perform the methods 200, 300, and/or 400 in any other appropriate manner (for example, by means of firmware).
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so forth.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely exemplary forms of implementing the claims.

Claims (20)

1. A voice interaction method performed at an electronic device, comprising:
in response to receiving a first voice command from a user, identifying an identity of the user based on the first voice command;
configuring, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command;
determining, based on the matching threshold, whether the first voice command matches the predetermined activation command; and
in response to determining that the first voice command matches the predetermined activation command, causing the electronic device to enter an activated state, in which the electronic device can conduct voice interaction with the user.
2. The method according to claim 1, wherein identifying the identity of the user comprises:
determining whether the electronic device is in a non-activated state, in which the electronic device cannot conduct voice interaction with the user; and
in response to the electronic device being in the non-activated state and receiving the first voice command, identifying the identity of the user based on the first voice command.
3. The method according to claim 1, wherein identifying the identity of the user comprises:
extracting first voiceprint information of the user from the first voice command;
obtaining second voiceprint information of a registered user of the electronic device;
determining a voiceprint similarity between the first voiceprint information and the second voiceprint information; and
in response to the voiceprint similarity exceeding a predetermined threshold, identifying the user as the registered user.
4. The method according to claim 3, wherein obtaining the second voiceprint information comprises:
obtaining the second voiceprint information from a storage device coupled with the electronic device.
5. The method according to claim 3, wherein configuring the matching threshold comprises:
in response to the user being identified as the registered user, configuring the matching threshold as a first threshold; and
in response to the user not being identified as the registered user, configuring the matching threshold as a second threshold, wherein the first threshold is lower than the second threshold.
6. The method according to claim 1, wherein determining whether the first voice command matches the predetermined activation command comprises:
determining a similarity between the first voice command and the predetermined activation command; and
in response to the similarity exceeding the matching threshold, determining that the first voice command matches the predetermined activation command.
7. The method according to claim 6, wherein determining the similarity comprises:
extracting a first acoustic feature from the first voice command;
extracting a second acoustic feature from the predetermined activation command; and
determining the similarity between the first voice command and the predetermined activation command by comparing the first acoustic feature and the second acoustic feature.
8. The method according to claim 1, wherein determining whether the first voice command matches the predetermined activation command comprises:
configuring, using the matching threshold, a recognition model for recognizing the predetermined activation command, so that the recognition model determines, based on the matching threshold, whether a voice command matches the predetermined activation command; and
determining, using the configured recognition model, whether the first voice command matches the predetermined activation command.
9. The method according to claim 1, further comprising:
in response to the electronic device being in the activated state and not receiving a second voice command from the user within a threshold time interval, causing the electronic device to enter a non-activated state, in which the electronic device cannot conduct voice interaction with the user.
10. An apparatus implemented at an electronic device, comprising:
an identity identification module configured to, in response to receiving a first voice command from a user, identify an identity of the user based on the first voice command;
a threshold configuration module configured to configure, based on the identified identity, a matching threshold for matching the first voice command against a predetermined activation command;
a match determination module configured to determine, based on the matching threshold, whether the first voice command matches the predetermined activation command; and
an activation module configured to, in response to determining that the first voice command matches the predetermined activation command, cause the electronic device to enter an activated state, in which the electronic device can conduct voice interaction with the user.
11. The apparatus according to claim 10, wherein the identity identification module comprises:
a state determination unit configured to determine whether the electronic device is in a non-activated state, in which the electronic device cannot conduct voice interaction with the user; and
a first identity identification unit configured to, in response to the electronic device being in the non-activated state and receiving the first voice command, identify the identity of the user based on the first voice command.
12. The apparatus according to claim 10, wherein the identity identification module comprises:
a first voiceprint acquisition unit configured to extract first voiceprint information of the user from the first voice command;
a second voiceprint acquisition unit configured to obtain second voiceprint information of a registered user of the electronic device;
a voiceprint similarity determination unit configured to determine a voiceprint similarity between the first voiceprint information and the second voiceprint information; and
a second identity identification unit configured to, in response to the voiceprint similarity exceeding a predetermined threshold, identify the user as the registered user.
13. The apparatus according to claim 12, wherein the second voiceprint acquisition unit is further configured to:
obtain the second voiceprint information from a storage device coupled with the electronic device.
14. The apparatus according to claim 12, wherein the threshold configuration module comprises:
a first threshold configuration unit configured to, in response to the user being identified as the registered user, configure the matching threshold as a first threshold; and
a second threshold configuration unit configured to, in response to the user not being identified as the registered user, configure the matching threshold as a second threshold, wherein the first threshold is lower than the second threshold.
15. The apparatus according to claim 10, wherein the match determination module comprises:
a similarity determination unit configured to determine a similarity between the first voice command and the predetermined activation command; and
a match determination unit configured to, in response to the similarity exceeding the matching threshold, determine that the first voice command matches the predetermined activation command.
16. The apparatus according to claim 15, wherein the similarity determination unit is further configured to:
extract a first acoustic feature from the first voice command;
extract a second acoustic feature from the predetermined activation command; and
determine the similarity between the first voice command and the predetermined activation command by comparing the first acoustic feature and the second acoustic feature.
17. The apparatus according to claim 10, wherein the match determination module comprises:
a model configuration unit configured to configure, using the matching threshold, a recognition model for recognizing the predetermined activation command, so that the recognition model determines, based on the matching threshold, whether a voice command matches the predetermined activation command; and
a model application unit configured to determine, using the configured recognition model, whether the first voice command matches the predetermined activation command.
18. The apparatus according to claim 10, further comprising:
a deactivation module configured to, in response to the electronic device being in the activated state and not receiving a second voice command from the user within a threshold time interval, cause the electronic device to enter a non-activated state, in which the electronic device cannot conduct voice interaction with the user.
19. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 9.
20. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 9.
CN201811148245.3A 2018-09-29 2018-09-29 Voice interaction method, device, equipment and computer-readable storage medium Active CN109272991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811148245.3A CN109272991B (en) 2018-09-29 2018-09-29 Voice interaction method, device, equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811148245.3A CN109272991B (en) 2018-09-29 2018-09-29 Voice interaction method, device, equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN109272991A true CN109272991A (en) 2019-01-25
CN109272991B CN109272991B (en) 2021-11-02

Family

ID=65194800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811148245.3A Active CN109272991B (en) 2018-09-29 2018-09-29 Voice interaction method, device, equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN109272991B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977317A (en) * 2019-04-03 2019-07-05 恒生电子股份有限公司 Data query method and device
CN110335315A (en) * 2019-06-27 2019-10-15 Oppo广东移动通信有限公司 A kind of image processing method and device, computer readable storage medium
CN110364178A (en) * 2019-07-22 2019-10-22 出门问问(苏州)信息科技有限公司 Voice processing method and device, storage medium and electronic equipment
CN111833874A (en) * 2020-07-10 2020-10-27 上海茂声智能科技有限公司 Man-machine interaction method, system, equipment and storage medium based on identifier
CN112951243A (en) * 2021-02-07 2021-06-11 深圳市汇顶科技股份有限公司 Voice awakening method, device, chip, electronic equipment and storage medium
US11513767B2 (en) 2020-04-13 2022-11-29 Yandex Europe Ag Method and system for recognizing a reproduced utterance
US11915711B2 (en) 2021-07-20 2024-02-27 Direct Cursus Technology L.L.C Method and system for augmenting audio signals

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101441869A (en) * 2007-11-21 2009-05-27 联想(北京)有限公司 Method and terminal for speech recognition of terminal user identification
CN103838991A (en) * 2014-02-20 2014-06-04 联想(北京)有限公司 Information processing method and electronic device
CN104021790A (en) * 2013-02-28 2014-09-03 联想(北京)有限公司 Sound control unlocking method and electronic device
CN106295672A (en) * 2015-06-12 2017-01-04 中国移动(深圳)有限公司 A kind of face identification method and device
CN106531172A (en) * 2016-11-23 2017-03-22 湖北大学 Speaker voice playback identification method and system based on environmental noise change detection
CN107799120A (en) * 2017-11-10 2018-03-13 北京康力优蓝机器人科技有限公司 Service robot identifies awakening method and device
CN107895578A (en) * 2017-11-15 2018-04-10 百度在线网络技术(北京)有限公司 Voice interactive method and device
US20180240463A1 (en) * 2017-02-22 2018-08-23 Plantronics, Inc. Enhanced Voiceprint Authentication
CN108537917A (en) * 2018-02-07 2018-09-14 青岛海尔智能家电科技有限公司 Identification success rate improvement method and intelligent door lock, doorway machine and server

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101441869A (en) * 2007-11-21 2009-05-27 联想(北京)有限公司 Method and terminal for speech recognition of terminal user identification
CN104021790A (en) * 2013-02-28 2014-09-03 联想(北京)有限公司 Sound control unlocking method and electronic device
CN103838991A (en) * 2014-02-20 2014-06-04 联想(北京)有限公司 Information processing method and electronic device
CN106295672A (en) * 2015-06-12 2017-01-04 中国移动(深圳)有限公司 A kind of face identification method and device
CN106531172A (en) * 2016-11-23 2017-03-22 湖北大学 Speaker voice playback identification method and system based on environmental noise change detection
US20180240463A1 (en) * 2017-02-22 2018-08-23 Plantronics, Inc. Enhanced Voiceprint Authentication
CN107799120A (en) * 2017-11-10 2018-03-13 北京康力优蓝机器人科技有限公司 Service robot identifies awakening method and device
CN107895578A (en) * 2017-11-15 2018-04-10 百度在线网络技术(北京)有限公司 Voice interactive method and device
CN108537917A (en) * 2018-02-07 2018-09-14 青岛海尔智能家电科技有限公司 Identification success rate improvement method and intelligent door lock, doorway machine and server

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977317A (en) * 2019-04-03 2019-07-05 恒生电子股份有限公司 Data query method and device
CN110335315A (en) * 2019-06-27 2019-10-15 Oppo广东移动通信有限公司 A kind of image processing method and device, computer readable storage medium
CN110335315B (en) * 2019-06-27 2021-11-02 Oppo广东移动通信有限公司 Image processing method and device and computer readable storage medium
CN110364178A (en) * 2019-07-22 2019-10-22 出门问问(苏州)信息科技有限公司 Voice processing method and device, storage medium and electronic equipment
CN110364178B (en) * 2019-07-22 2021-09-10 出门问问(苏州)信息科技有限公司 Voice processing method and device, storage medium and electronic equipment
US11513767B2 (en) 2020-04-13 2022-11-29 Yandex Europe Ag Method and system for recognizing a reproduced utterance
CN111833874A (en) * 2020-07-10 2020-10-27 上海茂声智能科技有限公司 Man-machine interaction method, system, equipment and storage medium based on identifier
CN111833874B (en) * 2020-07-10 2023-12-05 上海茂声智能科技有限公司 Man-machine interaction method, system, equipment and storage medium based on identifier
CN112951243A (en) * 2021-02-07 2021-06-11 深圳市汇顶科技股份有限公司 Voice awakening method, device, chip, electronic equipment and storage medium
US11915711B2 (en) 2021-07-20 2024-02-27 Direct Cursus Technology L.L.C Method and system for augmenting audio signals

Also Published As

Publication number Publication date
CN109272991B (en) 2021-11-02

Similar Documents

Publication Publication Date Title
CN109272991A (en) Method, apparatus, equipment and the computer readable storage medium of interactive voice
US9940935B2 (en) Method and device for voiceprint recognition
WO2021159688A1 (en) Voiceprint recognition method and apparatus, and storage medium and electronic apparatus
BR102018070673A2 (en) GENERATE DIALOGUE BASED ON VERIFICATION SCORES
WO2016150001A1 (en) Speech recognition method, device and computer storage medium
CN108711429B (en) Electronic device and device control method
KR20160098771A (en) Operating Method for Voice function and electronic device supporting the same
CN109564759A (en) Speaker Identification
CN108766441A (en) A kind of sound control method and device based on offline Application on Voiceprint Recognition and speech recognition
WO2014114116A1 (en) Method and system for voiceprint recognition
CN104143326A (en) Voice command recognition method and device
CN104021790A (en) Sound control unlocking method and electronic device
KR20190018282A (en) Method for performing personalized speech recognition and user terminal and server performing the same
CN101540170B (en) Voiceprint recognition method based on biomimetic pattern recognition
US11862153B1 (en) System for recognizing and responding to environmental noises
KR102563817B1 (en) Method for processing user voice input and electronic device supporting the same
US20230386506A1 (en) Self-supervised speech representations for fake audio detection
CN109637542A (en) A kind of outer paging system of voice
WO2021169711A1 (en) Instruction execution method and apparatus, storage medium, and electronic device
CN101350196A (en) On-chip system for confirming role related talker identification and confirming method thereof
WO2020073839A1 (en) Voice wake-up method, apparatus and system, and electronic device
KR20150035312A (en) Method for unlocking user equipment based on voice, user equipment releasing lock based on voice and computer readable medium having computer program recorded therefor
WO2020102991A1 (en) Method and apparatus for waking up device, storage medium and electronic device
TW202029181A (en) Method and apparatus for specific user to wake up by speech recognition
CN110083392B (en) Audio awakening pre-recording method, storage medium, terminal and Bluetooth headset thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211013

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.

Address before: 100080 No.10, Shangdi 10th Street, Haidian District, Beijing

Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant