CN116524919A - Equipment awakening method, related device and communication system - Google Patents
- Publication number
- CN116524919A (application CN202210075546.8A)
- Authority
- CN
- China
- Prior art keywords
- wake
- voice
- word
- audio energy
- electronic device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L2015/223—Execution procedure of a spoken command
- G10L2015/225—Feedback of the input speech
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Telephone Function (AREA)
Abstract
The application provides a device wake-up method, a related device and a communication system. A plurality of voice wake-up devices may detect whether the collected sound contains a pre-wake-up word, where the pre-wake-up word is part of the wake-up word. When the pre-wake-up word is detected, the voice wake-up devices can negotiate and determine the responding device according to the audio energy corresponding to the pre-wake-up word. The responding device may enter the wake-up state after detecting the wake-up word and respond to the user. In this way, negotiation to determine the responding device can start before the user has finished speaking the wake-up word, and the responding device enters the wake-up state once the wake-up word is detected. In a scenario with a plurality of voice wake-up devices, the method can improve the response speed of a voice wake-up device after it detects the wake-up word without reducing the wake-up rate.
Description
Technical Field
The present disclosure relates to the field of terminal technologies, and in particular, to a device wake-up method, a related device, and a communication system.
Background
With the development of electronic devices such as mobile phones, tablet computers and smart home devices, more and more electronic devices have voice wake-up capability. An electronic device with voice wake-up capability may enter a wake-up state after detecting wake-up speech, recognize a voice instruction of a user, and execute an operation corresponding to the voice instruction. The wake-up speech is speech containing a wake-up word.
However, in a scenario where there are a plurality of electronic devices with voice wake-up capability and the wake-up words for waking them up are the same, the electronic devices need to negotiate after detecting wake-up speech and determine one electronic device as the responding device to respond to the user's voice instruction. This negotiation process takes time, so when there are multiple electronic devices with voice wake-up capability in the environment, the response of the electronic devices becomes slower and the user experience suffers.
Disclosure of Invention
The application provides a device wake-up method, a related device and a communication system. In the device wake-up method provided by the application, after an electronic device detects pre-wake-up speech containing a pre-wake-up word, the electronic device can determine which device should respond according to the audio energy it determined from the pre-wake-up speech and the audio energies determined by other electronic devices that also detected pre-wake-up speech containing the pre-wake-up word. The pre-wake-up word may be part of a wake-up word. When wake-up speech containing the wake-up word is detected, the determined responding device may respond to the user. The method moves the process in which electronic devices with the same wake-up word negotiate and determine the responding device forward in time, so that the time the user waits for a response after speaking the wake-up word is reduced. The method can improve the response speed of the electronic device after wake-up speech is detected and improve the user experience.
In a first aspect, the present application provides a device wake-up method. The first electronic device may detect a first pre-wake-up speech including a pre-wake-up word, and obtain first audio energy according to the first pre-wake-up speech. The first electronic device may receive M audio energies sent by M electronic devices, where one audio energy of the M audio energies is obtained by one electronic device of the M electronic devices according to a detected pre-wake-up voice including a pre-wake-up word, and M is a positive integer. The first electronic device may determine, based on the first audio energy and the M audio energies, that the first electronic device is a device that responds. When a first wake-up speech containing a wake-up word is detected, a first application in the first electronic device may enter a wake-up state. The pre-wake word is a part of the wake word, and the first application is used for detecting and responding to the voice command in the wake state to execute an operation corresponding to the voice command.
The wake-up words of the first electronic device and the M electronic devices are the same. The first pre-wake-up speech for determining the first audio energy and the pre-wake-up speech for determining the M audio energies may be detected by the first electronic device and the M electronic devices based on a sentence of pre-wake-up words spoken by the user.
The first audio energy may be the largest of the first audio energy and the M audio energies. It will be appreciated that the greater the audio energy determined by an electronic device from the detected pre-wake-up speech, the closer the electronic device may be to the user. In the above negotiation to determine the answering device, the electronic device nearest to the user may be selected to respond to the wake-up word and/or voice command spoken by the user.
In some embodiments, the first electronic device is not limited to using only the audio energy determined from the pre-wake-up speech; it may also determine the answering device by combining the first audio energy, the M audio energies, and device information (e.g., device type, frequency of use, device capabilities, etc.) of the respective electronic devices. For example, when it is determined that the largest value among the first audio energy and the M audio energies is shared by a plurality of devices, the first electronic device may compare the device capabilities of the electronic devices corresponding to the plurality of largest audio energies to determine the answering device. The device capability may be, for example, the sound effect of a microphone. The first electronic device may select, as the answering device, the electronic device with the best sound effect among the electronic devices corresponding to the plurality of largest audio energies.
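As an illustration only, the selection logic described above might be sketched as follows in Python; the data fields and the microphone-quality tie-break score are assumptions made for this example and are not specified by the patent.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    device_id: str
    audio_energy: float   # energy derived from the detected pre-wake-up speech
    mic_quality: float    # hypothetical device-capability score, used only for tie-breaking

def choose_responding_device(candidates: list[Candidate]) -> str:
    """Return the device_id of the device that should respond.

    The device with the largest pre-wake-up audio energy wins; if several
    devices share the maximum energy, the one with the better device
    capability (here: microphone sound effect) is chosen.
    """
    max_energy = max(c.audio_energy for c in candidates)
    tied = [c for c in candidates if c.audio_energy == max_energy]
    best = max(tied, key=lambda c: c.mic_quality)
    return best.device_id

# Example: the first electronic device compares its own energy with the
# M received energies and decides whether it is the responding device.
own = Candidate("first_device", audio_energy=62.0, mic_quality=0.8)
peers = [Candidate("speaker", 57.5, 0.9), Candidate("tv", 62.0, 0.6)]
print(choose_responding_device([own] + peers))  # -> "first_device"
```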
In some embodiments, the wake word may include a plurality of syllables. Syllables contained in the pre-wake word can be intercepted from the wake word. For example, the syllables that the pre-wake word contains may be the first few syllables of the wake word.
It can be seen that the method can determine the timing of the start of the negotiation process by detecting pre-wake-up speech. The pre-wake-up speech may be speech containing the pre-wake-up word. The first electronic device and the other voice wake-up devices may begin negotiating to determine the answering device when the user has not yet finished speaking the wake-up word. In this way, the answering device can enter the wake-up state as soon as wake-up speech is detected, i.e. right after the user has finished speaking the wake-up word. The method improves the response speed of the voice wake-up device after it detects wake-up speech, and because the device still responds only when wake-up speech is actually detected, the wake-up rate is not affected. This can effectively improve the user's voice wake-up experience in a scenario where there are a plurality of voice wake-up devices with the same wake-up word.
In combination with the first aspect, in some embodiments, after the first electronic device detects the first pre-wake speech including the pre-wake word, when the collected sound is detected to not include the wake word, the first application in the first electronic device may not enter the wake state. That is, the first electronic device may start negotiating with the M electronic devices to determine the answering device after detecting the pre-wake-up voice. When the wake-up voice containing the wake-up word is not detected, the first application in the first electronic device may not respond to the user even if the answering device is determined to be the first electronic device.
The method can reduce the probability of false awakening and improve the use experience of the user voice control equipment.
The first application in the first electronic device enters the wake-up state, which may indicate that the first electronic device invokes an application program of the first application. When the first application in the first electronic device is in the wake-up state, the process of the first application is contained in the program run by the first electronic device. In the wake-up state, the first electronic device may detect and identify the voice command through the first application, and execute an operation corresponding to the voice command.
The first application in the first electronic device not entering the wake-up state may indicate that the programs run by the first electronic device do not include a process of the first application. Alternatively, it may indicate that the programs running in the first electronic device include a process of the first application, but the first application does not respond to the user. That is, when the first application does not enter the wake-up state, the first electronic device does not respond to the wake-up word and/or the voice command uttered by the user. It can be understood that, for any other electronic device in this application that includes the first application, the cases in which its first application enters or does not enter the wake-up state can be understood by reference to the corresponding cases of the first application in the first electronic device described above.
In combination with the first aspect, in some embodiments, the first electronic device may detect a second pre-wake-up speech comprising a pre-wake-up word, and derive the second audio energy from the second pre-wake-up speech. The first electronic device may receive K audio energies sent by K electronic devices, where one audio energy of the K audio energies is obtained by one electronic device of the K electronic devices according to the detected pre-wake-up voice containing the pre-wake-up word, and K is a positive integer. The first electronic device may determine, according to the second audio energy and the K audio energies, that a second electronic device of the K electronic devices is a device that responds. In the case where the second electronic device is determined to be the responding device, the first application in the first electronic device may not enter the awake state.
The wake-up words of the first electronic device and the K electronic devices are the same. The second pre-wake-up speech for determining the second audio energy and the pre-wake-up speech for determining the K audio energies may be detected by the first electronic device and the K electronic devices based on a sentence of pre-wake-up words spoken by the user.
It can be seen that the method can determine the timing of the start of the negotiation process by detecting pre-wake-up speech. The pre-wake-up speech may be speech containing the pre-wake-up word. The first electronic device and the other voice wake-up devices may begin negotiating to determine the answering device when the user has not yet finished speaking the wake-up word. When it is determined that the answering device is not the first electronic device, the first application in the first electronic device may not enter the wake-up state. In this way, it can be ensured that only the first application of the answering device enters the wake-up state after the answering device detects wake-up speech. The method can reduce the probability of false wake-up and avoid the interference caused to the user when a plurality of electronic devices all respond after detecting wake-up speech.
In combination with the first aspect, in some embodiments, in a case where the second electronic device is determined to be a device that responds, the first electronic device may send a first message to the second electronic device, where the first message includes a first result, the first result is used to indicate that the second electronic device is a device that responds, and the first message is used to indicate that the second electronic device, after detecting a wake-up voice that includes a wake-up word, causes a first application in the second electronic device to enter a wake-up state.
In combination with the first aspect, in some embodiments, the first electronic device may further send the first audio energy to the M electronic devices after deriving the first audio energy from the first pre-wake-up speech. The M electronic devices may also mutually announce audio energies each determined according to the detected pre-wake-up voice.
It can be seen that in a scenario where there are multiple electronic devices with the same wake-up word, the process of negotiating after detecting the pre-wake-up speech by the multiple electronic devices may be: the answering device is determined by an electronic device, such as a first electronic device. The other electronic device may transmit audio energy to the first electronic device, each determined from the detected pre-wake-up speech. The first electronic device may send the determination result of the answering device to the other electronic devices. Alternatively, the process by which the plurality of electronic devices negotiate after detecting the pre-wake-up voice may be: the plurality of electronic devices may advertise to each other audio energy each determined from the detected pre-wake-up speech. The plurality of electronic devices may each determine a answering device. When determining that the electronic device is a response device, the electronic device can enter a wake-up state after detecting wake-up voice. When it is determined that the electronic device is not a answering device, the electronic device may not enter the wake-up state after detecting wake-up speech.
In some embodiments, the process of negotiating after the plurality of electronic devices detect the pre-wake-up speech may be as follows: the plurality of electronic devices may each transmit the audio energy determined from the detected pre-wake-up speech to one master device. The master device may be a device other than the plurality of electronic devices with the same wake-up word, and may also be a cloud server. The master device may determine the answering device and transmit the determination result to the plurality of electronic devices. Alternatively, the master device may transmit the determination result only to the answering device.
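Purely as an illustration of this master-device variant, the arbitration step might look like the sketch below; the report format and the notify_all option are assumptions made for the example, not details taken from the patent.

```python
def arbitrate_on_master(reports: dict[str, float], notify_all: bool = True) -> dict[str, bool]:
    """Hypothetical master-device arbitration. 'reports' maps a device id to the
    audio energy that device derived from its detected pre-wake-up speech.
    Returns, per device, whether it is the responding device; with
    notify_all=False only the winner would be sent its result."""
    winner = max(reports, key=reports.get)
    if notify_all:
        return {device: device == winner for device in reports}
    return {winner: True}

# Example: a cloud server (or any designated master) receives three reports.
print(arbitrate_on_master({"phone": 48.2, "speaker": 55.7, "tv": 51.0}))
```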
In combination with the first aspect, in some embodiments, the first electronic device may collect a first sound, the first sound not including the pre-wake word. The first electronic device may derive the third audio energy from the first sound. The first electronic device obtains the fourth audio energy from the first pre-wake-up speech. The first electronic device may subtract the third audio energy from the fourth audio energy to obtain the first audio energy. The first sound may be collected by the first electronic device within a preset time range around the detection of the first pre-wake-up speech. It will be appreciated that the first pre-wake-up speech typically includes ambient noise, so the fourth audio energy includes energy resulting from ambient noise. Because the first sound is collected within a preset time range around the detection of the pre-wake-up speech, it approximates the ambient noise contained in the first pre-wake-up speech. The first electronic device can therefore use the third audio energy to remove, from the fourth audio energy, the portion generated by ambient noise.
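A minimal sketch of this noise compensation, assuming the energies are simple scalar values in the same linear units, could be:

```python
def noise_compensated_energy(pre_wake_energy: float, ambient_energy: float) -> float:
    """Subtract the energy of a recently collected sound that contains no
    pre-wake word (third audio energy, approximating ambient noise) from the
    energy of the pre-wake-up speech (fourth audio energy) to obtain the
    energy attributable to the user's voice (first audio energy)."""
    return max(pre_wake_energy - ambient_energy, 0.0)

# Example: a device measures 70.0 units for the pre-wake-up speech and 12.5
# units for background sound captured shortly before; it reports 57.5 units.
print(noise_compensated_energy(70.0, 12.5))  # -> 57.5
```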
The M electronic devices may likewise each reduce the energy generated by ambient noise in the audio energy derived from the detected pre-wake-up speech. For the specific method, reference may be made to the way the first electronic device reduces the energy generated by ambient noise in the first audio energy.
It can be seen that in the above negotiation to determine the answering device, the first electronic device and the M electronic devices may remove the audio energy generated by ambient noise from the audio energy determined from the pre-wake-up speech. This reduces the influence of environmental noise on the determination of the answering device and improves the accuracy of the determination result. Using the audio energies with the influence of environmental noise reduced, the first electronic device and the M electronic devices may start negotiating to determine the answering device before the user has finished speaking the wake-up word. In this way, the answering device can enter the wake-up state as soon as wake-up speech is detected, i.e. right after the user has finished speaking the wake-up word. The method improves the response speed of the answering device after wake-up speech is detected, and because the answering device still responds only when it detects wake-up speech, the wake-up rate is not affected. This can effectively improve the user's experience of the voice wake-up function in a scenario where there are a plurality of electronic devices with the same wake-up word.
In a second aspect, the present application provides a device wake-up method. The method can be applied to a voice wake-up system, where the voice wake-up system includes H electronic devices, the H electronic devices include a first electronic device, and H is a positive integer greater than 1. The first electronic device detects first pre-wake-up speech containing the pre-wake-up word, and obtains first audio energy from the first pre-wake-up speech. H1 electronic devices among the H electronic devices send H1 audio energies to the first electronic device; the H1 electronic devices do not include the first electronic device, and each of the H1 audio energies is obtained by one of the H1 electronic devices from detected pre-wake-up speech containing the pre-wake-up word; H1 is a positive integer less than H. The first electronic device determines, according to the first audio energy and the H1 audio energies, that the first electronic device is the responding device. When first wake-up speech containing the wake-up word is detected, a first application in the first electronic device enters the wake-up state. The pre-wake-up word is a part of the wake-up word, and the first application is used for detecting voice commands in the wake-up state and responding by executing the operation corresponding to the voice command.
It can be seen that the method can determine the timing of the start of the negotiation process by detecting pre-wake-up speech. The pre-wake-up speech may be speech containing the pre-wake-up word. The first electronic device and the other voice wake-up devices may begin negotiating to determine the answering device when the user has not yet finished speaking the wake-up word. In this way, the answering device can enter the wake-up state as soon as wake-up speech is detected, i.e. right after the user has finished speaking the wake-up word. The method improves the response speed of the voice wake-up device after it detects wake-up speech, and because the device still responds only when wake-up speech is actually detected, the wake-up rate is not affected. This can effectively improve the user's voice wake-up experience in a scenario where there are a plurality of voice wake-up devices with the same wake-up word.
With reference to the second aspect, in some embodiments, the first application in each of the H1 electronic devices does not enter the awake state.
It can be seen that, in the case that it is determined that the first electronic device is a device that responds, the first application in the other electronic devices except the first electronic device in the voice wake-up system may not enter the wake-up state. The method can reduce the probability of false wake-up and avoid the interference to the user caused by the response of the plurality of electronic devices to the user after the wake-up voice is detected.
With reference to the second aspect, in some embodiments, after the first electronic device detects the first pre-wake speech including the pre-wake word, when the collected sound is detected to not include the wake word, the first application in the first electronic device does not enter the wake state.
It can be seen that when the wake-up speech containing the wake-up word is not detected, the first application in the first electronic device may not respond to the user even if the answering device is determined to be the first electronic device. The method can reduce the probability of false wake-up and improve the use experience of the voice control equipment of the user.
With reference to the second aspect, in some embodiments, the first electronic device detects second pre-wake-up speech containing the pre-wake-up word, and obtains second audio energy from the second pre-wake-up speech. H2 electronic devices among the H electronic devices send H2 audio energies to the first electronic device; the H2 electronic devices do not include the first electronic device, and each of the H2 audio energies is obtained by one of the H2 electronic devices from detected pre-wake-up speech containing the pre-wake-up word; H2 is a positive integer less than H. The first electronic device determines, according to the second audio energy and the H2 audio energies, that a second electronic device among the H2 electronic devices is the responding device. When second wake-up speech containing the wake-up word is detected, the first application in the second electronic device enters the wake-up state, and neither the first application in the first electronic device nor the first application in each of the (H2-1) electronic devices enters the wake-up state, where the (H2-1) electronic devices are the devices among the H2 electronic devices other than the second electronic device.
With reference to the second aspect, in some embodiments, after determining that the second electronic device is a device that responds, the first electronic device may send a first message to the second electronic device, where the first message includes a first result, and the first result is used to indicate that the second electronic device is a device that responds. Based on the first message, when the second wake-up speech is detected, a first application in the second electronic device enters a wake-up state.
With reference to the second aspect, in some embodiments, the first electronic device sends the second audio energy to the second electronic device. The (H2-1) electronic devices send (H2-1) audio energies to the second electronic device, the (H2-1) audio energies being the audio energies obtained by the (H2-1) electronic devices among the H2 audio energies. The second electronic device determines that it is the responding device according to the second audio energy, the (H2-1) audio energies, and a fifth audio energy obtained by the second electronic device from detected third pre-wake-up speech containing the pre-wake-up word, where the fifth audio energy is included in the H2 audio energies.
With reference to the second aspect, in some embodiments, the first electronic device collects a first sound, and the first sound does not include a pre-wake word. The first electronic device obtains third audio energy according to the first sound. The first electronic device obtains fourth audio energy according to the first pre-wake-up voice. The first electronic device subtracts the third audio energy from the fourth audio energy to obtain the first audio energy.
It can be seen that in the above negotiation to determine the answering device, the electronic devices included in the voice wake-up system may remove the audio energy generated by ambient noise from the audio energy determined from the pre-wake-up speech. This reduces the influence of environmental noise on the determination of the answering device and improves the accuracy of the determination result. Using the audio energies with the influence of environmental noise reduced, the electronic devices included in the voice wake-up system may start negotiating to determine the answering device before the user has finished speaking the wake-up word. In this way, the answering device can enter the wake-up state as soon as wake-up speech is detected, i.e. right after the user has finished speaking the wake-up word. The method improves the response speed of the answering device after wake-up speech is detected, and because the answering device still responds only when it detects wake-up speech, the wake-up rate is not affected. This can effectively improve the user's experience of the voice wake-up function in a scenario where there are a plurality of electronic devices with the same wake-up word.
In a third aspect, the present application provides an electronic device that may comprise a microphone, a communication means, a memory and a processor, wherein the microphone may be used for capturing sound, the memory may be used for storing a computer program, and the processor may be used for invoking the computer program to cause the electronic device to perform any of the possible implementation methods as in the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium comprising instructions which, when run on an electronic device, cause the electronic device to perform any one of the possible implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product, which may contain computer instructions, which when run on an electronic device, cause the electronic device to perform any one of the possible implementation methods as in the first aspect.
In a sixth aspect, the present application provides a chip for application to an electronic device, the chip comprising one or more processors for invoking computer instructions to cause the electronic device to perform any of the possible implementation methods as in the first aspect.
It will be appreciated that the electronic device provided in the third aspect, the computer readable storage medium provided in the fourth aspect, the computer program product provided in the fifth aspect, and the chip provided in the sixth aspect are all configured to perform the method provided by the embodiments of the present application. Therefore, for the advantageous effects they achieve, reference may be made to the advantageous effects of the corresponding method, and details are not repeated here.
Drawings
Fig. 1 is a schematic diagram of a distribution of voice wake-up devices according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the time distribution from when a user speaks a wake-up word to when a voice wake-up device determines the answering device according to an embodiment of the present application;
Fig. 3A is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application;
Fig. 3B is a software architecture block diagram of the electronic device 100 according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a voice wake-up apparatus 10 according to an embodiment of the present application;
Fig. 5 is a flowchart of a device wake-up method according to an embodiment of the present application;
Fig. 6 is a schematic diagram of another time distribution from when a user speaks a wake-up word to when a voice wake-up device determines the answering device according to an embodiment of the present application;
Fig. 7 is a flowchart of another device wake-up method according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a master device 200 according to an embodiment of the present application;
Fig. 9 is a flowchart of another device wake-up method according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a voice wake-up system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application. In the description of the embodiments of the present application, the terminology used in the embodiments below is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of this application and the appended claims, the singular forms "a," "an," and "the" are intended to also include forms such as "one or more," unless the context clearly indicates otherwise. It should also be understood that in the various embodiments herein below, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" is used to describe an association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, A and B together, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise. The term "coupled" includes both direct and indirect connections, unless stated otherwise. The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Many electronic devices, such as cell phones, tablet computers, speakers, televisions, etc., currently have voice wakeup capabilities. The above-described electronic device with voice wake-up capability may be referred to as a voice wake-up device. The voice wake apparatus may have an Application (APP) installed therein for performing voice recognition, for example, a voice assistant APP. When the voice wake-up function is turned on, the voice wake-up device can collect the voice in the environment in real time and detect whether the voice contains wake-up words (i.e. detect whether wake-up voice exists). When a wake word is detected, the voice wake device may wake up and enter a wake state. In one possible implementation, the voice wakeup device is awakened, and more specifically, the voice assistant APP in the voice wakeup device is awakened. That is, the above-described awake state may represent a state in which the voice assistant APP in the voice awake device is awake.
In some embodiments, the voice assistant APP may respond to a wake word spoken by the user after it has been woken up. The response may be a voice response. For example, after the user speaks the wake word, the voice wake device may answer "I am" in voice. In some embodiments, after the voice assistant APP is awakened, a voice instruction in the sound collected by the voice awakening device may be identified, and an operation corresponding to the voice instruction is performed.
It can be seen that the wake-up word described above can be used to wake up the voice assistant APP in the voice wake-up device. To guarantee the wake-up rate at which a voice wake-up device is woken up, the wake-up word typically has multiple syllables. For example, the wake-up word "Xiaoyi Xiaoyi" has four syllables, and the wake-up word "Hey Siri" has three syllables. Compared with a single-syllable wake-up word, a multi-syllable wake-up word can effectively improve the wake-up rate of the voice wake-up device. For example, if the wake-up word is "hey", the wake-up word has only one syllable. The user may often say "hey" in daily life without intending to wake up the voice wake-up device, which may result in the voice wake-up device being woken up by mistake. In addition, to make the wake-up word convenient for the user to speak, it should not be too complex or contain too many syllables. Illustratively, the wake-up word may contain 4-6 syllables. This can both ensure the wake-up rate and keep the wake-up word easy to say. The number of syllables of the wake-up word is not limited in the embodiments of this application. In the following embodiments, the multi-syllable wake-up word "Xiaoyi Xiaoyi" is taken as an example for illustration.
The wake-up word may also be used to wake up other modules or other APPs in the voice wake-up device, not limited to waking up the voice assistant APP. In the subsequent embodiments of the present application, the wake-up voice assistant APP is specifically taken as an example for illustration.
As electronic devices evolve, multiple voice wake-up devices may be deployed in a home (or other environment, such as an office, etc.).
Referring to fig. 1, fig. 1 illustrates a distribution of voice wake-up devices in a user's home.
As shown in fig. 1, a voice wake-up device 10, a voice wake-up device 11, and a voice wake-up device 12 are arranged in the living room. A voice wake-up device 13 and a voice wake-up device 14 are arranged in the master bedroom. A voice wake-up device 15 is arranged in the secondary bedroom. The wake-up words used to wake up the voice wake-up devices 10-15 described above may be identical, for example, the wake-up word "Xiaoyi Xiaoyi".
In one possible implementation, multiple voice wake-up devices with the same wake-up word configured in a home may form a voice wake-up system. One voice wake-up device included in the voice wake-up system can detect other voice wake-up devices included in the voice wake-up system and communicate with the other voice wake-up devices. The voice wake system may include methods for communicating between voice wake devices such as bluetooth communications, wireless fidelity (wireless fidelity, wi-Fi) communications, and the like. The embodiments of the present application are not limited in this regard.
For example, the home shown in fig. 1 has a voice wake-up system a. The voice wakeup system a includes the voice wakeup devices 10 to 15 described above. The voice wake apparatuses 10 to 15 may communicate with each other to sense states of each other (e.g., an on state of a voice wake function, an operating state, etc.). Not limited to the voice wakeup devices 10-15, more or fewer voice wakeup devices may be included in the voice wakeup system a.
The implementation manner of constructing the voice wake-up system A is not limited in the embodiments of the present application. In one possible implementation, the voice wake-up system A may be composed of voice wake-up devices that are on the same local area network and whose wake-up words are the same. Specifically, the voice wake-up device 10 accesses the network through a router, and the electronic devices accessing the router are on the same local area network. The voice wake-up device 10 can detect which of the electronic devices on the same local area network are voice wake-up devices, and whether their wake-up words are the same as its own. When it detects that the voice wake-up devices 11 to 15 exist in the local area network and their wake-up words are the same as its own, the voice wake-up device 10 can establish a communication connection with the voice wake-up devices 11 to 15. Similarly, any one of the voice wake-up devices 11 to 15 may discover the other voice wake-up devices in the voice wake-up system A and establish a communication connection with them. Thus, the voice wake-up devices 10-15 may constitute a voice wake-up system, i.e. voice wake-up system A. Optionally, the voice wake-up system A may also be composed of voice wake-up devices that are logged in to the same account and have the same wake-up word.
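For illustration only, the discovery step described above could be sketched as follows; the device-record fields and the connect helper are hypothetical and stand in for whatever Wi-Fi or Bluetooth connection setup a real implementation would use.

```python
def connect(peer_id: str) -> None:
    # Placeholder: in a real system this would open a Wi-Fi or Bluetooth link.
    print(f"connected to {peer_id}")

def build_wake_up_system(local_wake_word: str, lan_devices: list[dict]) -> list[str]:
    """Hypothetical discovery step: among devices on the same local area
    network, keep those that report voice wake-up capability and the same
    wake-up word, then establish a connection with each of them.

    Each entry of lan_devices is assumed to look like
    {"id": ..., "has_voice_wakeup": bool, "wake_word": str}.
    """
    peers = [
        d["id"] for d in lan_devices
        if d.get("has_voice_wakeup") and d.get("wake_word") == local_wake_word
    ]
    for peer in peers:
        connect(peer)
    return peers

devices_on_lan = [
    {"id": "speaker-livingroom", "has_voice_wakeup": True, "wake_word": "Xiaoyi Xiaoyi"},
    {"id": "printer", "has_voice_wakeup": False, "wake_word": ""},
]
print(build_wake_up_system("Xiaoyi Xiaoyi", devices_on_lan))
```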
In some embodiments, each voice wake-up device in the voice wake-up system A may detect, every preset time period, which voice wake-up devices are present in the voice wake-up system A. For example, the voice wake-up devices 10-15 are all on the same local area network. In response to a user operation for triggering the voice wake-up device 15 to exit the local area network, the voice wake-up device 15 may exit the local area network and then be removed from the voice wake-up system A. The voice wake-up devices 10-14 may determine that the voice wake-up system A no longer includes the voice wake-up device 15.
In some embodiments, each voice wake-up device in the voice wake-up system A may detect, every preset time period, the on state of the voice wake-up function of the other voice wake-up devices in the voice wake-up system A. For example, in response to a user operation acting on the voice wake-up device 15 for triggering it to turn off the voice wake-up function, the voice wake-up device 15 may turn off the voice wake-up function. Upon receiving a message from another voice wake-up device inquiring whether its voice wake-up function is on, the voice wake-up device 15 may send a reply message indicating that the voice wake-up function is off. Thus, the voice wake-up devices 10 to 14 can determine that the voice wake-up function in the voice wake-up device 15 is turned off.
The preset time period is not limited in the embodiments of this application. Each voice wake-up device in the voice wake-up system A can detect, at regular or irregular intervals, the presence of voice wake-up devices in the voice wake-up system A, the on state of their voice wake-up function, and the like.
The following describes a method for waking up a voice wake-up device in case there are a plurality of voice wake-up devices with the same wake-up word.
In one possible implementation method, in the case where there are a plurality of voice wake-up devices and the wake-up words are the same, the voice wake-up devices may negotiate after detecting wake-up speech and select the voice wake-up device closest to the user as the answering device. The answering device can enter the wake-up state, and the voice assistant APP in the answering device can be woken up to respond to the user. The other voice wake-up devices do not enter the wake-up state. In this way, only one of the voice wake-up devices executes the user's voice instruction, which avoids the trouble caused to the user when several voice wake-up devices all respond after the user speaks the wake-up word.
Specifically, when the user speaks the wake-up word, the voice wake-up devices 10 to 15 in the voice wake-up system A shown in fig. 1 all detect wake-up speech. Detecting wake-up speech means that the voice wake-up device detected the wake-up word in the collected sound. When the voice wake-up device 10 detects wake-up speech, it may determine, from the audio it collected, the wake-up word audio energy corresponding to the audio containing the wake-up word. The voice wake-up device 10 may send its own wake-up word audio energy to the other voice wake-up devices in the voice wake-up system A, and receive the wake-up word audio energies determined by those devices. That is, after detecting wake-up speech, the voice wake-up devices 10-15 may announce to one another the wake-up word audio energy each of them determined. Optionally, when it is determined that a voice wake-up device in the voice wake-up system A has its voice wake-up function turned off, the voice wake-up device 10 may not send its wake-up word audio energy to that device.
The wake-up word audio energy may be the sound intensity of the audio containing the wake-up word, or a parameter such as sound pressure. The method for calculating the wake-up word audio energy is not limited in the embodiments of this application. It will be appreciated that the closer a voice wake-up device is to the user, the shorter the time for the sound of the wake-up word to reach it, and the sooner it can detect wake-up speech. Moreover, because sound attenuates gradually as it propagates, the closer the voice wake-up device is to the user, the greater the wake-up word audio energy it determines. Each voice wake-up device in the voice wake-up system A may therefore compare the magnitudes of the wake-up word audio energies determined by the individual devices to determine which voice wake-up device responds.
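The patent leaves the energy measure open (sound intensity, sound pressure, or a similar parameter). As one possible illustration, the sketch below uses the root-mean-square amplitude of the PCM samples covering the wake word; this choice is an assumption for the example, not the patent's prescribed calculation.

```python
import math

def frame_energy(pcm_samples: list[int]) -> float:
    """One possible measure of wake-word audio energy: the root-mean-square
    amplitude of the PCM samples covering the (pre-)wake word."""
    if not pcm_samples:
        return 0.0
    return math.sqrt(sum(s * s for s in pcm_samples) / len(pcm_samples))

# A device closer to the speaker captures larger amplitudes, hence higher energy.
near = frame_energy([1200, -1500, 1800, -1100])
far = frame_energy([300, -350, 420, -280])
print(near > far)  # -> True
```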
Since the wake-up word audio energy determined by each voice wake-up device in the voice wake-up system A is needed to determine the answering device, the voice wake-up device 10 usually needs to wait for the wake-up word audio energies sent by the other voice wake-up devices in the voice wake-up system A. This reduces the chance of missing a voice wake-up device and improves the accuracy of the determination result of the answering device.
In some embodiments, the voice wake-up device 10 may determine, based on the voice wake-up devices that are present in the voice wake-up system A and have their voice wake-up function turned on, which devices it should wait for wake-up word audio energy from. When the voice wake-up device 10 has received the wake-up word audio energies sent by all voice wake-up devices in the voice wake-up system A, other than itself, whose voice wake-up function is on (for example, the wake-up word audio energies sent by the voice wake-up devices 11 to 14), it can determine the responding voice wake-up device according to the wake-up word audio energies determined by itself and by the other devices. For example, the voice wake-up device 10 may compare its own wake-up word audio energy with those of the other voice wake-up devices and determine the voice wake-up device corresponding to the largest wake-up word audio energy as the answering device, i.e. the voice wake-up device that responds.
In other embodiments, the voice wake-up device 10 waits for the wake-up word audio energies sent by the other voice wake-up devices in the voice wake-up system A for a preset waiting period. If the preset waiting period elapses and the voice wake-up device 10 still has not received the wake-up word audio energies from all the other voice wake-up devices in the voice wake-up system A whose voice wake-up function is on, the voice wake-up device 10 can stop waiting and determine the responding voice wake-up device according to its own wake-up word audio energy and the wake-up word audio energies it has already received. The voice wake-up devices 11 to 15 may also determine the responding voice wake-up device according to the wake-up word audio energy; for the specific method, reference may be made to the way the voice wake-up device 10 determines the responding voice wake-up device.
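A rough sketch of this wait-with-timeout behaviour, assuming peer reports arrive on a local queue and using an illustrative timeout value, might be:

```python
import queue
import time

def collect_and_decide(own_id: str, own_energy: float,
                       expected_peers: set[str], inbox: "queue.Queue",
                       timeout_s: float = 0.2) -> str:
    """Wait until every peer whose voice wake-up function is on has reported
    its wake-word audio energy, or until the preset waiting period expires,
    then pick the device with the largest energy. 'inbox' is assumed to carry
    (device_id, energy) messages; the timeout value is illustrative only."""
    energies = {own_id: own_energy}
    deadline = time.monotonic() + timeout_s
    while expected_peers - energies.keys():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # preset waiting period reached: decide with what was received
        try:
            device_id, energy = inbox.get(timeout=remaining)
            energies[device_id] = energy
        except queue.Empty:
            break
    return max(energies, key=energies.get)
```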
Referring to fig. 2, fig. 2 is a schematic diagram illustrating a time distribution from when a user speaks a wake-up word to when a voice wake-up device determines a response device.
As shown in fig. 2, the user starts speaking the wake-up word "Xiaoyi Xiaoyi" at time t11. It will be appreciated that the wake-up word has a plurality of syllables and the user takes a certain amount of time to say it. The user finishes speaking the wake-up word at time t12, i.e. the period from time t11 to time t12 is the time during which the user speaks the wake-up word. While the user speaks the wake-up word, the voice wake-up device 10 in the voice wake-up system A collects a sound containing the wake-up word and needs a certain time to detect the wake-up word from the sound. The voice wake-up device 10 therefore detects wake-up speech at time t13, which is later than time t12. When wake-up speech is detected, the voice wake-up device 10 may negotiate with the other voice wake-up devices in the voice wake-up system A. The period from time t13 to time t14 is the time taken by the above negotiation to determine the answering device. That is, the voice wake-up devices in the voice wake-up system A determine the answering device at time t14. The voice assistant APP in the answering device is then woken up to respond to the user.
As can be seen from the above embodiments, the voice wake-up devices in the voice wake-up system A need to start negotiating to determine the answering device only after wake-up speech has been detected. During the negotiation, a voice wake-up device needs to wait for the wake-up word audio energies sent by the other voice wake-up devices. The more voice wake-up devices there are in the voice wake-up system A, the more complex the negotiation process becomes and the longer a voice wake-up device may need to wait to receive the wake-up word audio energies determined by the other devices. As a result, the more voice wake-up devices there are, the slower the response to the user after the user speaks the wake-up word.
In one possible implementation method, the voice wake-up device may use a wake-up word recognition model with lower complexity and a smaller amount of computation. The voice wake-up device can then detect the wake-up word in the collected sound faster, and thus start the negotiation with the other voice wake-up devices to determine the answering device faster. This can increase the response speed of the voice wake-up device after wake-up speech is detected. However, the recognition accuracy of such a lower-complexity, lower-computation wake-up word recognition model is reduced, i.e. the probability of false wake-up is higher when wake-up speech is detected with this model. For example, the user speaks the wake-up word but the voice wake-up device does not detect it with this model; or the user does not speak the wake-up word, yet the voice wake-up device detects it and enters the wake-up state. Although this method can improve the response speed after wake-up speech is detected, the wake-up rate is reduced and the user experience is still poor.
The application provides a device wake-up method. Each voice wake-up device in the voice wake-up system can detect whether the collected sound contains a pre-wake-up word. The pre-wake-up word may be part of the wake-up word. When pre-wake-up speech is detected, the voice wake-up device can determine, from the audio it collected, the pre-wake-up word audio energy corresponding to the audio containing the pre-wake-up word. Detecting pre-wake-up speech means that the voice wake-up device detected the pre-wake-up word in the collected sound, i.e. the pre-wake-up speech is speech containing the pre-wake-up word. The voice wake-up devices in the voice wake-up system can announce the pre-wake-up word audio energies they determined to one another, so that the answering device is determined in the voice wake-up system according to the pre-wake-up word audio energy. The answering device may enter the wake-up state after detecting wake-up speech and respond to the user.
It can be seen that the method can determine the timing of the start of the negotiation process by detecting pre-wake-up speech. The voice wake-up devices may begin negotiating to determine the answering device when the user has not yet finished speaking the wake-up word. In this way, the answering device can enter the wake-up state as soon as wake-up speech is detected, i.e. right after the user has finished speaking the wake-up word. The method improves the response speed of the voice wake-up device after it detects wake-up speech, and because the device still responds only when wake-up speech is actually detected, the wake-up rate is not affected. This can effectively improve the user's voice wake-up experience in a scenario where there are a plurality of voice wake-up devices with the same wake-up word.
For ease of understanding, some concepts related to the embodiments of the present application are described herein.
1. Pre-wake word
The pre-wake word may be part of the wake-up word. As can be seen from the foregoing embodiments, a wake-up word typically has multiple syllables. The voice wake-up device may take a portion of the wake-up word as the pre-wake word, typically the first few syllables. For example, if the wake-up word is "Xiaoyi Xiaoyi", the voice wake-up device may take "Xiaoyi" as the pre-wake word. As another example, if the wake-up word is "hey Siri", the voice wake-up device may take "hey" as the pre-wake word.
It will be appreciated that when the wake-up voice is detected, the voice wake-up device may determine that the user has a voice interaction need and that one of the voice wake-up devices needs to be woken up. When pre-wake voice is detected, the voice wake-up device may determine that the user may need to wake up one of the voice wake-up devices and may be about to speak the complete wake-up word. Therefore, as soon as it is determined that the user may need to wake up one of the plurality of voice wake-up devices, the plurality of voice wake-up devices can start negotiating and select one of them as the answering device. After the wake-up voice is detected, the answering device can directly enter the wake-up state and respond to the user. This can effectively improve the response speed of the voice wake-up device after it detects the wake-up voice.
2. Pre-wake state
When pre-wake voice is detected, the voice wake-up device may enter the pre-wake state. In the pre-wake state, the voice wake-up device can determine, from the audio it has collected, the pre-wake word audio energy corresponding to the audio containing the pre-wake word. The voice wake-up device can also send its pre-wake word audio energy to the other voice wake-up devices in the voice wake-up system and receive the pre-wake word audio energy determined by those devices. That is, in the pre-wake state the voice wake-up devices in the voice wake-up system can announce the pre-wake word audio energy they have determined to one another, so that an answering device is determined in the voice wake-up system according to the pre-wake word audio energy.
3. Wake state
In the wake-up state, the voice wake-up device can respond to the user. Specifically, the voice wake-up device being in the wake-up state may mean that the voice assistant APP in the voice wake-up device is awake. That is, the voice wake-up device responding to the user may be the voice assistant APP in the voice wake-up device responding to the user.
When the wake-up voice has been detected but no voice command has yet been detected, the voice assistant APP may invoke an audio output module (e.g., a speaker) of the voice wake-up device and answer by voice, for example, "I'm here". That is, the voice assistant APP can respond to the wake-up word spoken by the user. The embodiments of the present application do not limit the implementation method by which the voice assistant APP responds to the wake-up word spoken by the user.
The voice assistant APP can perform voice command recognition on the voice following the wake-up word in the collected sound, so as to recognize a voice command uttered by the user. When a voice command is recognized, the voice assistant APP may perform the operation corresponding to the voice command. A voice command may be voice that instructs the voice wake-up device to complete a specified task. That is, the voice assistant APP can respond to the voice commands uttered by the user. For example, if the voice command is "turn on the air conditioner", the voice wake-up device can, upon recognizing the command, send an opening instruction to the air conditioner to turn it on. As another example, if the voice command is "play music" and the voice wake-up device has the capability of playing music, the voice wake-up device can play music upon recognizing the command.
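As a purely illustrative sketch (the command strings and handler functions below are assumptions, not part of the embodiments), the mapping from a recognized voice command to the corresponding operation could look like this:

```python
# Hypothetical sketch: dispatch a recognized voice command to an operation.
# Command strings and handler functions are illustrative placeholders.

def turn_on_air_conditioner():
    # e.g. send an "open" instruction to the air conditioner over the home network
    print("sending open instruction to air conditioner")

def play_music():
    # e.g. start local music playback if the device has that capability
    print("starting music playback")

COMMAND_HANDLERS = {
    "turn on the air conditioner": turn_on_air_conditioner,
    "play music": play_music,
}

def execute_voice_command(recognized_text: str) -> None:
    handler = COMMAND_HANDLERS.get(recognized_text.strip().lower())
    if handler is not None:
        handler()
    else:
        print("no operation registered for:", recognized_text)

execute_voice_command("Play music")
```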
The voice wake-up device that responds to the user may be the answering device determined through negotiation in the voice wake-up system.
It should be noted that the voice wake-up device may invoke the voice assistant APP when it detects the wake-up voice and determines that it is itself the answering device. For example, the voice wake-up device may run the application program of the voice assistant APP; that is, the voice assistant APP in the voice wake-up device may enter the wake-up state. When the voice assistant APP is in the wake-up state, the programs run by the voice wake-up device include a process of the voice assistant APP. In the wake-up state, the voice assistant APP can detect and recognize voice commands and perform the operations corresponding to the voice commands.
If the voice wake-up device detects the wake-up voice but determines that it is not the answering device, the voice assistant APP of the voice wake-up device may not enter the wake-up state. For example, the programs run by the voice wake-up device do not include a process of the voice assistant APP. Alternatively, the programs run by the voice wake-up device include a process of the voice assistant APP, but the voice assistant APP does not respond to the user. That is, without entering the wake-up state, the voice assistant APP does not respond to the user. In addition, if the voice wake-up device determines that it is the answering device (e.g., according to the detected pre-wake voice) but the wake-up voice is not detected, the voice assistant APP of the voice wake-up device may also not enter the wake-up state.
It will be appreciated that in a scenario where multiple voice wake-up devices share the same wake-up word, when the user speaks the wake-up word, the multiple voice wake-up devices detect the pre-wake voice and determine an answering device among themselves based on the pre-wake voice. Further, when the wake-up voice is detected, the voice assistant APP in the determined answering device may enter the wake-up state, detect and recognize the voice command, and perform the operation corresponding to the voice command. The voice assistant APPs in the voice wake-up devices other than the answering device may not enter the wake-up state. In this way, after the user speaks the wake-up word, the user is not disturbed by several voice assistant APPs responding at once, which improves the user's experience of controlling devices by voice.
In order to improve the response speed of the voice wake-up device without affecting the wake-up rate of the voice wake-up device, the embodiment of the application provides an electronic device 100. The electronic device 100 may be a voice wake-up device (e.g., voice wake-up devices 10-15) as in the previous embodiments.
The electronic device 100 may be a cell phone, a tablet, a smart watch, a smart band, a speaker, a television, or another electronic device running any of various operating systems. The specific type of the electronic device 100 is not limited in the embodiments of the present application.
A schematic structural diagram of the electronic device 100 is described below.
As shown in fig. 3A, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identity module (subscriber identification module, SIM) card interface 195, and the like.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
In some embodiments, the processor 110 may include a voice wake-up module and a voice command recognition module. The voice wake-up module and the voice command recognition module may be integrated in different processor chips and executed by different chips. For example, the voice wake-up module may be integrated in a low-power coprocessor or a DSP chip, and the voice command recognition module may be integrated in an AP, an NPU, or another chip. In this way, only after the voice wake-up module recognizes the preset wake-up word is the chip where the voice command recognition module is located started to trigger the voice command recognition function, which saves power consumption of the electronic device. Alternatively, the voice wake-up module and the voice command recognition module may be integrated in the same processor chip, with that chip performing the related functions. For example, the voice wake-up module and the voice command recognition module may both be integrated in an AP chip, an NPU, or another chip.
The processor 110 may also include a voice command execution module. After the voice command recognition module recognizes a voice command, the voice command execution module may perform the operation corresponding to the voice command, such as playing music, making a call, or sending a short message.
It can be understood that the electronic device comprising the voice wake-up module, the voice command recognition module and the voice command execution module is an electronic device with voice interaction capability. The above-mentioned voice interaction capability may indicate that the electronic device 100 may respond to a voice command of a user and perform an operation corresponding to the voice command.
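A minimal sketch of the staged design described above, in which a cheap, always-on wake-word check gates the heavier command-recognition and execution stages. The function names and placeholder logic below are assumptions for illustration only, not the actual implementation:

```python
# Sketch: a lightweight wake-word detector gates the heavier command recognizer,
# so the recognition stage (e.g. on the AP/NPU) runs only after a wake word.
# detect_wake_word() and recognize_command() are placeholder stand-ins.

def detect_wake_word(frame) -> bool:
    # placeholder: lightweight detector running continuously on a low-power chip
    return frame == "wake-word-frame"

def recognize_command(frame):
    # placeholder: heavier recognizer that runs only after the wake word
    return "play music" if frame == "command-frame" else None

def process_audio_stream(frames):
    woken = False
    for frame in frames:
        if not woken:
            woken = detect_wake_word(frame)      # cheap check, always on
        else:
            command = recognize_command(frame)   # costly stage, gated by the wake word
            if command is not None:
                print("executing:", command)
                woken = False                    # return to low-power detection

process_audio_stream(["noise", "wake-word-frame", "command-frame"])
```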
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transfer data between the electronic device 100 and a peripheral device. And can also be used for connecting with a headset, and playing audio through the headset.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering.
The display screen 194 is used to display images, videos, and the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes.
The camera 193 is used to capture still images or video. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals.
The receiver 170B, also referred to as an "earpiece", is used to convert an audio electrical signal into a sound signal.
The microphone 170C, also referred to as a "mic", is used to convert sound signals into electrical signals. In some embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to implement sound signal collection, noise reduction, sound source identification, directional recording functions, and the like.
The earphone interface 170D is used to connect a wired earphone.
The sensor module 180 may include a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
The keys 190 include a power-on key, a volume key, etc. The motor 191 may generate a vibration cue. The indicator 192 may be an indicator light, may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be inserted into the SIM card interface 195 or removed from it to achieve contact with or separation from the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from it.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In this embodiment, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 3B is a software structural block diagram of the electronic device 100 according to the embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in fig. 3B, the application package may include applications for cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, short messages, voice assistants, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 3B, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, an activity manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 100. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify of download completion, message alerts, and the like. The notification manager may also present notifications in the top status bar of the system in the form of charts or scroll-bar text, such as notifications of applications running in the background, or present notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt tone is emitted, the electronic device vibrates, or an indicator light blinks.
The activity manager is used for being responsible for managing activities (activities), starting, switching and scheduling all components in the system, managing and scheduling application programs and the like. The activity manager may invoke the upper layer application to open the corresponding activity.
The Android runtime includes a core library and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part is the functions that the Java language needs to invoke, and the other part is the core library of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a voice wake apparatus 10 according to an embodiment of the present application.
As shown in fig. 4, the voice wakeup device 10 may include: the pre-wake module 410, the arbitration module 420, the wake module 430, the voice assistant APP440, the communication module 450, the audio input module 460, the audio output module 470, the storage module 480 are coupled to each other via buses. Wherein:
the memory module 480 may have stored therein a pre-wake model 481 and a wake model 482. The pre-wake model 481 described above may be used to detect whether a pre-wake word is included in a sound. The wake model 482 described above may be used to detect whether wake words are included in the sound. The implementation methods of the pre-wake model 481 and the wake model 482 are not limited in this embodiment.
Illustratively, the voice wake-up device 10 may collect ambient sound through a microphone. When the user speaks the pre-wake word (e.g., "Xiaoyi") near the voice wake-up device 10, the ambient sound may include the pre-wake voice. After the ambient sound is collected, the voice wake-up device 10 may use the pre-wake model 481 to separate the user's voice from the ambient sound and decode a phoneme sequence from the user's voice. Having obtained the phoneme sequence, the voice wake-up device 10 may use the pre-wake model 481 to determine whether the decoded phoneme sequence matches the stored phoneme sequence of the pre-wake word. If so, the voice wake-up device 10 may determine that the user's voice includes the pre-wake voice.
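A simplified sketch of the matching step described above, assuming the decoded phonemes and the stored pre-wake-word phoneme sequence are available as lists. The phoneme symbols shown are placeholders, not the actual phonemes of any wake-up word:

```python
# Sketch: decide whether a decoded phoneme sequence contains the stored
# pre-wake-word phoneme sequence. Phoneme symbols are placeholders.

PRE_WAKE_PHONEMES = ["x", "iao", "y", "i"]   # assumed stored sequence

def contains_pre_wake_word(decoded: list[str]) -> bool:
    n = len(PRE_WAKE_PHONEMES)
    for i in range(len(decoded) - n + 1):
        if decoded[i:i + n] == PRE_WAKE_PHONEMES:
            return True
    return False

# Example: phonemes decoded so far from the user's speech
print(contains_pre_wake_word(["sil", "x", "iao", "y", "i"]))  # True
```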
Alternatively, the pre-wake model 481 described above may also be a neural network based model. The implementation of the wake model 482 may be the same as or different from the implementation of the pre-wake model 481.
In some embodiments, the pre-wake model 481 and the wake model 482 may be the same model. It will be appreciated that the pre-wake word is part of the wake word. The voice wakeup device 10 may use a model to identify whether the collected sound contains a pre-wake word. When it is determined that the sound contains the pre-wake word, the one model may output a result that the pre-wake word is present. Further, the voice wake apparatus 10 may use the one model to continuously identify whether the collected sound includes a wake word. When it is determined that the sound contains a wake-up word, the one model may output a result that the wake-up word is present.
The storage module 480 may also store more content, not limited to storing the pre-wake model 481 and the wake model 482. The memory module 480 may correspond to the internal memory 121 shown in fig. 3A described above.
The audio input module 460 may be used to collect sound. The audio input module 460 may include a microphone. When the voice wake function in the voice wake apparatus 10 is turned on, the audio input module 460 may convert the collected sound into an audio electrical signal and send the audio electrical signal to the pre-wake module 410 and the wake module 430.
The audio output module 470 may be used to convert an audio electrical signal into sound and play it. The audio output module 470 may include one or more of the following: a speaker, a receiver. For example, when the wake-up voice is detected, the voice assistant APP440 may invoke the audio output module 470 to answer "I'm here" by voice.
The pre-wake module 410 may be configured to obtain the pre-wake model 481 from the storage module 480 upon receiving an audio electrical signal from the audio input module 460, so as to detect whether a pre-wake word is present. Upon detecting that a pre-wake word is present in the sound collected by the audio input module 460, the pre-wake module 410 may instruct the arbitration module 420 to determine the answering device.
In one possible implementation, the pre-wake module 410 may determine pre-wake word audio energy corresponding to the audio containing the pre-wake word based on the audio obtained from the sound collected by the audio input module 460. The pre-wake module 410 may send the pre-wake word audio energy to the arbitration module 420. In addition, the pre-wake module 410 may also send the pre-wake word audio energy to other voice wake devices (e.g., voice wake devices 11-15) in the voice wake system through the communication module 450.
In another possible implementation, the pre-wake module 410 may also send audio containing pre-wake words to the arbitration module 420. The arbitration module 420 may determine the pre-wake word audio energy and send the pre-wake word audio energy to other voice wake devices in the voice wake system.
It should be noted that, when the voice wake-up devices in the voice wake-up system announce to one another the pre-wake word audio energy they have determined, the pre-wake word audio energy sent to the other voice wake-up devices may be normalized. That is, the pre-wake word audio energy from different voice wake-up devices that is used to determine the answering device is calculated according to the same metric. Under the same metric, the magnitude of the pre-wake word audio energy can reflect the distance between the voice wake-up device and the user: the larger the pre-wake word audio energy, the closer the corresponding voice wake-up device is to the user. The voice wake-up device corresponding to a pre-wake word audio energy is the voice wake-up device that determined that energy. In this way, each voice wake-up device in the voice wake-up system can determine the answering device based on the magnitude of the pre-wake word audio energy.
The embodiments of the present application do not limit the implementation method for normalizing the pre-wake word audio energy.
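One possible way to put the pre-wake word audio energy from different devices on a shared metric is sketched below as a conversion of mean-square amplitude to a decibel value relative to full scale. This particular formula is an assumption for illustration; as stated above, the embodiments do not limit the normalization method:

```python
import math

# Sketch: normalize pre-wake-word audio energy to dBFS so that values reported
# by different devices are comparable. The choice of dBFS is an assumption;
# any shared metric would serve the same purpose.

def normalized_energy_dbfs(samples: list[int], full_scale: float = 32768.0) -> float:
    """samples: 16-bit PCM sample values of the audio containing the pre-wake word."""
    mean_square = sum(s * s for s in samples) / max(len(samples), 1)
    rms = math.sqrt(mean_square)
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)   # 0 dBFS = full-scale signal
```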
The arbitration module 420 may be configured to determine which voice wake-up device determines the maximum pre-wake-up word audio energy after obtaining the pre-wake-up word audio energy determined by the voice wake-up device 10 and other voice wake-up devices in the voice wake-up system. The arbitration module 420 may determine the one voice wakeup device corresponding to the largest pre-wakeup word audio energy as the answering device.
The arbitration module 420 described above is optional.
In some embodiments, each voice wake-up device in the voice wake-up system includes the above-described arbitration module 420. That is, the process by which the voice wake-up devices negotiate to determine the answering device may include: the voice wake-up devices announce the pre-wake word audio energy they have determined to one another, and each voice wake-up device determines the answering device from the pre-wake word audio energy through its own arbitration module.
In some embodiments, the arbitration module 420 may not be included in the voice wakeup device 10. For example, one voice wakeup device (e.g., voice wakeup device 11) in the voice wakeup system is the master device. After the voice wakeup device 10 detects the pre-wakeup word through the pre-wakeup module 410, the pre-wakeup word audio energy may be sent to the master device. The master device may include an arbitration module 420. The master device may obtain pre-wake word audio energy determined by a plurality of voice wake devices in the voice wake system and determine the answering device using the pre-wake word audio energy through the arbitration module 420. Alternatively, the master device may be an electronic device other than the voice wake-up devices. Alternatively, each voice wakeup device in the voice wakeup system may send the respective determined pre-wake word audio energy to a server (e.g., cloud server). The cloud server may determine the answering device and instruct the answering device to respond to the user after detecting the wake-up voice.
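A minimal sketch of the arbitration step itself, usable whether it runs in a per-device arbitration module, on a master device, or on a cloud server. The device identifiers and energy values are illustrative assumptions:

```python
# Sketch: choose the answering device as the one reporting the largest
# (normalized) pre-wake-word audio energy. Device IDs and values are illustrative.

def choose_answering_device(energies: dict[str, float]) -> str:
    """energies maps device_id -> pre-wake-word audio energy on a shared metric."""
    return max(energies, key=energies.get)

# Example: run on every device (or on the master/server) over the announced values
reported = {"device_10": -21.3, "device_11": -34.8}
print(choose_answering_device(reported))   # "device_10" is closest to the user
```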
The wake module 430 may be configured to, upon receiving an audio electrical signal from the audio input module 460, obtain the wake model 482 from the storage module 480 to detect whether a wake-up word is present. When it detects that a wake-up word is present in the sound collected by the audio input module 460, the wake module 430 may obtain the determination result of the answering device from the arbitration module 420. In the case where the wake-up voice is detected and the answering device is the voice wake-up device 10, the wake module 430 can wake up the voice assistant APP440. The wake module 430 may include the voice wake-up module in the embodiment described above with respect to fig. 3A.
In some embodiments, the pre-wake module 410 and the wake module 430 may be the same module.
The voice assistant APP440 may be used to respond to the user after being woken up. For example, the voice assistant APP440 may invoke the audio output module 470 to respond to the wake-up word spoken by the user. The voice assistant APP440 can recognize a voice command of the user and perform the operation corresponding to the voice command. The voice assistant APP440 may include the voice command recognition module and the voice command execution module of the embodiment described in fig. 3A.
The communication module 450 may be used by the voice wake-up device 10 to communicate with other electronic devices. For example, through the communication module 450, the voice wake-up device 10 may discover voice wake-up devices that have the same wake-up word as itself and determine the states of those voice wake-up devices (e.g., whether the voice wake-up function is turned on, the operating state, etc.). When the pre-wake voice is detected, the voice wake-up device 10 may also send the pre-wake word audio energy it has determined to the other voice wake-up devices in the voice wake-up system through the communication module 450, and receive the pre-wake word audio energy determined by the other voice wake-up devices. The communication module 450 may send the pre-wake word audio energy determined by the other voice wake-up devices to the arbitration module 420.
Not limited to the modules shown in fig. 4, the voice wakeup device 10 may also include more or fewer modules. It will be appreciated that the structure of other voice wake-up devices in the embodiments of the present application may refer to the schematic structure of the voice wake-up device 10 shown in fig. 4. And will not be described in detail here.
The following describes a device wake-up method provided in the embodiment of the present application based on the voice wake-up device 10 shown in fig. 4.
A voice wake-up system composed of the voice wake-up device 10 and the voice wake-up device 11 is taken as an example for the description below. It will be appreciated that when the voice wake-up system includes more voice wake-up devices (e.g., voice wake-up system A includes the voice wake-up devices 10-15), the voice wake-up device 10 and the voice wake-up device 11 may also communicate with the other voice wake-up devices in voice wake-up system A respectively, so as to negotiate and select the voice wake-up device in voice wake-up system A that is closest to the user. For the pairwise communication between voice wake-up devices in voice wake-up system A, reference may be made to the communication process between the voice wake-up device 10 and the voice wake-up device 11, which is not expanded here.
Referring to fig. 5, fig. 5 illustrates a flowchart of a device wake-up method according to an embodiment of the present application.
The method may include steps S510-S560. Wherein:
S510, the voice wake-up device 10 detects the pre-wake voice, enters the pre-wake state, and determines the pre-wake word audio energy corresponding to the pre-wake word it has detected, where the pre-wake word is part of the wake-up word.
S520, the voice wake-up device 11 detects the pre-wake voice, enters the pre-wake state, and determines the pre-wake word audio energy corresponding to the pre-wake word it has detected, where the pre-wake word is part of the wake-up word.
While the user is speaking the wake-up word (e.g., "Xiaoyi Xiaoyi"), the voice wake-up device 10 and the voice wake-up device 11 may enter the pre-wake state upon detecting the pre-wake voice and negotiate to determine the answering device. It will be appreciated that, since the pre-wake word is part of the wake-up word, the voice wake-up device 10 and the voice wake-up device 11 may have already started the above negotiation process before the user has finished speaking the wake-up word.
The implementation method for detecting the pre-wake-up voice and determining the audio energy of the pre-wake-up word can refer to the description of the foregoing embodiment. And will not be described in detail here.
S530, the voice wakeup device 10 and the voice wakeup device 11 mutually announce the respective pre-wake word audio energy.
Wherein the voice wake-up device 10 may determine that the voice wake-up device having the same wake-up word as itself includes the voice wake-up device 11. The voice wakeup device 10 may send its own determined pre-wake word audio energy to the voice wakeup device 11 and wait for the pre-wake word audio energy from the voice wakeup device 11.
The voice wakeup device 11 may determine that the voice wakeup device having the same wake word as itself includes the voice wakeup device 10. The voice wakeup device 11 may send its own determined pre-wake word audio energy to the voice wakeup device 10 and wait for the pre-wake word audio energy from the voice wakeup device 10.
S540, the voice wake-up device 10 determines, according to its own pre-wake word audio energy and that of the voice wake-up device 11, that its own pre-wake word audio energy is the largest, and determines itself as the answering device.
Upon receiving the pre-wake word audio energy from the voice wake-up device 11, the voice wake-up device 10 can compare it with its own pre-wake word audio energy to determine which voice wake-up device's pre-wake word audio energy is the largest. When it determines that its own pre-wake word audio energy is the largest, the voice wake-up device 10 may determine itself as the answering device.
S550, the voice wake-up device 11 determines, according to its own pre-wake word audio energy and that of the voice wake-up device 10, that the pre-wake word audio energy of the voice wake-up device 10 is the largest, and determines the voice wake-up device 10 as the answering device.
Upon receiving the pre-wake word audio energy from the voice wake-up device 10, the voice wake-up device 11 can compare it with its own pre-wake word audio energy to determine which voice wake-up device's pre-wake word audio energy is the largest. When it determines that the pre-wake word audio energy of the voice wake-up device 10 is the largest, the voice wake-up device 11 may determine the voice wake-up device 10 as the answering device.
In some embodiments, after the voice wake-up device 10 and the voice wake-up device 11 each determine the answering device, they may also notify each other of their respective determination results.
In some embodiments, in addition to the pre-wake word audio energy determined by each voice wake-up device, the voice wake-up device 10 and the voice wake-up device 11 may also determine the answering device in combination with device information (e.g., device type, device usage frequency, device capability, etc.) of each voice wake-up device. For example, the voice wake-up device 10 may first compare its own pre-wake word audio energy with that of the voice wake-up device 11 to determine which voice wake-up device's pre-wake word audio energy is the largest. If the pre-wake word audio energies of the voice wake-up devices are the same, this may indicate that the distances between those voice wake-up devices and the user are the same. When it determines that the pre-wake word audio energies of the voice wake-up device 10 and the voice wake-up device 11 are the same, the voice wake-up device 10 may determine the answering device according to device capability. If the capability of the voice wake-up device 10 is higher than that of the voice wake-up device 11 (e.g., the voice wake-up device 10 has better sound quality), the voice wake-up device 10 may determine itself as the answering device. Likewise, the voice wake-up device 11 may also determine the voice wake-up device 10 as the answering device according to the above method. The implementation method by which a voice wake-up device determines the answering device by combining the pre-wake word audio energy and the device information of each voice wake-up device is not limited.
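One way to sketch the combined decision described above: compare the energies first, then fall back to device information such as capability when the energies are (nearly) equal. The capability scores and the tolerance value are assumptions for illustration:

```python
# Sketch: arbitration that prefers the largest pre-wake-word audio energy and
# breaks ties using a device-capability score. All numbers are illustrative.

ENERGY_TOLERANCE = 0.5   # assumed: energies within this range count as equal

def pick_answering_device_with_tiebreak(candidates):
    """candidates: list of (device_id, energy, capability_score) tuples."""
    best_energy = max(e for _, e, _ in candidates)
    tied = [c for c in candidates if best_energy - c[1] <= ENERGY_TOLERANCE]
    # Among devices equally close to the user, prefer the more capable one.
    return max(tied, key=lambda c: c[2])[0]

print(pick_answering_device_with_tiebreak(
    [("device_10", -21.3, 0.9), ("device_11", -21.3, 0.6)]))   # "device_10"
```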
S560, when the voice wake-up device 10 detects the wake-up voice, it wakes up the voice assistant APP according to the determination result of the answering device in step S540 and responds to the user.
Based on the method shown in fig. 5, the following describes the time distribution from when the user speaks the wake-up word to when the voice wake-up device determines the answering device.
Referring to fig. 6, fig. 6 is a schematic diagram illustrating the time distribution from when the user speaks the wake-up word to when the voice wake-up device determines the answering device.
As shown in fig. 6, the user starts speaking the wake-up word "Xiaoyi Xiaoyi" at time t1. It will be appreciated that the wake-up word has multiple syllables and the user takes a certain amount of time to speak it. The user finishes speaking the wake-up word at time t3; that is, the period from time t1 to time t3 is the period during which the user speaks the wake-up word. The pre-wake word is part of the wake-up word, and the user may have finished speaking the pre-wake word (e.g., "Xiaoyi") within the period from time t1 to time t2. The voice wake-up device 10 detects the pre-wake voice at time t2 and enters the pre-wake state. In the pre-wake state, the voice wake-up device 10 may wait to receive the pre-wake word audio energy of the other voice wake-up devices. The voice wake-up device 10 determines the answering device at time t4. That is, the period from time t2 to time t4 is the period from when the voice wake-up device 10 detects the pre-wake voice to when it determines the answering device.
The computing capabilities of different voice wake-up devices differ. If the computing capability of a voice wake-up device is strong, the time it needs to determine the answering device from the pre-wake word audio energy is short; if the computing capability is weak, that time is long. Therefore, time t4 may be before time t3, i.e., the voice wake-up device 10 has already obtained the determination result of the answering device before the user finishes speaking the wake-up word. Time t4 may also be after time t3, i.e., the voice wake-up device 10 obtains the determination result of the answering device after the user finishes speaking the wake-up word.
The voice wake-up device 10 detects the wake-up voice at time t5. When the wake-up voice is detected and the voice wake-up device 10 is the answering device, the voice wake-up device 10 may enter the wake-up state. The voice wake-up device 10 determines the answering device in parallel with detecting whether the wake-up word is present in the collected sound, and both of these require a certain amount of processing time. Time t5 may be after time t4, i.e., the voice wake-up device 10 detects the wake-up voice after the answering device has been determined; in that case, if the voice wake-up device 10 is the answering device, it may enter the wake-up state upon detecting the wake-up voice. Time t5 may also be before time t4, i.e., when the voice wake-up device 10 detects the wake-up voice, the answering device has not yet been determined; the voice wake-up device 10 may then enter the wake-up state when, after detecting the wake-up voice, it determines that it is itself the answering device. It can be seen that, whether time t4 is before or after time t5, the method shown in fig. 5 can reduce the time from when the user speaks the wake-up word to when the answering device responds to the user, compared with starting the negotiation with other voice wake-up devices to determine the answering device only after the wake-up voice is detected.
According to the above implementation, each voice wake-up device in the voice wake-up system can determine the start of the negotiation process by detecting the pre-wake voice. The voice wake-up devices can begin negotiating to determine the answering device before the user has finished speaking the wake-up word. In this way, the answering device can enter the wake-up state as soon as the wake-up voice is detected, that is, right after the user finishes speaking the wake-up word. The method improves the response speed of the voice wake-up device after it detects the wake-up voice, and the device still responds only when the wake-up voice is actually detected, so the wake-up rate is not affected. This can effectively improve the user's experience of using the voice wake-up function in a scenario where multiple voice wake-up devices share the same wake-up word.
In some embodiments, when the user speaks the wake-up word, the sound collected by the voice wake-up device typically contains environmental noise in addition to the user's voice. Environmental noise may refer to sounds other than the voice corresponding to the wake-up word, such as the sound of the user walking, a television playing video, or a speaker playing music. The noise sources near different voice wake-up devices may differ. For example, the voice wake-up device 10 is close to the air conditioner, so the environmental noise produced by the running air conditioner is loud in the sound it collects, while the voice wake-up device 11 is far from the air conditioner, so that noise is quiet in the sound it collects. In this case, using pre-wake word audio energy determined from audio that contains both the voice corresponding to the pre-wake word and environmental noise to judge the distance between the voice wake-up device and the user may produce a large error.
In order to improve the accuracy of the answering-device determination, so that after the user speaks the wake-up word the voice wake-up device closest to the user enters the wake-up state, the voice wake-up device may remove the energy produced by environmental noise from the pre-wake word audio energy. Taking the voice wake-up system composed of the voice wake-up device 10 and the voice wake-up device 11 as an example, another device wake-up method provided in the embodiments of the present application is described below.
Referring to fig. 7, fig. 7 is a flowchart schematically illustrating another device wake-up method according to an embodiment of the present application.
The method may comprise steps S710 to S780. Wherein:
S710, the voice wake-up device 10 collects environmental sound and determines the environmental audio energy corresponding to the environmental sound.
S720, the voice wake-up device 11 collects environmental sound and determines the environmental audio energy corresponding to the environmental sound.
In one possible implementation, the voice wake-up device 10 and the voice wake-up device 11 may process the collected environmental sound periodically or aperiodically to determine the environmental audio energy. The environmental sound may represent the sound that the audio input device of the voice wake-up device is able to collect. The environmental audio energy may include parameters such as the sound intensity and sound pressure of the environmental sound. The embodiments of the present application do not limit the method for calculating the environmental audio energy.
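As an illustrative sketch only (the embodiments, as noted above, do not limit the calculation), the environmental audio energy could be estimated periodically as the mean-square value of recently collected ambient samples:

```python
# Sketch: periodically estimate environmental audio energy from ambient audio.
# Mean-square amplitude is used here as an assumed stand-in for sound intensity.

def ambient_audio_energy(samples: list[int]) -> float:
    """samples: PCM samples of ambient sound collected while no pre-wake word is present."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)
```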
S730, the voice wake-up device 10 detects the pre-wake voice, enters the pre-wake state, determines the pre-wake word audio energy corresponding to the pre-wake word it has detected, and obtains the denoised pre-wake word audio energy from the pre-wake word audio energy and the environmental audio energy, where the pre-wake word is part of the wake-up word.
S740, the voice wake-up device 11 detects the pre-wake voice, enters the pre-wake state, determines the pre-wake word audio energy corresponding to the pre-wake word it has detected, and obtains the denoised pre-wake word audio energy from the pre-wake word audio energy and the environmental audio energy, where the pre-wake word is part of the wake-up word.
The voice wake-up device 10 and the voice wake-up device 11 can also perform voice recognition on the collected environmental sound to determine whether the environmental sound contains a pre-wake-up word. When pre-wake-up speech is detected, the speech wake-up device 10 and the speech wake-up device 11 may enter a pre-wake-up state.
The voice wake-up device 10 may determine the pre-wake word audio energy corresponding to the pre-wake word from the audio obtained from the sound containing the pre-wake word. Because the sound containing the pre-wake word also contains environmental noise, the pre-wake word audio energy includes audio energy produced by the environmental noise. The voice wake-up device 10 may obtain the environmental audio energy determined from the most recently collected environmental sound that does not contain the pre-wake word, and then subtract the environmental audio energy from the pre-wake word audio energy to obtain the denoised pre-wake word audio energy. That is, the voice wake-up device 10 may determine the denoised pre-wake word audio energy according to the following formula:
Denoised pre-wake word audio energy = pre-wake word audio energy − environmental audio energy
It will be appreciated that, since the above environmental sound not containing the pre-wake word was collected by the voice wake-up device 10 most recently before the pre-wake voice was detected, the sound in the environment can be considered to be almost unchanged, apart from the user's pre-wake voice, between the time the voice wake-up device 10 collected that environmental sound and the time the user speaks the pre-wake word. The sound containing the pre-wake word can therefore be regarded as equivalent to the user's pre-wake voice plus the environmental sound that does not contain the pre-wake word. Thus, the denoised pre-wake word audio energy can correspond to the audio energy produced by the pre-wake voice. Compared with the pre-wake word audio energy, the magnitude of the denoised pre-wake word audio energy can more accurately reflect the distance between the voice wake-up device and the user.
In one possible implementation, the voice wake-up device 10 may perform noise cancellation processing on the audio obtained from the sound containing the pre-wake word before determining the pre-wake word audio energy, and may likewise perform noise cancellation processing on the environmental sound not containing the pre-wake word before determining the environmental audio energy. The noise cancellation processing can be used to cancel part of the noise signal in the audio; that is, the pre-wake word audio energy and the environmental audio energy may both be obtained after noise reduction. The noise cancellation processing performed by the voice wake-up device 10 on the environmental sound not containing the wake-up word and on the audio obtained from the sound containing the pre-wake word may be the same, so that the same noise signals are cancelled in both. The embodiments of the present application do not limit the implementation method of the noise cancellation processing.
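A minimal sketch tying the above together: both energies are computed after the same noise cancellation, and the environmental audio energy is subtracted from the pre-wake word audio energy according to the formula above. The mean-square energy measure (matching the ambient-energy sketch earlier) and the clamp at zero are assumptions added for illustration:

```python
# Sketch: subtract the most recent environmental audio energy from the
# pre-wake-word audio energy; both are assumed to be mean-square energies
# computed after the same noise cancellation. The max(..., 0.0) clamp is an
# added assumption, not taken from the text.

def denoised_pre_wake_energy(pre_wake_energy: float, ambient_energy: float) -> float:
    return max(pre_wake_energy - ambient_energy, 0.0)

# Example: energy of audio containing the pre-wake word minus ambient energy
print(denoised_pre_wake_energy(1.8e6, 0.4e6))   # ~ energy produced by the pre-wake voice
```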
The method by which the voice wake-up device 11 determines the denoised pre-wake word audio energy may refer to the method by which the voice wake-up device 10 determines the denoised pre-wake word audio energy.
S750, the voice wake-up device 10 and the voice wake-up device 11 announce to each other their respective denoised pre-wake word audio energies.
S760, the voice wake-up device 10 determines that its own denoising pre-wake-up word audio energy is the largest according to its own denoising pre-wake-up word audio energy and that of the voice wake-up device 11, and determines itself as the answering device.
S770, the voice wake-up device 11 determines that the denoising pre-wake-up word audio energy of the voice wake-up device 10 is the largest according to its own denoising pre-wake-up word audio energy and that of the voice wake-up device 10, and determines the voice wake-up device 10 as the answering device.
The implementation process by which the voice wake-up device 10 and the voice wake-up device 11 determine the answering device using the denoising pre-wake-up word audio energy may refer to the process of determining the answering device using the pre-wake-up word audio energy in the method shown in fig. 5, and will not be described in detail here.
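As an illustration of the mutual negotiation in steps S750 to S770, the sketch below shows how each device could compare its own denoising pre-wake-up word audio energy with the values announced by its peers and decide locally whether it is the answering device. The device identifiers, the energy values, and the identifier-based tie-breaking are hypothetical details added for the example.

```python
def decide_answering_device(own_id: str, own_energy: float,
                            peer_energies: dict[str, float]) -> str:
    # Each device runs the same comparison locally after the mutual
    # announcement in S750: the device with the largest denoising
    # pre-wake-up word audio energy is taken as the answering device.
    # Breaking ties by device identifier is an assumption of this sketch.
    candidates = {own_id: own_energy, **peer_energies}
    return max(candidates, key=lambda device: (candidates[device], device))

# Hypothetical values: device 10 is closer to the user, so both devices
# independently conclude that device 10 should answer.
print(decide_answering_device("device-10", 0.82, {"device-11": 0.35}))  # device-10
print(decide_answering_device("device-11", 0.35, {"device-10": 0.82}))  # device-10
```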
S780, when the voice wake-up device 10 detects wake-up voice, the voice wake-up device 10 wakes up the voice assistant APP and responds to the user according to the determination result of the answering device in step S760.
As can be seen from the method shown in fig. 7, each voice wake-up device in the voice wake-up system can remove the audio energy generated by the environmental noise from the pre-wake-up word audio energy when negotiating the answering device. This reduces the influence of environmental noise on the determination of the answering device and improves the accuracy of the determination result. By using the denoising pre-wake-up word audio energy, the voice wake-up devices can start negotiating the answering device before the user has finished speaking the wake-up word. In this way, the answering device can enter the wake-up state as soon as wake-up voice is detected, that is, as soon as the user has finished speaking the wake-up word. The method improves the response speed of the voice wake-up device after it detects wake-up voice, and since the device still responds only when wake-up voice is detected, the wake-up rate is not affected. This can effectively improve the experience of a user using the voice wake-up function in a scenario where a plurality of voice wake-up devices share the same wake-up word.
In some embodiments, a master device may be included in the voice wake-up system. The master device may be one of the plurality of voice wake-up devices with the same wake-up word contained in the voice wake-up system. Alternatively, the master device may be an electronic device other than the plurality of voice wake-up devices described above. The master device can be used for receiving the pre-wake-up word audio energy determined by each voice wake-up device in the voice wake-up system and determining an answering device from the voice wake-up system according to the pre-wake-up word audio energy.
Referring to fig. 8, fig. 8 schematically illustrates a structure of a master device 200 according to an embodiment of the present application.
As shown in fig. 8, the master device 200 may include: an arbitration module 810, a communication module 820, and a storage module 830 coupled by a bus. Wherein:
the communication module 820 may be used to establish a communication connection with the voice wake-up devices in the voice wake-up system. The communication connection may include: a wired communication connection, a wireless communication connection (e.g., a Bluetooth communication connection, a Wi-Fi communication connection, etc.). The specific method of the communication connection is not limited in the embodiments of the present application. Via the communication module 820, the master device 200 may receive the pre-wake-up word audio energy determined by the voice wake-up devices and send the determination result of the answering device to each voice wake-up device.
The communication module 820 may send the received pre-wake word audio energy to the arbitration module 810.
The arbitration module 810 can determine which voice wake-up device has the largest pre-wake-up word audio energy according to the pre-wake-up word audio energy determined by the different voice wake-up devices, and may determine the voice wake-up device corresponding to the largest pre-wake-up word audio energy as the answering device.
The arbitration module 810 may refer to the arbitration module 420 described above with respect to FIG. 4.
The storage module 830 may be used to store a computer program. For example, a computer program for determining answering machines using pre-wake-up word audio energy, etc.
Not limited to the modules shown in fig. 8, the master device 200 may also contain more or fewer modules. For example, the master device 200 may itself be a voice wake-up device in the voice wake-up system. In that case, the master device 200 may further comprise a pre-wake module, a voice assistant APP, a voice input module and an audio output module. The storage module 830 may also store a pre-wake model and a wake model. When the master device 200 determines that it is the answering device according to the pre-wake-up word audio energy determined by itself and the pre-wake-up word audio energy determined by the other voice wake-up devices, the master device 200 may enter a wake-up state after detecting wake-up voice.
In some embodiments, the voice wake-up system may further include a cloud server in addition to the plurality of voice wake-up devices with the same wake-up word. The cloud server can be used for receiving the pre-wake-up word audio energy determined by each voice wake-up device in the voice wake-up system and determining an answering device from the voice wake-up system according to the pre-wake-up word audio energy. The cloud server can then send the determination result of the answering device to each voice wake-up device in the voice wake-up system.
The following describes another device wake-up method provided in the embodiments of the present application based on a voice wake-up system including a plurality of voice wake-up devices with the same wake-up word (e.g., voice wake-up device 10, voice wake-up device 11, etc.) and a master device 200.
Fig. 9 is a flowchart illustrating another device wake-up method according to an embodiment of the present application.
As shown in fig. 9, the method may include steps S910 to S980. Wherein:
S910, the voice wake-up device 10 detects pre-wake-up voice, enters a pre-wake-up state, and determines the pre-wake-up word audio energy corresponding to the pre-wake-up word detected by itself, wherein the pre-wake-up word is a part of the wake-up word.
S920, the voice wake-up device 11 detects pre-wake-up voice, enters a pre-wake-up state, and determines the pre-wake-up word audio energy corresponding to the pre-wake-up word detected by itself, wherein the pre-wake-up word is a part of the wake-up word.
Step S910 and step S920 may refer to step S510 and step S520 shown in fig. 5 described above. Not limited to the voice wakeup device 10 and the voice wakeup device 11, more voice wakeup devices may be included in the voice wakeup system. In the process that the user speaks the wake-up word, a plurality of voice wake-up devices in the voice wake-up system detect pre-wake-up voice and determine the audio energy of the pre-wake-up word corresponding to the pre-wake-up word detected by the voice wake-up devices.
S930, the voice wakeup device 10 sends pre-wakeup word audio energy to the master device 200.
S940, the voice wakeup device 11 sends the pre-wakeup word audio energy to the master device 200.
Multiple voice wakeup devices in a voice wakeup system may send their own determined pre-wake word audio energy to the master device 200.
S950, the master device 200 determines that the pre-wake-up word audio energy of the voice wake-up device 10 is the largest according to the pre-wake-up word audio energy of the voice wake-up devices, and determines the voice wake-up device 10 as the answering device.
In one possible implementation, the master device 200 may determine the number of voice wake-up devices included in the voice wake-up system. The master device 200 may begin to determine the answering device based on the pre-wake-up word audio energy only after receiving the pre-wake-up word audio energy transmitted by all of the voice wake-up devices included in the voice wake-up system. This can reduce cases where a voice wake-up device is missed, which would lead to an inaccurate determination result for the answering device.
Alternatively, the master device 200 may wait for the voice wake-up devices to transmit the pre-wake-up word audio energy for a preset waiting period. If, within the preset waiting period, the master device 200 receives the pre-wake-up word audio energy sent by only a part of the voice wake-up devices in the voice wake-up system, the master device 200 may select an answering device from among the voice wake-up devices whose pre-wake-up word audio energy was received within the preset waiting period. It will be appreciated that a voice wake-up device that does not transmit pre-wake-up word audio energy to the master device 200 within the preset waiting period may be regarded as a device that did not detect the pre-wake-up voice. For example, while the user is speaking the wake-up word, a voice wake-up device that is far from the user may have difficulty capturing the sound containing the pre-wake-up word. The master device 200 can then begin determining the answering device without having to wait for the pre-wake-up word audio energy from all of the voice wake-up devices in the voice wake-up system. This can increase the efficiency of determining the answering device, and thereby increase the response speed of the voice wake-up devices in the voice wake-up system after wake-up voice is detected.
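As a sketch of how the master device 200 might combine the two strategies above (waiting for all devices, but no longer than a preset waiting period), the following Python code collects the reported pre-wake-up word audio energies and then selects the device with the largest value. The receive_energy transport callback, the 0.5-second waiting period, and the return of None when no report arrives are assumptions introduced for the example.

```python
import time
from typing import Callable, Optional, Tuple

def collect_and_arbitrate(receive_energy: Callable[[float], Optional[Tuple[str, float]]],
                          device_count: int,
                          wait_seconds: float = 0.5) -> Optional[str]:
    # Collect (device_id, pre_wake_word_energy) reports until every device in
    # the system has reported or the preset waiting period elapses, then pick
    # the device with the largest pre-wake-up word audio energy.
    reports: dict[str, float] = {}
    deadline = time.monotonic() + wait_seconds
    while len(reports) < device_count:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        report = receive_energy(remaining)  # blocks for at most `remaining` seconds
        if report is not None:
            device_id, energy = report
            reports[device_id] = energy
    if not reports:
        return None  # no device detected the pre-wake-up voice
    return max(reports, key=reports.get)
```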
The specific method by which the master device 200 determines the answering device can refer to the foregoing embodiments, and will not be described in detail here.
S960, the master device 200 transmits the determination result of the answering device to the voice wakeup device 11.
S970, the master device 200 transmits the determination result of the answering device to the voice wakeup device 10.
After the determination result of the answering device is obtained, the master device 200 can send the determination result to a plurality of voice wake-up devices in the voice wake-up system. For example, the determination result of the answering device indicates that the voice wake-up device 10 is the answering device. The master device 200 may transmit the determination result of the answering device to the voice wake-up devices that transmitted pre-wake-up word audio energy to the master device 200.
S980, when the voice wake-up device 10 detects wake-up voice, the voice wake-up device 10 may enter a wake-up state according to the received determination result of the answering device, and the voice assistant APP of the voice wake-up device 10 is woken up and responds to the user.
In the case where the determination result of the answering device indicates that the voice wake-up device 10 is the answering device, the voice wake-up device 10 may enter the wake-up state after detecting the wake-up voice.
Step S960 described above is optional. In some embodiments, the master device 200 may also send the determination result of the answering device only to the answering device. For example, when the answering device is determined to be the voice wake-up device 10, the master device 200 may transmit the determination result of the answering device only to the voice wake-up device 10. After detecting the wake-up voice, the voice wake-up device 10 may enter the wake-up state according to the determination result of the answering device, while the voice wake-up device 11 may wait for the determination result of the answering device after detecting the wake-up voice. When the wait times out, the voice wake-up device 11 may stop waiting. That is, a voice wake-up device that does not receive the determination result of the answering device from the master device 200 does not enter the wake-up state even if wake-up voice is detected.
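As an illustration of this behavior, the sketch below shows how a voice wake-up device might react after detecting the wake-up voice: it waits briefly for the answering-device result and enters the wake-up state only if the result names itself. The queue-based result delivery and the 0.3-second timeout are hypothetical details for the example.

```python
import queue

def on_wake_word_detected(result_queue: "queue.Queue[str]",
                          own_id: str,
                          wait_seconds: float = 0.3) -> bool:
    # After detecting the wake-up voice, wait briefly for the answering-device
    # result; enter the wake-up state only if the result names this device.
    try:
        answering_device = result_queue.get(timeout=wait_seconds)
    except queue.Empty:
        return False  # wait timed out: do not enter the wake-up state
    return answering_device == own_id
```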
In some embodiments, a plurality of voice wake-up devices in the voice wake-up system may determine the denoising pre-wake-up word audio energy based on the pre-wake-up word audio energy and send the denoising pre-wake-up word audio energy to the master device 200. That is, the master device 200 may determine the answering device using the denoising pre-wake-up word audio energy.
In some embodiments, the master device 200 shown in fig. 9 may also be replaced with a cloud server. That is, the cloud server may determine the answering device according to the pre-wake-up word audio energy, and send the determination result of the answering device to a plurality of voice wake-up devices in the voice wake-up system.
As can be seen from the method shown in fig. 9, each voice wake-up device in the voice wake-up system can send its pre-wake-up word audio energy to the master device 200 upon detecting the pre-wake-up voice, and the master device 200 may use the pre-wake-up word audio energy to determine the answering device. Compared with determining the answering device only after wake-up voice is detected, the above method advances the process of determining the answering device, so that each voice wake-up device may obtain the determination result of the answering device before the wake-up voice is detected. The voice wake-up device indicated by the determination result of the answering device may then enter the wake-up state immediately after the wake-up voice is detected. The method improves the response speed of the voice wake-up device after it detects wake-up voice, and since the device still responds only when wake-up voice is detected, the wake-up rate is not affected. This can effectively improve the experience of a user using the voice wake-up function in a scenario where a plurality of voice wake-up devices share the same wake-up word.
Referring to fig. 10, fig. 10 is a schematic diagram illustrating a voice wake system 1000 according to an embodiment of the present application.
As shown in fig. 10, in some embodiments, the voice wake system 1000 may include one or more voice wake devices (e.g., voice wake device 10, voice wake device 11, etc.). In other embodiments, the voice wake system 1000 may include the master device 200 in addition to one or more voice wake devices. In other embodiments, in addition to including one or more voice wake apparatuses, the voice wake system 1000 may also include a cloud server 201. That is, the above-described master device 200 and cloud server 201 are optional.
Illustratively, the voice wake-up system 1000 may include one or more voice wake-up devices. The one or more voice wake-up devices may determine, according to the method described above with respect to fig. 5, one voice wake-up device that responds to the user after wake-up voice is detected.
Optionally, the voice wake system 1000 may include one or more voice wake devices, as well as the master device 200. The voice wake system 1000 may determine that a voice wake device is responsive to the user after detecting wake-up voice according to the method described above with respect to fig. 9. The master device 200 may also be a voice wake-up device.
Optionally, the voice wake system 1000 may include one or more voice wake devices, as well as a cloud server 201. The voice wake system 1000 may determine that a voice wake device is responsive to the user after detecting wake-up voice according to the method described above with respect to fig. 9.
It should be noted that, any feature in any embodiment of the present application, or any part of any feature may be combined under the condition that no contradiction or conflict occurs, and the combined technical solution is also within the scope of the embodiment of the present application.
The above embodiments are merely intended to illustrate the technical solutions of the present application, and not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
Claims (16)
1. A method of waking up a device, the method comprising:
the method comprises the steps that first electronic equipment detects first pre-awakening voice containing pre-awakening words, and first audio energy is obtained according to the first pre-awakening voice;
The first electronic device receives M audio energies sent by M electronic devices, wherein one audio energy in the M audio energies is obtained by one electronic device in the M electronic devices according to the detected pre-wake-up voice containing the pre-wake-up word, and M is a positive integer;
the first electronic device determines the first electronic device as a device for responding according to the first audio energy and the M audio energies;
when a first wake-up voice containing a wake-up word is detected, a first application in the first electronic equipment enters a wake-up state;
the pre-wake word is a part of the wake word, and the first application is used for detecting and responding to a voice instruction in the wake state to execute an operation corresponding to the voice instruction.
2. The method of claim 1, wherein after the first electronic device detects a first pre-wake-up speech that includes a pre-wake-up word, the method further comprises:
and when the collected sound is detected not to contain the wake-up word, the first application in the first electronic equipment does not enter the wake-up state.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
The first electronic device detects a second pre-wake-up speech comprising the pre-wake-up word, obtains a second audio energy based on the second pre-wake-up speech;
the first electronic device receives K pieces of audio energy sent by K pieces of electronic devices, wherein one piece of audio energy in the K pieces of audio energy is obtained by one piece of electronic device in the K pieces of electronic devices according to detected pre-wake-up voice containing the pre-wake-up word, and K is a positive integer;
the first electronic device determines that a second electronic device in the K electronic devices is a device for responding according to the second audio energy and the K audio energies;
and under the condition that the second electronic device is determined to be a device for responding, the first application in the first electronic device does not enter the awakening state.
4. A method according to claim 3, characterized in that the method further comprises:
and under the condition that the second electronic equipment is determined to be the equipment for responding, the first electronic equipment sends a first message to the second electronic equipment, wherein the first message comprises a first result, the first result is used for indicating the second electronic equipment to be the equipment for responding, and the first message is used for indicating the second electronic equipment to enable the first application in the second electronic equipment to enter the awakening state after the second electronic equipment detects awakening voice containing the awakening word.
5. A method according to any of claims 1-3, wherein after the deriving first audio energy from the first pre-wake-up speech, the method further comprises:
the first electronic device sends the first audio energy to the M electronic devices.
6. The method according to any one of claims 1-5, further comprising:
the first electronic equipment collects first sound, wherein the first sound does not contain the pre-awakening words;
the first electronic device obtains third audio energy according to the first sound;
the first electronic device obtains fourth audio energy according to the first pre-awakening voice;
the obtaining the first audio energy according to the first pre-wake-up voice specifically includes:
the first electronic device subtracts the third audio energy from the fourth audio energy to obtain the first audio energy.
7. A device wake-up method, wherein the method is applied to a voice wake-up system, the voice wake-up system includes H electronic devices, the H electronic devices include a first electronic device, and H is a positive integer greater than 1, the method includes:
The first electronic equipment detects first pre-awakening voice containing a pre-awakening word, and obtains first audio energy according to the first pre-awakening voice;
the H1 pieces of electronic equipment in the H pieces of electronic equipment send H1 pieces of audio energy to the first electronic equipment, the H1 pieces of electronic equipment do not contain the first electronic equipment, and one piece of audio energy in the H1 pieces of audio energy is obtained by one piece of electronic equipment in the H1 pieces of electronic equipment according to detected pre-wake-up voice containing the pre-wake-up word; the H1 is a positive integer smaller than H;
the first electronic device determines the first electronic device as a device for responding according to the first audio energy and the H1 audio energy;
when a first wake-up voice containing a wake-up word is detected, a first application in the first electronic equipment enters a wake-up state;
the pre-wake word is a part of the wake word, and the first application is used for detecting and responding to a voice instruction in the wake state to execute an operation corresponding to the voice instruction.
8. The method of claim 7, wherein the method further comprises:
the first application in each of the H1 electronic devices does not enter the awake state.
9. The method of any of claims 7 or 8, wherein after the first electronic device detects a first pre-wake speech comprising a pre-wake word, the method further comprises:
and when the collected sound is detected not to contain the wake-up word, the first application in the first electronic equipment does not enter the wake-up state.
10. The method according to any one of claims 7-9, further comprising:
the first electronic device detects second pre-awakening voice containing the pre-awakening word, and second audio energy is obtained according to the second pre-awakening voice;
the H2 electronic devices in the H electronic devices send H2 audio energy to the first electronic device, the H2 electronic devices do not contain the first electronic device, and one audio energy in the H2 audio energy is obtained by one electronic device in the H2 electronic devices according to the detected pre-wake-up voice containing the pre-wake-up word; the H2 is a positive integer less than H;
the first electronic device determines that a second electronic device in the H2 electronic devices is a device for response according to the second audio energy and the H2 audio energy;
When a second wake-up voice containing the wake-up word is detected, the first application in the second electronic device enters the wake-up state, and neither the first application in the first electronic device nor the first application in each of (H2-1) electronic devices enters the wake-up state, wherein the (H2-1) electronic devices are devices of the H2 electronic devices except the second electronic device.
11. The method according to claim 10, wherein the first application in the second electronic device enters the wake state when a second wake speech comprising the wake word is detected, in particular comprising:
the first electronic device sends a first message to the second electronic device, wherein the first message comprises a first result, and the first result is used for indicating that the second electronic device is a device for responding;
based on the first message, when the second wake-up voice is detected, the first application in the second electronic device enters the wake-up state.
12. The method according to claim 10, wherein the method further comprises:
the first electronic device transmitting the second audio energy to the second electronic device;
The (H2-1) electronic equipment sends (H2-1) audio energy to the second electronic equipment, wherein the (H2-1) audio energy is audio energy obtained by the (H2-1) electronic equipment in the H2 audio energy;
the second electronic device determines that the second electronic device is a device for responding according to the second audio energy, the (H2-1) audio energy and the fifth audio energy obtained by the second electronic device according to the detected third pre-awakening voice containing the pre-awakening word, wherein the fifth audio energy is contained in the H2 audio energy.
13. The method according to any one of claims 7-12, further comprising:
the first electronic equipment collects first sound, wherein the first sound does not contain the pre-awakening words;
the first electronic device obtains third audio energy according to the first sound;
the first electronic device obtains fourth audio energy according to the first pre-awakening voice;
the obtaining the first audio energy according to the first pre-wake-up voice specifically includes:
the first electronic device subtracts the third audio energy from the fourth audio energy to obtain the first audio energy.
14. An electronic device comprising a microphone for capturing sound, a communication means, a memory for storing a computer program, and a processor for invoking the computer program to cause the electronic device to perform the method of any of claims 1-6.
15. A computer readable storage medium comprising instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1-6.
16. A computer program product comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210075546.8A CN116524919A (en) | 2022-01-22 | 2022-01-22 | Equipment awakening method, related device and communication system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210075546.8A CN116524919A (en) | 2022-01-22 | 2022-01-22 | Equipment awakening method, related device and communication system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116524919A true CN116524919A (en) | 2023-08-01 |
Family
ID=87392678
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210075546.8A Pending CN116524919A (en) | 2022-01-22 | 2022-01-22 | Equipment awakening method, related device and communication system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116524919A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117116263A (en) * | 2023-09-15 | 2023-11-24 | 广州易云信息技术有限公司 | Intelligent robot awakening method and device based on voice recognition and storage medium |
CN117116263B (en) * | 2023-09-15 | 2024-04-12 | 广州易云信息技术有限公司 | Intelligent robot awakening method and device based on voice recognition and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |