WO2022111579A1 - Voice wake-up method and electronic device - Google Patents

Voice wake-up method and electronic device


Publication number
WO2022111579A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2021/133119
Other languages
English (en)
Chinese (zh)
Inventor
江昱成
赵安
林龙
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2022111579A1


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 17/00 — Speaker identification or verification techniques
    • G10L 17/06 — Decision making techniques; Pattern matching strategies
    • G10L 17/22 — Interactive procedures; Man-machine interfaces
    • G10L 17/24 — Interactive procedures; Man-machine interfaces, the user being prompted to utter a password or a predefined phrase

Definitions

  • the present application relates to the technical field of terminals, and in particular, to a voice wake-up method and an electronic device.
  • the user can wake up the electronic device by speaking a wake-up word, thereby realizing the interaction between the user and the electronic device.
  • the wake-up word is preset in the electronic device by the user, or the wake-up word is set before the electronic device leaves the factory.
  • the user may set the same wake-up word for multiple devices in order to facilitate memory. For example, the user sets the wake-up word for the smart screen, the smart speaker, and the smart switch to "Xiaoyi Xiaoyi".
  • the present application provides a voice wake-up method and an electronic device, which help to improve the accuracy of voice wake-up of an electronic device in a multi-device scenario, thereby improving user experience.
  • an embodiment of the present application provides a voice wake-up method, which can be applied to a first electronic device, and relates to the field of terminal artificial intelligence (AI).
  • in the method, the first electronic device receives the user's voice wake-up instruction; the first electronic device also acquires a user image and detects the user's face orientation; then, according to the relative positions of the first electronic device and at least one second electronic device, the user's position, and the orientation of the user's face, the target device toward which the user's face is oriented is determined from among the first electronic device and the at least one second electronic device; finally, the first electronic device instructs the target device to wake up in response to the voice instruction.
  • the first electronic device may have an image acquisition function, in which case it acquires the user image from its own image acquisition module; or the first electronic device may not have an image acquisition function, in which case it acquires the user image from a second electronic device.
  • in the above method, the first electronic device can determine the device that the user wants to wake up by using the relative positions of the first electronic device and the at least one second electronic device together with the user's face orientation collected by a device. This method helps to improve the accuracy of device wake-up in multi-device scenarios, and the application effect is relatively good.
  • in a possible design, when the first electronic device determines that the number of candidate devices toward which the user's face is oriented is greater than or equal to two, the first electronic device needs to determine the relative distances between the user and the at least two candidate devices; it then determines the priority of each candidate device according to its relative distance, where a smaller relative distance means a higher priority; finally, the candidate device with the highest priority is determined as the target device.
  • in another possible design, when the number of candidate devices is greater than or equal to two, the first electronic device needs to determine the relative distances between the user and the at least two candidate devices; the candidate device with the minimum relative distance is then determined as the target device.
  • in a possible design, the first electronic device may acquire information of the first audio of the first electronic device and acquire information of the second audio from the at least one second electronic device, and then determine the user's position according to the information of the first audio and the information of the second audio.
  • the sound collected by the multi-microphone array of the electronic device can effectively determine the user's position, so as to ensure the accuracy of the user's position positioning result.
  • the first electronic device includes a first microphone and a second microphone; the information of the first audio includes: a first arrival time at which the voice wake-up command reaches the first microphone, a second arrival time at which the voice wake-up command reaches the second microphone, a first phase at which the voice wake-up command reaches the first microphone, and a second phase at which the voice wake-up command reaches the second microphone.
  • the first electronic device may determine the user's position according to the information of the first audio and the information of the second audio, which specifically includes: determining a first direction angle between the first electronic device and the user according to the phase difference between the first phase and the second phase; determining a second direction angle between the second electronic device and the user according to the phase differences in the information of the second audio; determining the relative distances between the user and the devices according to the corresponding arrival-time differences; and combining the direction angles and relative distances to determine the user's position.
  • the sound collected by the multi-microphone array of the electronic device can effectively determine the user's position, so as to ensure the accuracy of the user's position positioning result.
  • in a possible design, the method further includes: the first electronic device acquires historical audio information from the first electronic device and the at least one second electronic device;
  • the first electronic device obtains the arrival times and phases, at different electronic devices, of the voice wake-up commands issued by the user N times, where N is a positive integer, and determines the relative azimuth angles and distance differences corresponding to the N voice wake-up commands;
  • the first electronic device takes the relative azimuth angles and distance differences corresponding to the N voice wake-up commands as observed values, establishes an objective function, and solves the objective function by an exhaustive search method to obtain the relative positions of the first electronic device and the at least one second electronic device.
  • the first electronic device may locate the relative positions of multiple devices in space according to the above method, and construct a location map including the relative positions of the devices.
  • even in an interference environment, the device can reversely deduce the relative positions of multiple sound pickup devices through repeated recognition of the user's voice; as voice wake-up messages accumulate, the positioning between devices becomes more accurate.
  • in a possible design, the first electronic device and the at least one second electronic device are connected to the same local area network, or the first electronic device and the at least one second electronic device are pre-bound with the same user account, or the first electronic device and the at least one second electronic device are bound to different user accounts that have established a binding relationship.
  • an embodiment of the present application provides a voice wake-up method, the method can be applied to a second electronic device, and the method includes:
  • the second electronic device collects the sound of the surrounding environment and converts it into second audio; the second electronic device sends the second audio to the first electronic device, and when the second electronic device detects the wake-up word in the second audio, it sends a wake-up message to the first electronic device.
  • the first electronic device determines the user's position according to the information of the first audio of the first electronic device and the information of the second audio, and, according to the relative positions of the first electronic device and the at least one second electronic device, the user's position, and the user's face orientation, determines whether the target device is the second electronic device; if so, it sends a wake-up response to the second electronic device, and after receiving the wake-up response from the first electronic device, the second electronic device responds to the user's voice wake-up command.
  • when the first electronic device determines, according to the relative positions of the first electronic device and the at least one second electronic device, the user's position, and the user's face orientation, that the target device is not the second electronic device, the first electronic device does not send a wake-up response to the second electronic device, or sends a wake-up prohibition response to the second electronic device, and the second electronic device does not respond to the user's voice wake-up instruction.
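  • To make the above interaction concrete, the following is a minimal Python sketch of the slave-side (second electronic device) logic. All helper functions here are hypothetical stand-ins for the device's actual audio and networking stack, not APIs described in the patent.

```python
# Minimal sketch of the second (slave) device's wake-up flow described above.
# Every helper below is a hypothetical stand-in, not an API from the patent.

MY_ID = "smart_speaker"

def record_audio() -> bytes:
    """Stand-in: capture ambient sound with the multi-microphone array."""
    return b""

def detect_wake_word(audio: bytes) -> bool:
    """Stand-in: the sliding-window wake-word detector."""
    return False

def send(target: str, msg_type: str, payload=None) -> None:
    """Stand-in: HiLink-style device-to-device messaging."""

def wait_for_response(target: str, timeout: float) -> str:
    """Stand-in: returns 'WAKE_ALLOWED', 'WAKE_FORBIDDEN', or '' on timeout."""
    return ""

def slave_loop(central: str) -> None:
    while True:
        audio = record_audio()                    # the "second audio"
        send(central, "AUDIO", audio)             # synchronize audio to the first device
        if detect_wake_word(audio):
            send(central, "WAKE_REQUEST", MY_ID)  # wake-up message
            if wait_for_response(central, 1.0) == "WAKE_ALLOWED":
                print("responding to the user's wake-up command")
            # on 'WAKE_FORBIDDEN' or timeout: stay silent
```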
  • the present application provides a voice wake-up system, which includes a first electronic device and at least one second electronic device.
  • the first electronic device can implement the method of any possible implementation manner of the first aspect, and the at least one second electronic device can implement the method of any possible implementation manner of the second aspect.
  • an electronic device provided by an embodiment of the present application includes: one or more processors and a memory, where the memory stores program instructions that, when executed by the device, implement the methods of the above aspects of the embodiments of the present application and any possible design involved in those aspects.
  • an embodiment of the present application provides a chip system, where the chip system is coupled with a memory in an electronic device, so that the chip system, when running, invokes the program instructions stored in the memory to implement the methods of the above aspects of the embodiments of the present application and any possible design involved in those aspects.
  • a computer-readable storage medium of an embodiment of the present application stores program instructions that, when run on an electronic device, enable the device to perform the methods of the above aspects of the embodiments of the present application and any possible design involved in those aspects.
  • a computer program product of an embodiment of the present application, when run on an electronic device, enables the electronic device to perform the methods of the above aspects of the embodiments of the present application and any possible design involved in those aspects.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a mobile phone according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • FIG. 4 is an interactive schematic diagram of a voice wake-up method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a wake-up method provided by an embodiment of the present application.
  • FIG. 6A is a schematic diagram of a user location positioning method according to an embodiment of the present application.
  • FIGS. 6B to 6D are schematic diagrams of another application scenario provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • FIG. 8A is a schematic diagram of a device location positioning method according to an embodiment of the present application.
  • FIG. 8B is a schematic diagram of a wake-up speech analysis method provided by an embodiment of the present application.
  • FIG. 8C is a schematic diagram of a device map provided by an embodiment of the present application.
  • FIG. 9 is an interactive schematic diagram of a device location positioning method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a group of perception capability layers according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a device according to an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of another device according to an embodiment of the present application.
  • the electronic device in the embodiment of the present application is an electronic device with a voice wake-up function, that is, a user can wake up the electronic device by voice. Specifically, the user wakes up the electronic device by speaking the wake-up word.
  • the wake-up word may be preset in the electronic device by the user according to his own needs, or may be set by the electronic device before leaving the factory, and the setting method of the wake-up word is not limited in this embodiment of the present application.
  • the user who wakes up the electronic device may be an arbitrary user or a specific user.
  • the specific user may be a user whose voice speaking the wake-up word is pre-stored in the electronic device, such as the owner of the device.
  • electronic devices trigger device wake-up by detecting whether a wake word is included in the audio. Specifically, when the wake-up word is included in the audio, the electronic device is awakened, otherwise the electronic device is not awakened. After the electronic device is awakened, the user can interact with the electronic device through voice. For example, the wake-up word is "Xiaoyi Xiaoyi", and when the electronic device detects that "Xiaoyi Xiaoyi" is included in the audio, the electronic device is woken up. The electronic device acquires audio by collecting or receiving ambient sound through a multi-microphone array on the device.
  • in a multi-device scenario, the voice including the wake-up word spoken by the user may be received or collected by multiple electronic devices, causing two or more electronic devices to wake up, which brings confusion to the user's voice interaction process and affects the user experience.
  • the priority of each device is usually specified manually. Assuming that the priority of the smart screen in Figure 1 is higher than the priority of the smart speaker, when both the smart screen and the smart speaker collect the "Xiaoyi Xiaoyi" spoken by the user, only the smart screen is awakened.
  • although this method can restrict the wake-up of multiple devices by setting rules, it is not intelligent enough, because the rules need to be set manually in advance, and the user can only adjust a device's wake-up priority by actively modifying the rules by hand as actual needs change; the approach therefore suffers from poor flexibility.
  • in view of this, an embodiment of the present application provides a voice wake-up method.
  • in the method, the relative positions of the user and multiple devices in space can be located and a location map constructed; in this way, by combining the location map with the user's face orientation collected by the main device, the device that the user wants to wake up can be determined.
  • this method helps to improve the accuracy of device wake-up in multi-device scenarios, and the application effect is relatively good.
  • the electronic device may be a portable terminal including functions such as a personal digital assistant and/or a music player, for example, a mobile phone, a tablet computer, a wearable device with wireless communication capability (such as a smart watch), or a vehicle-mounted device.
  • exemplary embodiments of the portable terminal include, but are not limited to, portable terminals running the Harmony operating system or other operating systems.
  • the aforementioned portable terminal may also be, for example, a laptop computer (Laptop) having a touch-sensitive surface (eg, a touch panel). It should also be understood that, in some other embodiments, the above-mentioned terminal may also be a desktop computer having a touch-sensitive surface (eg, a touch panel).
  • FIG. 2 shows a schematic structural diagram of an electronic device 200 .
  • the electronic device 200 may include a processor 210, an external memory interface 220, an internal memory 221, a universal serial bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, a headphone jack 270D, a sensor module 280, buttons 290, a motor 291, an indicator 292, a camera 293, a display screen 294, a subscriber identification module (SIM) card interface 295, and so on.
  • the sensor module 280 may include a pressure sensor 280A, a gyroscope sensor 280B, an air pressure sensor 280C, a magnetic sensor 280D, an acceleration sensor 280E, a distance sensor 280F, a proximity light sensor 280G, a fingerprint sensor 280H, a temperature sensor 280J, a touch sensor 280K, an ambient light sensor, and the like.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 200 .
  • the electronic device 200 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 210 may include one or more processing units; for example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the electronic device 200 implements a display function through a GPU, a display screen 294, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 294 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 210 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the electronic device 200 may implement a shooting function through an ISP, a camera 293, a video codec, a GPU, a display screen 294, an application processor, and the like.
  • the SIM card interface 295 is used to connect a SIM card.
  • the SIM card can be brought into contact with or separated from the electronic device 200 by inserting it into or pulling it out of the SIM card interface 295.
  • the electronic device 200 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 295 can support Nano SIM cards, Micro SIM cards, SIM cards, and the like.
  • the same SIM card interface 295 can insert multiple cards at the same time.
  • the types of the plurality of cards may be the same or different.
  • the SIM card interface 295 can also be compatible with different types of SIM cards.
  • the SIM card interface 295 is also compatible with external memory cards.
  • the electronic device 200 interacts with the network through the SIM card to realize functions such as call and data communication.
  • the electronic device 200 employs an eSIM, i.e., an embedded SIM card.
  • the wireless communication function of the electronic device 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 200 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 250 may provide a wireless communication solution including 2G/3G/4G/5G, etc. applied on the electronic device 200 .
  • the mobile communication module 250 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like.
  • the mobile communication module 250 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 250 can also amplify the signal modulated by the modulation and demodulation processor, and then convert it into electromagnetic waves for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 250 may be provided in the processor 210 .
  • at least part of the functional modules of the mobile communication module 250 may be provided in the same device as at least part of the modules of the processor 210 .
  • the wireless communication module 260 can provide wireless communication solutions applied on the electronic device 200, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies.
  • the wireless communication module 260 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 260 receives electromagnetic waves via the antenna 2 , modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 210 .
  • the wireless communication module 260 can also receive the signal to be sent from the processor 210 , perform frequency modulation on the signal, amplify the signal, and then convert it into an electromagnetic wave for radiation through the antenna 2 .
  • the antenna 1 of the electronic device 200 is coupled with the mobile communication module 250, and the antenna 2 is coupled with the wireless communication module 260, so that the electronic device 200 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the structures shown in FIG. 2 do not constitute a specific limitation on the electronic device 200; the electronic device 200 may include more or fewer components than shown, or combine some components, or split some components, or have a different arrangement of components.
  • the combination/connection relationship between the components in FIG. 2 can also be adjusted and modified.
  • the software system of the electronic device may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiments of the present application take a layered architecture as an example, where the layered architecture may include a Harmony operating system or other operating systems.
  • the voice wake-up method provided by the embodiment of the present application may be applicable to a terminal integrated with the foregoing operating system.
  • FIG. 2 shows the hardware structure of an electronic device to which the embodiments of the present application are applicable.
  • the embodiment of the present application provides a voice wake-up method that can use a location map, which includes the user's position and the devices' positions, together with the user's face orientation collected by a device, to accurately determine the device the user wants to wake up, thereby improving the accuracy of directional device wake-up in multi-device scenarios.
  • FIG. 3 is a schematic diagram of a multi-device scenario to which this embodiment of the present application is applied.
  • in FIG. 3, the electronic device 10, the electronic device 20, and the electronic device 30 are all sound pickup devices with multi-microphone arrays, and the same wake-up word, for example "Xiaoyi Xiaoyi", is preset on all of them.
  • when the user speaks "Xiaoyi Xiaoyi", the electronic device 10, the electronic device 20, and the electronic device 30 can all collect or receive the voice.
  • the electronic device 30 can use the wake-up voice to determine the user's position in the device map obtained by pre-training, generate a location map including the user's position, and combine it with the user's face orientation to determine which of the electronic device 10, the electronic device 20, and the electronic device 30 is the target device the user's face is facing.
  • FIG. 3 is only an example of a multi-device scenario, and the embodiment of the present application does not limit the number of electronic devices in the multi-device scenario, nor does it limit the pre-set wake-up words in the electronic devices.
  • the electronic device 30 may not collect the face image, but obtain the face image collection result from other devices (such as the electronic device 10 or the electronic device 20 ).
  • the electronic device 30 may be a central device with strong data processing capability, such as a smart speaker or a smart screen in a smart home scenario.
  • in the following description, the electronic device 30 is taken as having a face image acquisition function and serving as the central device.
  • with reference to FIG. 4, the voice wake-up method according to the embodiment of the present application is specifically described below.
  • the method flow specifically includes the following steps.
  • in steps 401a to 401c, the electronic device 10, the electronic device 20, and the electronic device 30 all collect ambient sound in real time and convert the collected ambient sound into audio.
  • the multi-microphone array of the electronic device 10 collects ambient sounds and converts the collected ambient sounds into audio
  • the multi-microphone array of the electronic device 20 collects ambient sounds and converts the collected ambient sounds into audio
  • the multi-microphone array of the electronic device 30 collects ambient sound and converts the collected ambient sound into audio.
  • for example, the user issues the voice wake-up command "Xiaoyi Xiaoyi" while facing the smart screen shown in FIG. 1; because the smart screen, the smart speaker, and the smart switch all collect ambient sound in real time and convert it into audio, the sound collected by the smart screen, the smart speaker, and the smart switch will include the user's wake-up voice.
  • in steps 402a to 402c, the electronic device 10, the electronic device 20, and the electronic device 30 all perform wake word detection on the generated audio.
  • for example, the electronic device 10, the electronic device 20, and the electronic device 30 can each perform one-dimensional convolution on the data in each sliding window of the audio collected by the device itself and extract features of different frequency bands in the data; when an audio segment consistent with the user's preset voice features is recognized from the audio, the audio segment includes the wake-up word; otherwise, it does not. A code sketch of this detection loop follows.
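  • The following is a minimal Python sketch of such sliding-window detection. The filter bank, window sizes, and the template-similarity threshold are illustrative assumptions; the patent does not specify the actual features or decision model.

```python
# Minimal sketch: sliding-window wake-word detection with 1-D convolution.
# Filters, window/hop sizes, and threshold are assumed for illustration.
import numpy as np

def band_features(window: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Convolve one audio window with a bank of 1-D filters and pool the
    energy per filter, giving a coarse per-band feature vector."""
    return np.asarray([np.mean(np.convolve(window, f, mode="valid") ** 2)
                       for f in filters])

def detect_wake_word(audio: np.ndarray, filters: np.ndarray,
                     template: np.ndarray, win: int = 4000,
                     hop: int = 2000, threshold: float = 0.9) -> bool:
    """Slide a window over the audio; report a hit when the cosine similarity
    between the window's band features and the preset template is high."""
    for start in range(0, len(audio) - win + 1, hop):
        feats = band_features(audio[start:start + win], filters)
        sim = feats @ template / (np.linalg.norm(feats)
                                  * np.linalg.norm(template) + 1e-9)
        if sim > threshold:
            return True
    return False
```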
  • in steps 403a to 403b, when the electronic device 10 and the electronic device 20 detect the wake-up word, each sends a wake-up message to the electronic device 30, together with the audio information generated by its own device.
  • specifically, when the electronic device 10 detects the wake-up word, it sends a first wake-up message and audio information to the electronic device 30, where the first wake-up message is used to request confirmation of whether to wake up the electronic device 10; when the electronic device 20 detects the wake-up word, it sends a second wake-up message and audio information to the electronic device 30, where the second wake-up message is used to request confirmation of whether to wake up the electronic device 20.
  • the audio information may include all data of the audio, or the audio information may include information such as arrival time and phase related to the voice wake-up command.
  • for example, the electronic device 30 includes a first microphone and a second microphone; the information of the first audio generated by the electronic device 30 includes: the first arrival time at which the voice wake-up command reaches the first microphone, the second arrival time at which the voice wake-up command reaches the second microphone, the first phase at which the voice wake-up command reaches the first microphone, and the second phase at which the voice wake-up command reaches the second microphone.
  • the first arrival time refers to the earliest time at which the first microphone picks up the voice wake-up command, and the second arrival time refers to the earliest time at which the second microphone picks up the voice wake-up command.
  • the electronic device 20 includes a third microphone and a fourth microphone; the information of the second audio generated by the electronic device 20 includes: the third arrival time at which the voice wake-up command reaches the third microphone, the fourth arrival time at which the voice wake-up command reaches the fourth microphone, the third phase at which the voice wake-up command reaches the third microphone, and the fourth phase at which the voice wake-up command reaches the fourth microphone.
  • the third arrival time refers to the earliest time at which the third microphone picks up the voice wake-up command, and the fourth arrival time refers to the earliest time at which the fourth microphone picks up the voice wake-up command.
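  • The patent derives direction angles from phase differences and relative distances from arrival-time differences. As a simple illustration of how an inter-microphone delay maps to a direction, the following sketch uses the classical far-field two-microphone relationship; the microphone spacing and example values are assumed, not taken from the patent.

```python
# Minimal sketch: direction angle from the arrival-time difference at two
# microphones (classical far-field TDoA relationship; values are assumed).
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly room temperature (assumed)

def direction_angle(t_first: float, t_second: float, mic_spacing: float) -> float:
    """Angle (radians) between the sound direction and the perpendicular
    bisector of the two-microphone axis."""
    path_diff = SPEED_OF_SOUND * (t_second - t_first)     # extra path to mic 2
    ratio = max(-1.0, min(1.0, path_diff / mic_spacing))  # numerical clamp
    return math.asin(ratio)

# Example: a 0.1 ms later arrival on a 6 cm baseline -> about 35 degrees.
print(math.degrees(direction_angle(0.0, 0.0001, 0.06)))
```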
  • the electronic device 10 , the electronic device 20 , and the electronic device 30 may further include other microphones, and the number of microphones is not limited in the embodiment of the present application, and other microphones may also collect sound according to the above method.
  • if the electronic device 10 or the electronic device 20 does not detect the wake-up word, it does not need to send a wake-up message to the electronic device 30, and only needs to send audio information to the electronic device 30.
  • for example, if the electronic device 20 does not detect a wake-up word, it does not need to send a wake-up message to the electronic device 30 and only sends audio information; such cases are not shown one by one in this embodiment.
  • in step 403c, if the electronic device 30 also detects the wake-up word, it likewise generates a third wake-up message; otherwise, it does not generate the third wake-up message.
  • the third wake-up message is used to request confirmation whether to wake up the electronic device 30 .
  • in step 404, the electronic device 30 determines the user's relative position in the device map according to the pre-trained device map and the audio collected by any two of the electronic device 10, the electronic device 20, and the electronic device 30, thereby generating a location map including the user's position.
  • the information between the electronic devices is synchronized based on a multi-device interconnection technology, such as HiLink.
  • the electronic device 10, the electronic device 20, and the electronic device 30 can be connected to the same local area network.
  • alternatively, the electronic device 10, the electronic device 20, and the electronic device 30 can be pre-bound with the same user account (such as a HUAWEI ID), or the electronic device 10, the electronic device 20, and the electronic device 30 can be bound with different user accounts that have a binding relationship (such as pre-binding a family member's user account, that is, authorizing one's own device to connect with the family's devices), so as to ensure secure communication between the devices.
  • the first microphone and the second microphone of the electronic device 30 both collect sound, and record the information of the first audio.
  • the electronic device 30 may determine the first direction angle between the electronic device 30 and the user according to the phase difference between the first phase at which the voice wake-up command reaches the first microphone of the electronic device 30 and the second phase at which the voice wake-up command reaches the second microphone of the electronic device 30; the electronic device 30 obtains the information of the second audio from the electronic device 20, so the electronic device 30 can determine the second direction angle between the electronic device 20 and the user according to the phase difference between the third phase and the fourth phase of the electronic device 20.
  • the electronic device 30 can determine the first relative distance between the electronic device 30 and the user according to the time difference between the first arrival time and the second arrival time of the electronic device 30, and can likewise determine the second relative distance between the electronic device 20 and the user according to the time difference between the third arrival time and the fourth arrival time of the electronic device 20. In this way, the user's position can be determined by combining the first azimuth angle, the second azimuth angle, the first relative distance, and the second relative distance; a triangulation sketch follows the legend below.
  • in FIG. 6A, point A refers to the position of the electronic device 20 in the pre-trained device map;
  • point B refers to the position of the electronic device 30 in the pre-trained device map;
  • θA is the second azimuth angle of the electronic device 20 relative to the user;
  • θB is the first azimuth angle of the electronic device 30 relative to the user;
  • PA is the second relative distance of the electronic device 20 relative to the user;
  • PB is the first relative distance of the electronic device 30 relative to the user.
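  • As a minimal illustration of combining the two bearings in the shared map, the following sketch intersects the two direction rays to recover the user's position P. Coordinates, angle conventions, and example values are assumed for illustration.

```python
# Minimal sketch: locate the user by intersecting the bearing rays from two
# device positions in the location map (coordinates/angles are assumed).
import math

def locate_user(ax, ay, theta_a, bx, by, theta_b):
    """Intersect the ray from A at angle theta_a with the ray from B at angle
    theta_b (angles from the x-axis, radians); returns (px, py)."""
    ca, sa = math.cos(theta_a), math.sin(theta_a)
    cb, sb = math.cos(theta_b), math.sin(theta_b)
    denom = ca * sb - sa * cb
    if abs(denom) < 1e-9:
        raise ValueError("bearings are parallel; position is ill-conditioned")
    t = ((bx - ax) * sb - (by - ay) * cb) / denom
    return ax + t * ca, ay + t * sa

# Example: device 20 at (0, 0) bearing 45 deg, device 30 at (2, 0) bearing
# 135 deg -> the user is at (1.0, 1.0).
print(locate_user(0.0, 0.0, math.radians(45), 2.0, 0.0, math.radians(135)))
```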
  • similarly, the electronic device 30 can also determine the direction angle between the electronic device 10 and the user according to the phase differences of the voice wake-up command arriving at different microphones of the electronic device 10.
  • in specific implementation, localization algorithms such as TDoA (a localization method based on time differences of arrival) or MUSIC (a sound source localization method) may be used; this is not limited in the embodiments of the present application.
  • the user location determined in the embodiment of the present application refers to a relative location. For example, the user to be located is due south of the electronic device 30, and the distance between the user and the electronic device 30 is 1 meter.
  • in step 405, the electronic device 30 collects an image of the user and detects the orientation of the user's face.
  • the user is facing the smart screen in Figure 1 and sends out a wake-up voice of "Xiaoyi Xiaoyi", and the camera on the smart screen takes pictures or videos of the user.
  • the smart screen analyzes the face image to determine that the user's face is facing the smart screen.
  • the user is facing the smart speaker next to the smart screen in Figure 1 and sends out a wake-up voice of "Xiaoyi Xiaoyi", and the camera on the smart screen takes pictures or videos of the user.
  • the smart screen determines that the user's face is facing the first azimuth (for example, the first azimuth is the front left of the user).
  • this embodiment takes the smart screen as the main control device (or the central device) as an example for description.
  • since the smart screen has an image acquisition function, the user image collected by the smart screen is preferentially used for face orientation detection. If the main control device (or central device) has no image acquisition function, it may also acquire the user image from another electronic device with a face acquisition function and then perform the analysis based on the acquired user image; the examples are not shown one by one here.
  • in step 406, the electronic device 30 determines, according to the face orientation and the location map including the user's position, that the target device toward which the user's face is oriented is the electronic device 10.
  • for example, the location map determined by the electronic device 30 and corresponding to the multi-device scenario shown in FIG. 3 is shown in FIG. 6B.
  • the field of view corresponding to the user's face orientation is the range indicated by the angle θ in the figure, and the electronic device 10 lies within this range; a sketch of this field-of-view test follows.
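  • The following is a minimal sketch of this field-of-view test: a device is a candidate if its bearing from the user deviates from the face orientation by at most half of the angle θ. The angle values and coordinates are illustrative assumptions.

```python
# Minimal sketch: is a device inside the field-of-view cone (half-angle
# theta/2) around the user's face orientation? Values are assumed.
import math

def in_field_of_view(user_xy, face_angle_rad, device_xy, fov_rad) -> bool:
    dx = device_xy[0] - user_xy[0]
    dy = device_xy[1] - user_xy[1]
    bearing = math.atan2(dy, dx)  # direction from the user to the device
    # smallest angular difference, wrapped into [-pi, pi)
    diff = abs((bearing - face_angle_rad + math.pi) % (2 * math.pi) - math.pi)
    return diff <= fov_rad / 2

# Example: user at P=(1, 1) facing along +x; a device at (3, 1.2) lies inside
# a 60-degree cone.
print(in_field_of_view((1.0, 1.0), 0.0, (3.0, 1.2), math.radians(60)))
```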
  • in step 407a, when the electronic device 30 determines that the target device is the electronic device 10, it sends a wake-up permission instruction to the electronic device 10 to instruct the electronic device 10 to respond to the user's wake-up voice.
  • optionally, this embodiment may further include the following steps: in step 407b, when determining that the target device is the electronic device 10, the electronic device 30 may further indicate that the electronic device 30 itself is not to be awakened; in step 407c, the electronic device 30 sends a wake-up prohibition instruction to the electronic device 20, where the wake-up prohibition instruction is used to instruct the electronic device 20 not to respond to the user's wake-up voice.
  • alternatively, when determining that the target device is the electronic device 10, the electronic device 30 may simply send no response to the electronic device 30 itself or to the electronic device 20, so that neither the electronic device 30 nor the electronic device 20 responds to the user's wake-up voice.
  • to summarize, the first electronic device receives the user's voice wake-up command; the first electronic device obtains the user image locally or from another device and detects the user's face orientation; then, according to the relative positions of the first electronic device and at least one second electronic device (such as the electronic device 10 and the electronic device 20), the user's position (such as the position of point P), and the user's face orientation, it determines, from among the first electronic device and the at least one second electronic device, the target device toward which the user's face is oriented (for example, the electronic device 10), and instructs the target device to respond to the voice wake-up command.
  • the above method can effectively alleviate the problem of "one call, multiple responses", so that when the user issues a wake-up command toward the electronic device 10, only the electronic device 10 makes a voice interaction response, and the other devices do not respond.
  • in addition, not all electronic devices in the multi-device scenario need to have an image capture function; as long as one device in the scenario has an image capture function and can capture a face image, the device the user is currently facing can be determined by combining the location map and the face image, and the targeted device itself may have no image acquisition function at all, so to a certain extent this method has wider application scenarios.
  • even if the user moves, the user's current position can still be relocated according to the method shown in steps 401a to 404 above, so that, combined with the collected face image, the user's position after moving can still be accurately located.
  • for example, the electronic device 30 can relocate the user's position from position P1 to position P2 according to the above method; it can be seen that this method can locate the current position of the speaking user in real time.
  • then, by combining the currently collected face image and the user's current position P2, the target device can be determined to be the electronic device 20 shown in FIG. 6C. It can be seen that, according to the above method, the device to be awakened can always be located in a directional manner, regardless of whether the user's position changes.
  • when the number of candidate devices within the face orientation range is greater than or equal to two, the target device can be further determined by combining the relative distances between the candidate devices and the user's position; that is, the candidate device with the smallest relative distance is selected from the two or more candidate devices as the target device.
  • in other words, the electronic device 30 calculates the relative distance between each candidate device within the face orientation range and the user's position, and then determines the wake-up priority of each candidate device according to the relative distance: the smaller the relative distance, the higher the wake-up priority.
  • the electronic device 30 can directionally wake up the candidate device with the highest priority.
  • for example, the electronic device 30 can relocate the user's position from P1 to P2 according to the above method, and then, combining the currently collected face image and the user's face orientation, determine that the candidate devices within the field of view corresponding to the face orientation include the electronic device 20 (the smart speaker shown in the figure) and the electronic device 40 (the smart alarm clock shown in the figure).
  • the electronic device 30 determines that the relative distance between the electronic device 20 and P2 is D2 and that the relative distance between the electronic device 40 and P2 is D1; since D1 is greater than D2, the wake-up priority of the electronic device 20 is higher, and therefore the electronic device 30 determines that the electronic device 20 is the target device to be awakened. A code sketch of this selection follows.
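  • A minimal sketch of this nearest-candidate selection follows; the device names and coordinates are illustrative assumptions.

```python
# Minimal sketch: among candidate devices in the field of view, a smaller
# user-device distance means a higher wake-up priority.
import math

def pick_target(user_xy, candidates):
    """candidates: dict of device name -> (x, y) in the location map.
    Returns the candidate closest to the user."""
    return min(candidates, key=lambda name: math.dist(user_xy, candidates[name]))

# Example mirroring the text: D1 (alarm clock) > D2 (speaker), so the
# smart speaker is chosen as the target device.
devices = {"smart_speaker": (2.0, 1.0), "smart_alarm_clock": (4.5, 1.5)}
print(pick_target((1.0, 1.0), devices))  # -> smart_speaker
```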
  • in the above embodiments, a pre-trained device map is required; that is, a map of the relative positions of the devices in the multi-device scenario needs to be constructed first, and only then can the relative position of the user in the device map be determined.
  • an embodiment of the present application provides a method for training a device map, which reversely deduces the relative positions of a plurality of sound pickup devices by performing voice analysis on a user's historical wake-up speech. The construction of the device map is exemplarily given below with reference to FIGS. 7 to 9 .
  • FIG. 7 is a schematic diagram of a multi-device scenario provided by an embodiment of the present application.
  • in the scenario, a plurality of sound pickup devices are deployed; the main device with the image acquisition module is the smart screen device 71 in the figure, and sound pickup devices 72a to 72f are also deployed in the space where the smart screen device 71 is located.
  • in FIG. 7, speakers are used to represent the sound pickup devices 72a to 72f.
  • in practice, the speakers can also be replaced with devices such as smart alarm clocks, smart cameras, and smart switches, which are not illustrated one by one here; in the figure, only speakers are used to represent the sound pickup devices around the smart screen device 71.
  • the user may wake up any sound pickup device at any position in the scene.
  • the user may sit on the sofa and call "Xiaoyi Xiaoyi” to wake up the smart screen device.
  • the user calls "Xiaoyi Xiaoyi” facing the sound pickup device 72a (such as a speaker) to wake up the sound pickup device 72a.
  • each pickup device records the user's historical wake-up voice and synchronizes it to the central device.
  • the central device can be the smart screen device shown in Figure 7, or it can be a smart speaker or router.
  • the sounding position when the user calls "Xiaoyi Xiaoyi" for the first time is the sounding point P1
  • the sounding position when the user calls "Xiaoyi Xiaoyi” for the second time is the sounding point P2.
  • the following takes as an example the case where the sound pickup device 72a and the sound pickup device 72b are awakened by the two wake-up voices.
  • the sound pickup device 72a and the sound pickup device 72b convert the collected sounds into audio and then synchronize to the central device.
  • the central device can perform voiceprint analysis on the sounds collected by the sound pickup device 72a and the sound pickup device 72b respectively.
  • for example, the sound pickup device 72a and the sound pickup device 72b can perform one-dimensional convolution on the data in each sliding window of the audio collected by their own devices and extract the voiceprint features of different frequency bands in the data; when an audio segment whose voiceprint features match those of the preset wake-up voice is identified from the audio, the audio segment includes the wake-up word issued by the user; otherwise, it does not.
  • as shown in FIG. 8B, the central device can further filter the audio, filtering out audio that overlaps other audio in the same period, so as to improve the accuracy of the device map calculation result.
  • as shown in FIG. 8A, a polar coordinate system A is established on the sound pickup device 72a, a polar coordinate system B is established on the sound pickup device 72b, and a rectangular coordinate system C is established with the line through points A and B as the X axis.
  • θA1 is the angle between the line from sounding point P1 to point A and the x-axis of the rectangular coordinate system C;
  • θB1 is the angle between the line from sounding point P1 to point B and the x-axis of the rectangular coordinate system C;
  • θA2 is the angle between the line from sounding point P2 to point A and the x-axis of the rectangular coordinate system C;
  • θB2 is the angle between the line from sounding point P2 to point B and the x-axis of the rectangular coordinate system C;
  • dA2 is the distance from sounding point P2 to point A;
  • dB2 is the distance from sounding point P2 to point B.
  • the distance difference dA1 − dB1 can be calculated by using the time difference Δt1 between the arrivals of the wake-up signal, sent by the user at point P1, at polar coordinate system A and at polar coordinate system B (that is, at the sound pickup device 72a and the sound pickup device 72b), that is:
  • dA1 − dB1 = v × Δt1 = a1
  • dA1 is the distance from sound point P1 to point A
  • dB1 is the distance from sound point P1 to point B
  • Δt1 is the time difference between the arrivals of the wake-up signal sent by the user at point P1 at polar coordinate system A and at polar coordinate system B; that is, Δt1 is the difference between the times at which the wake-up signal reaches the two different electronic devices
  • v is the speed of sound transmission
  • a1 is the distance difference dA1 − dB1; that is, a1 is the difference between the distances from the user's sounding position to the two different electronic devices.
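  • As an illustrative check (values assumed, not from the patent): with v = 340 m/s and Δt1 = 2 ms, the distance difference is a1 = dA1 − dB1 = 340 × 0.002 = 0.68 m.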
  • θA1 is the angle between the line from sounding point P1 to point A and the x-axis of the rectangular coordinate system C;
  • θB1 is the angle between the line from sounding point P1 to point B and the x-axis of the rectangular coordinate system C;
  • b1 is the angle between the line from P1 to point A and the line from P1 to point B;
  • θA1 and θB1 can be obtained by using the time differences and phases of the wake-up voice arriving at different microphones, as shown in FIG. 6A; the calculation process is not repeated here.
  • x1 is an unknown, which refers to dA1.
  • x2 is an unknown, which refers to dA2.
  • θAn is the angle between the line from sounding point Pn to point A and the x-axis of the rectangular coordinate system C;
  • θBn is the angle between the line from sounding point Pn to point B and the x-axis of the rectangular coordinate system C;
  • dAn is the distance from sounding point Pn to point A;
  • dBn is the distance from sounding point Pn to point B;
  • xn is an unknown, which refers to dAn.
  • in this way, a system of equations can be established; by solving it (for example, numerically), the optimal inter-device distances and inter-device angles can be obtained, and the relative positions between the devices can be calculated, as exemplarily shown in FIG. 8C. That is to say, the side lengths estimated by the distance divergence attenuation formula are substituted into the above equations, the inter-device distances and angles are searched by grid search, and the combination that minimizes the least-squares loss function over all equations is taken.
  • the device spacing and device angle that minimize the loss function are the precise relative device positions obtained by offline learning at night.
  • the optimal relative position between the devices can be obtained.
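  • The following is a minimal Python sketch of such a grid search for a single device pair: device A is placed at the origin, device B at (L, 0), each utterance's distance difference is predicted from its two bearings, and the spacing L with the smallest least-squares loss is kept. The bearing convention (directions from each device to the sounding point), the observation values, and the search range are illustrative assumptions.

```python
# Minimal sketch: exhaustive (grid) search for the spacing L between two
# pickup devices, using per-utterance bearings and TDoA distance differences.
# Observations, ranges, and conventions are assumed for illustration.
import math

def predict_diff(L: float, theta_a: float, theta_b: float) -> float:
    """Predicted dA - dB for spacing L, with bearings measured from the
    x-axis toward the sounding point (assumes non-parallel bearings)."""
    ca, sa = math.cos(theta_a), math.sin(theta_a)
    cb, sb = math.cos(theta_b), math.sin(theta_b)
    t = (L * sb) / (ca * sb - sa * cb)      # ray-intersection parameter
    px, py = t * ca, t * sa                 # recovered sounding point P
    return math.hypot(px, py) - math.hypot(px - L, py)

def grid_search(observations, l_min=0.5, l_max=10.0, step=0.01) -> float:
    """observations: list of (theta_a, theta_b, measured dA - dB) tuples,
    one per historical wake-up utterance."""
    best_l, best_loss = l_min, float("inf")
    for i in range(int((l_max - l_min) / step) + 1):
        L = l_min + i * step
        loss = sum((predict_diff(L, ta, tb) - a) ** 2
                   for ta, tb, a in observations)
        if loss < best_loss:
            best_l, best_loss = L, loss
    return best_l
```

  • In the full method, the same search runs jointly over the inter-device distances and angles of all device pairs, and the combination minimizing the least-squares loss over all equations yields the device map.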
  • as the accumulated historical wake-up voices of the user increase, the devices keep recording the data generated each time the user calls "Xiaoyi", and can use the saved data for offline learning and training at night, achieving the effect of getting smarter the more they are used: the more data, the smaller the error of the exhaustive search calculation.
  • with reference to FIG. 9, a device map training method for locating the relative positions of the devices in a multi-device scenario is described below.
  • the method flow specifically includes the following steps.
  • in steps 901a to 901c, the electronic device 10, the electronic device 20, and the electronic device 30 all collect ambient sound in real time and convert the collected ambient sound into audio.
  • the multi-microphone array of the electronic device 10 collects ambient sounds and converts the collected ambient sounds into audio
  • the multi-microphone array of the electronic device 20 collects ambient sounds and converts the collected ambient sounds into audio
  • the multi-microphone array of the electronic device 30 collects ambient sound and converts the collected ambient sound into audio.
  • for example, the user sends out the wake-up voice "Xiaoyi Xiaoyi" while facing the smart screen shown in Figure 1; because the smart screen, the smart speaker, and the smart switch all collect the sound of the surrounding environment in real time and convert it into audio, the audio generated by the smart screen, the smart speaker, and the smart switch will include the user's wake-up voice.
  • in steps 902a to 902b, the electronic device 10 and the electronic device 20 synchronize the generated audio to the electronic device 30.
  • the electronic device 10 synchronizes the generated audio to the electronic device 30 ; the electronic device 20 synchronizes the generated audio to the electronic device 30 . It should be noted that the electronic device 10 and the electronic device 20 may synchronize the generated audio to the electronic device 30 at regular intervals, or both may synchronize the generated audio to the electronic device 30 at a fixed time point (eg, one o'clock in the morning). This application does not limit this.
  • this embodiment is described by taking a multi-device scenario including the electronic device 10 , the electronic device 20 , and the electronic device 30 as an example, and the electronic device 30 is the central device. In other possible embodiments, other electronic devices may also be included, or the central device may be other electronic devices, and so on. For other electronic devices, reference may also be made to the above-mentioned electronic devices 10 to 30, which will not be described one by one here. .
  • in step 903, the electronic device 30 analyzes the audio generated by the electronic device 10, the electronic device 20, and the electronic device 30 itself according to the method shown in FIG. 8A, so as to obtain the relative positions between the devices.
  • the calculation process of the relative positions between the multiple devices relies on the information of the historical wake-up voices.
  • the above method can continuously use the user's historical wake-up voices from the recent period to locate the devices; even if a device's location changes during use, for example the smart speaker is moved from the living room to the dining room, the latest relative positions between the devices can still be updated by accumulating historical wake-up voice information over a period of time after the move. To the user, the device appears to get smarter the longer it is used.
  • in addition, this embodiment imposes no requirement on the use environment of the sound pickup devices; that is to say, even in an environment with noise interference, the relative positions between multiple sound pickup devices can be reversely deduced through repeated recognition of the user's voice.
  • in an embodiment of the present application, a multi-layer perception capability framework of the device is constructed, as shown in FIG. 10, which specifically includes: a basic perception capability layer, a second-level perception capability layer, and a high-level perception capability layer.
  • the basic perception capability layer refers to the capability after the existing functions of some devices or software are simply encapsulated by the intelligent perception framework.
  • for example, the chip layer provides the face orientation capability and the sound source positioning capability for multi-microphone devices, and the Android layer provides the power connection and disconnection status monitoring capability.
  • the second-level perception capability layer refers to capabilities obtained by computing on the basic perception capabilities and then encapsulated by the framework.
  • the device-to-device localization capability and the user's voice localization capability are both calculation results obtained by processing the sound localization data (a basic perception capability) reported by different underlying devices.
  • the high-level perception capability layer refers to complex calculation results reported to the upper layer, obtained through complex calculation, fusion, models, and rules. For example, the directional wake-up capability is an advanced capability finally obtained by fusing multiple basic perception capabilities and second-level perception capabilities.
• a fence mechanism is also provided, in which a virtual fence encloses a virtual boundary.
• when the boundary is crossed, the upper-layer application of the mobile phone can receive automatic notifications and warnings.
• fence thinking is the core of intelligent perception, and each capability is triggered together with a corresponding fence.
• capabilities and fences together form a platform that provides the upper layer with the ability to perceive user behavior, and upper-layer applications can receive reports of user enter and exit events, as in the sketch below.
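• the following sketch (editorial illustration; names such as `VirtualFence` are assumptions of the sketch) shows the essence of the fence mechanism: a circular virtual boundary that reports enter and exit transitions to registered callbacks, which stand in for the upper-layer application notifications described above.

```python
class VirtualFence:
    """A circular virtual boundary that fires callbacks on enter/exit."""

    def __init__(self, center, radius_m, on_enter, on_exit):
        self.center = center
        self.radius_m = radius_m
        self.on_enter = on_enter
        self.on_exit = on_exit
        self.inside = None  # unknown until the first position update

    def update(self, position):
        dx = position[0] - self.center[0]
        dy = position[1] - self.center[1]
        now_inside = (dx * dx + dy * dy) ** 0.5 <= self.radius_m
        if self.inside is not None:
            if now_inside and not self.inside:
                self.on_enter(position)   # boundary crossed inward
            elif self.inside and not now_inside:
                self.on_exit(position)    # boundary crossed outward
        self.inside = now_inside

fence = VirtualFence((0.0, 0.0), 3.0,
                     on_enter=lambda p: print("user entered", p),
                     on_exit=lambda p: print("user left", p))
fence.update((5.0, 0.0))  # outside, no event yet
fence.update((1.0, 1.0))  # prints: user entered (1.0, 1.0)
```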
• FIG. 11 is a schematic diagram of software modules in a multi-device scenario provided by an embodiment of the present application.
• the modules in the slave devices 1 to n and in the master device can cooperate to implement the device map training method provided by the embodiments of the present application.
• the master device refers to the above-mentioned central device used to locate device positions, such as the electronic device 30; a slave device refers to an above-mentioned device used to collect wake-up voices, such as the electronic device 10 or the electronic device 20.
  • each slave device includes an audio collection module 1101
  • the master device includes an audio collection module 1101 , an audio processing module 1102 , and an audio identification module 1103 .
  • the audio collection module 1101 is used to collect the sound in the environment by using a multi-microphone array, and convert the collected sound into audio.
• the audio collection module 1101 of each device may send the audio corresponding to each sampling period to the audio processing module 1102 of the master device (for example, the electronic device 30) for processing.
• the master device may be the smart screen device shown in FIG. 3, or may be a device with strong computing power in a smart home scenario, such as a router or a smart speaker.
• the audio processing module 1102 of the master device is used to preprocess the audio corresponding to each sampling period, for example, channel conversion, smoothing, and noise reduction, so that the audio recognition module 1103 can perform subsequent wake-up speech detection; a minimal preprocessing sketch is given below.
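• a minimal preprocessing sketch (editorial; the disclosure does not specify the exact operations beyond the examples above): multi-channel to mono as the channel conversion, DC-offset removal, and a short moving average standing in for smoothing and simple noise reduction.

```python
import numpy as np

def preprocess(audio: np.ndarray) -> np.ndarray:
    """Channel conversion, smoothing, and simple noise reduction."""
    if audio.ndim == 2:               # shape (samples, channels)
        audio = audio.mean(axis=1)    # mix down to mono
    audio = audio - audio.mean()      # remove DC offset
    kernel = np.ones(5) / 5.0         # 5-tap moving average
    return np.convolve(audio, kernel, mode="same")
```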
• the audio recognition module 1103 of the master device is used to perform wake-up signal recognition on the preprocessed audio of each sampling period, and to identify the information of the same wake-up signal across the audio of different sampling periods, such as the arrival time of the audio in which the wake-up signal is located.
  • the audio recognition module 1103 sends the information identifying the same wake-up signal to the device map calculation module 1104 .
• the device map calculation module 1104 is configured to calculate the relative positions between different devices according to the arrival times of the same wake-up voice at different devices and the arrival times of the wake-up signal at different microphones of the same device; the cross-correlation sketch below shows one way such arrival-time differences can be estimated.
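• one common way to estimate such arrival-time differences is cross-correlation of two recordings of the same wake-up word; the sketch below is an editorial illustration (not necessarily the method of the disclosure) that returns a positive value when the second signal arrives later.

```python
import numpy as np

def tdoa_by_xcorr(sig_a: np.ndarray, sig_b: np.ndarray, fs: float) -> float:
    """Arrival-time difference estimated by cross-correlation (seconds)."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag / fs

# The same click, delayed by 5 samples at 16 kHz:
a = np.zeros(100); a[20] = 1.0
b = np.zeros(100); b[25] = 1.0
print(tdoa_by_xcorr(a, b, 16000.0))  # 0.0003125 s
```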
• each slave device in the scenario shown in FIG. 11 may further include a user position localization module 1105, a face orientation recognition module 1106, and a directional wake-up module 1107.
• the user position localization module 1105 is used to calculate, by using the multi-microphone array in a single device, the arrival time of the user's current wake-up voice at that multi-microphone device, and to calculate the position of the user's voice based on the arrival times of the wake-up voice collected by at least two devices; a grid-search localization sketch is given below.
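• as a hedged sketch of such localization (editorial; real implementations typically use closed-form or iterative TDOA solvers), the function below grid-searches for the 2-D point whose predicted pairwise range differences best match the measured arrival-time differences; with only two or three devices the solution can remain ambiguous.

```python
import math
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def locate_user(device_pos, arrival_times, grid_step=0.1, extent=10.0):
    """device_pos: id -> (x, y) in meters; arrival_times: id -> seconds on
    a shared clock. Returns the best (x, y) for the user's voice."""
    ids = sorted(device_pos)
    pos = np.array([device_pos[i] for i in ids])
    t = np.array([arrival_times[i] for i in ids])
    best, best_err = None, np.inf
    for x in np.arange(-extent, extent, grid_step):
        for y in np.arange(-extent, extent, grid_step):
            d = np.hypot(pos[:, 0] - x, pos[:, 1] - y)
            # compare measured vs. predicted differences w.r.t. device 0
            err = np.sum(((d - d[0]) / SPEED_OF_SOUND - (t - t[0])) ** 2)
            if err < best_err:
                best, best_err = (x, y), err
    return best

devs = {"tv": (0.0, 0.0), "speaker": (4.0, 0.0), "switch": (0.0, 3.0)}
times = {k: math.hypot(x - 1.0, y - 1.0) / SPEED_OF_SOUND
         for k, (x, y) in devs.items()}  # synthetic user at (1, 1)
print(locate_user(devs, times))  # approximately (1.0, 1.0)
```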
• the face orientation recognition module 1106 is used to calculate the user's face orientation in the position map, within the field of view of the device's wide-angle camera, by using the face orientation recognition capability of the chip layer.
• the directional wake-up module 1107 is used to obtain the target device to be woken up by using the device map calculated by the device map calculation module 1104, the user position located by the user position localization module 1105, and the face orientation; a bearing-matching sketch is given below.
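• a bearing-matching sketch of this fusion (editorial illustration; thresholds and tie-breaking in a real system would be more involved): pick the device whose bearing from the user's position is closest to the user's face orientation.

```python
import math

def pick_wakeup_target(user_pos, face_azimuth_deg, device_pos):
    """Select the device the user is facing, from a position map."""
    def angular_gap(a, b):  # smallest absolute difference of two angles
        return abs((a - b + 180.0) % 360.0 - 180.0)

    best_id, best_gap = None, 360.0
    for dev_id, (x, y) in device_pos.items():
        bearing = math.degrees(math.atan2(y - user_pos[1], x - user_pos[0]))
        gap = angular_gap(bearing, face_azimuth_deg)
        if gap < best_gap:
            best_id, best_gap = dev_id, gap
    return best_id

# A user at the origin facing along +x wakes the device in front of them:
print(pick_wakeup_target((0.0, 0.0), 0.0,
                         {"tv": (2.0, 0.1), "speaker": (0.0, 3.0)}))  # tv
```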
  • FIG. 11 is only an example.
• the electronic device of the embodiments of the present application may have more or fewer modules than the electronic device shown in the figure, two or more modules may be combined, and so on.
  • the various modules shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
• the audio processing module 1102, the audio recognition module 1103, the device map calculation module 1104, the user position localization module 1105, the face orientation recognition module 1106, and the directional wake-up module 1107 shown in FIG. 11 may be integrated into one or more of the processing units of the processor 210 shown in FIG. 2. For example, some or all of these modules may be integrated into one or more processors such as an application processor or a special-purpose processor. It should be noted that the special-purpose processor in the embodiments of the present application may be a DSP, an application-specific integrated circuit (ASIC) chip, or the like.
  • FIG. 12 shows a device 1200 provided by the present application.
  • Device 1200 includes at least one processor 1210 , memory 1220 and transceiver 1230 .
  • the processor 1210 is coupled with the memory 1220 and the transceiver 1230.
• the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units, or modules, which may be electrical, mechanical, or in other forms, and is used for information exchange between the devices, units, or modules.
  • the connection medium between the transceiver 1230, the processor 1210, and the memory 1220 is not limited in the embodiments of the present application.
  • the memory 1220 , the processor 1210 , and the transceiver 1230 may be connected through a bus, and the bus may be divided into an address bus, a data bus, a control bus, and the like.
  • the memory 1220 is used to store program instructions.
  • the transceiver 1230 is used to receive or transmit data.
  • the processor 1210 is configured to invoke the program instructions stored in the memory 1220, so that the device 1200 executes the steps performed by the electronic device 30 or the steps performed by the electronic device 10 or the electronic device 20 in the above method.
• the processor 1210 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
• a general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in conjunction with the embodiments of the present application may be directly executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
• the memory 1220 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
• the memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory in this embodiment of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
  • the device 1200 can be used to implement the methods shown in the embodiments of the present application, and the relevant features can be referred to above, which will not be repeated here.
  • Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • a storage medium can be any available medium that a computer can access.
• computer-readable media may include a RAM, a ROM, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
• also, any connection is properly termed a computer-readable medium.
• disks and discs, as used here, include compact discs (CDs), laser discs, optical discs, digital video discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Computational Linguistics (AREA)
  • Navigation (AREA)

Abstract

Provided are a voice wake-up method and an electronic device, relating to the field of terminal artificial intelligence. The method is as follows: by using the ambient sound collected by each device, on the one hand, the relative positions of a user and multiple devices in a space can be located so as to construct a position map; on the other hand, the orientation of the user's face can be acquired by a master device having an image collection module among the multiple devices. In this way, the device that the user intends to wake up can be determined by combining the position map with the face orientation acquired by the master device. The method helps improve the accuracy of device wake-up in a multi-device environment, and achieves a relatively good application effect.
PCT/CN2021/133119 2020-11-27 2021-11-25 Procédé de réveil vocal et dispositif électronique WO2022111579A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011362525.1 2020-11-27
CN202011362525.1A CN114566171A (zh) Voice wake-up method and electronic device

Publications (1)

Publication Number Publication Date
WO2022111579A1 true WO2022111579A1 (fr) 2022-06-02

Family

ID=81711663

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/133119 WO2022111579A1 (fr) 2020-11-27 2021-11-25 Procédé de réveil vocal et dispositif électronique

Country Status (2)

Country Link
CN (1) CN114566171A (fr)
WO (1) WO2022111579A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024000836A1 * 2022-06-29 2024-01-04 歌尔股份有限公司 Voice control method and apparatus for home device, wearable device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273850A * 2022-09-28 2022-11-01 科大讯飞股份有限公司 Voice control method and system for an autonomous mobile device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106772247A * 2016-11-30 2017-05-31 努比亚技术有限公司 Terminal and sound source localization method
CN107465986A * 2016-06-03 2017-12-12 法拉第未来公司 Method and apparatus for detecting and isolating audio in a vehicle using multiple microphones
CN110415695A * 2019-07-25 2019-11-05 华为技术有限公司 Voice wake-up method and electronic device
US20200061822A1 (en) * 2017-04-21 2020-02-27 Cloundminds (Shenzhen) Robotics Systems Co., Ltd. Method for controlling robot and robot device
US20200072937A1 (en) * 2018-02-12 2020-03-05 Luxrobo Co., Ltd. Location-based voice recognition system with voice command
CN111176744A * 2020-01-02 2020-05-19 北京字节跳动网络技术有限公司 Electronic device control method and apparatus, terminal, and storage medium
CN111312295A * 2018-12-12 2020-06-19 深圳市冠旭电子股份有限公司 Holographic sound recording method and apparatus, and recording device
CN111369988A * 2018-12-26 2020-07-03 华为终端有限公司 Voice wake-up method and electronic device



Also Published As

Publication number Publication date
CN114566171A (zh) 2022-05-31

Similar Documents

Publication Publication Date Title
WO2022111579A1 (fr) Voice wake-up method and electronic device
RU2663937C2 (ru) Method and device for flight control, and electronic device
CN106782540B (zh) Voice device and voice interaction system including the voice device
WO2021027267A1 (fr) Speech interaction method and apparatus, terminal, and storage medium
CN102467574B (zh) Mobile terminal and metadata setting method thereof
US11343613B2 (en) Prioritizing delivery of location-based personal audio
US20130300546A1 (en) Remote control method and apparatus for terminals
JP2020520206A (ja) Wearable multimedia device and cloud computing platform with application ecosystem
CN108668077A (zh) Camera control method and apparatus, mobile terminal, and computer-readable medium
US20180103197A1 (en) Automatic Generation of Video Using Location-Based Metadata Generated from Wireless Beacons
US9288594B1 (en) Auditory environment recognition
CN110691300B (zh) Audio playback device and method for providing information
WO2017063283A1 (fr) System and method for controlling an in-vehicle intelligent terminal
CN103858497A (zh) Method and device for providing location-based information
KR20180081922A (ko) Method for responding to an input voice of an electronic device and electronic device therefor
CN111477225A (zh) Voice control method and apparatus, electronic device, and storage medium
CN105959587A (zh) Shutter speed acquisition method and apparatus
CN112735403B (zh) Smart home control system based on a smart speaker
CN107864430A (zh) Sound wave directional propagation control system and control method thereof
CN107707816A (zh) Photographing method and apparatus, terminal, and storage medium
CN114299951A (zh) Control method and apparatus
CN110719545B (zh) Audio playback device and method for playing audio
US9733714B2 (en) Computing system with command-sense mechanism and method of operation thereof
US20180143867A1 (en) Mobile Application for Capturing Events With Method and Apparatus to Archive and Recover
US10726270B2 (en) Selecting media from mass social monitoring devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21897079

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21897079

Country of ref document: EP

Kind code of ref document: A1