WO2021254131A1 - A voice wake-up method, electronic device, wearable device and system - Google Patents

A voice wake-up method, electronic device, wearable device and system

Info

Publication number
WO2021254131A1
WO2021254131A1 (application PCT/CN2021/097124)
Authority
WO
WIPO (PCT)
Prior art keywords
wearable device
user
electronic device
speaking
confidence
Prior art date
Application number
PCT/CN2021/097124
Other languages
English (en)
French (fr)
Inventor
王骅
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to US 18/001,961, published as US20230239800A1
Priority to EP 21824995.1, published as EP4156177A4
Publication of WO2021254131A1

Classifications

    • H04W 52/0254: Power saving arrangements in terminal devices by monitoring local events, e.g. detecting a user operation, a tactile contact or a motion of the device
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 17/04: Speaker identification or verification; training, enrolment or model building
    • G06F 1/3203: Power management, i.e. event-based initiation of a power-saving mode
    • G06F 1/3231: Power management by monitoring the presence, absence or movement of users
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G10L 15/08: Speech recognition; speech classification or search
    • G10L 25/51: Speech or voice analysis specially adapted for comparison or discrimination
    • H04W 4/80: Services using short-range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low-energy communication
    • H04W 52/0264: Power saving in terminal devices by selectively disabling software applications
    • H04W 88/02: Terminal devices
    • G10L 17/00: Speaker identification or verification techniques

Definitions

  • This application relates to the field of terminals, and more specifically, to a voice wake-up method, an electronic device, and a wearable device.
  • the present application provides a voice wake-up method, electronic device, wearable device, and system, which help improve the accuracy of the electronic device when performing voice wake-up.
  • In a first aspect, a system is provided, including an electronic device and a wearable device. The electronic device communicates with the wearable device through a short-range wireless connection, and the electronic device is used to collect voice signals in the environment;
  • the electronic device is also used to send a query request to the wearable device when the voice signal meets a preset condition, where the query request is used to request information indicating whether the user is speaking;
  • the wearable device is used to send the query result to the electronic device, where the query result includes the information indicating whether the user is speaking; the electronic device is also used to enter the awake state when it determines, according to that information, that the user is speaking.
  • In this way, the electronic device can query the wearable device for information indicating whether the user is speaking, and, when the information shows that the user is speaking, perform the wake-up operation and thereby enter the awake state.
  • The preset condition is that the voice signal contains a wake-up word; or, the preset condition is that the voiceprint information of the voice signal matches preset voiceprint information; or, the preset condition is that the voice signal contains a wake-up word and the voiceprint information of the voice signal matches the preset voiceprint information.
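As an illustration only (the patent does not specify an implementation), the three alternative preset conditions could be checked as follows; the function names, the `mode` switch, and the similarity threshold are assumptions, not anything disclosed in the application:

```python
def contains_wake_word(transcript: str, wake_word: str) -> bool:
    """Check whether the recognized transcript contains the wake-up word."""
    return wake_word.lower() in transcript.lower()

def voiceprint_matches(similarity: float, threshold: float = 0.8) -> bool:
    """Compare a voiceprint similarity score against a preset threshold."""
    return similarity >= threshold

def preset_condition_met(transcript: str, wake_word: str,
                         similarity: float, mode: str = "both") -> bool:
    """Evaluate the three alternative preset conditions described above."""
    if mode == "wake_word":
        return contains_wake_word(transcript, wake_word)
    if mode == "voiceprint":
        return voiceprint_matches(similarity)
    # "both": the wake-up word is present AND the voiceprint matches
    return (contains_wake_word(transcript, wake_word)
            and voiceprint_matches(similarity))
```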
  • the information that the user is speaking includes data detected by the sensor of the wearable device, and the electronic device is also used to determine that the user is speaking based on the data detected by the sensor of the wearable device.
  • This takes advantage of the electronic device's stronger computing capability: the wearable device only needs to send the data detected by its sensor to the electronic device, and the electronic device makes the judgment. This helps reduce the latency of the electronic device in the voice wake-up process and also helps improve the accuracy of voice wake-up.
  • The information indicating whether the user is speaking is used to indicate a first confidence and a second confidence, where the first confidence is the confidence that the user is wearing the wearable device and the second confidence is the confidence that the user is speaking;
  • the electronic device is specifically configured to enter the awake state when the first confidence is greater than or equal to a first preset value and the second confidence is greater than or equal to a second preset value.
  • the electronic device may determine whether to perform the wake-up operation according to the first confidence level and the second confidence level indicated in the information that the user is speaking.
  • the electronic device can perform a wake-up operation when it is determined that the user is wearing the wearable device and the user is talking, thereby entering the wake-up state.
  • the use of wearable devices to assist electronic devices for voice wake-up helps to improve the accuracy of voice wake-up.
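The dual-threshold decision above can be sketched minimally as follows; the concrete preset values 0.5 are illustrative assumptions, since the patent does not fix them:

```python
FIRST_PRESET = 0.5   # illustrative first preset value (device is worn)
SECOND_PRESET = 0.5  # illustrative second preset value (user is speaking)

def should_wake(first_confidence: float, second_confidence: float) -> bool:
    """Enter the awake state only if both confidences clear their presets."""
    return (first_confidence >= FIRST_PRESET
            and second_confidence >= SECOND_PRESET)
```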
  • The wearable device includes an acoustic sensor and is specifically configured to: determine the first confidence according to the frequency of the human heartbeat or breathing sound signal detected by the acoustic sensor within a preset detection period; and determine the second confidence according to the strength of the sound signal within a preset frequency range.
  • the wearable device may use the acoustic sensor to detect the sound signal of the human heartbeat or breathing, and determine the first confidence level and the second confidence level. In this way, the electronic device can determine whether the user is speaking based on the first confidence level and the second confidence level.
  • the use of wearable devices to assist electronic devices for voice wake-up helps to improve the accuracy of voice wake-up.
  • The information indicating whether the user is speaking may include the frequency of the human heartbeat or breathing sound signal detected by the sound sensor of the wearable device within a preset period, and the strength of the sound signal within the preset frequency range.
  • the electronic device may determine the first confidence level and the second confidence level according to the information that the user is speaking. Therefore, the electronic device determines whether the user is speaking according to the first confidence level and the second confidence level.
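One way the two confidences could be derived from the acoustic measurements, sketched under stated assumptions: the detected heartbeat rate is mapped to the worn confidence, and normalized power in a preset frequency band to the speaking confidence. The rate range, band limits, and reference power are invented for illustration and are not values from the patent:

```python
import numpy as np

def worn_confidence(beats_per_minute: float) -> float:
    """First confidence: a rate inside a plausible heartbeat range suggests
    the device is being worn (the range is an illustrative assumption)."""
    return 1.0 if 40.0 <= beats_per_minute <= 180.0 else 0.0

def speaking_confidence(signal: np.ndarray, fs: float,
                        band=(100.0, 1000.0), ref_power=1e-3) -> float:
    """Second confidence: normalized signal power inside the preset band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    band_power = spectrum[mask].sum() / len(signal)
    return float(min(1.0, band_power / ref_power))
```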
  • The wearable device includes a photoplethysmography (PPG) sensor and is specifically configured to: determine the first confidence according to the frequency of the PPG signal detected by the PPG sensor in a preset detection period; and determine the second confidence according to the strength of the PPG signal in the preset frequency range.
  • the wearable device may use the PPG signal detected by the PPG sensor to determine the first confidence level and the second confidence level. In this way, the electronic device can determine whether the user is speaking based on the first confidence level and the second confidence level.
  • the use of wearable devices to assist electronic devices for voice wake-up helps to improve the accuracy of voice wake-up.
  • the information that the user is speaking may include the frequency of the PPG signal detected by the PPG sensor of the wearable device during the preset detection period and the strength of the PPG signal within the preset frequency range.
  • the electronic device may determine the first confidence level and the second confidence level according to the information that the user is speaking. Therefore, the electronic device determines whether the user is speaking according to the first confidence level and the second confidence level.
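An analogous sketch for the PPG case, again with invented parameters: a dominant PPG frequency inside a typical pulse range yields the first confidence, and PPG power in a higher band (assumed here to respond to speech-induced vibration) yields the second. All band limits and the reference power are assumptions:

```python
import numpy as np

def ppg_confidences(ppg: np.ndarray, fs: float,
                    pulse_band=(0.7, 3.0), speech_band=(10.0, 40.0),
                    ref_power=1e-3):
    """Return (first_confidence, second_confidence) from a PPG trace."""
    spectrum = np.abs(np.fft.rfft(ppg)) ** 2
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    dominant = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    first = 1.0 if pulse_band[0] <= dominant <= pulse_band[1] else 0.0
    mask = (freqs >= speech_band[0]) & (freqs <= speech_band[1])
    second = float(min(1.0, spectrum[mask].sum() / len(ppg) / ref_power))
    return first, second
```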
  • The wearable device includes an acoustic sensor and is specifically configured to: collect a human heartbeat or breathing sound signal through the acoustic sensor; and input the sound signal into the first model, the second model, and the third model to obtain the first confidence and the second confidence.
  • The first model is obtained by collecting noise signals when the user is not wearing the wearable device; the second model is obtained by collecting sound signals when the user is wearing the wearable device and not speaking; and the third model is obtained by collecting sound signals when the user is wearing the wearable device and speaking.
  • In this way, the wearable device can, by means of machine learning, input the sound signal collected by the acoustic sensor into the first model, the second model, and the third model, so as to obtain the first confidence and the second confidence.
  • the electronic device can determine whether the user is speaking based on the first confidence level and the second confidence level.
  • the use of wearable devices to assist electronic devices for voice wake-up helps to improve the accuracy of voice wake-up.
  • the first model, the second model, and the third model are stored in the electronic device.
  • the information that the user is speaking includes the human heartbeat or breathing sound signal collected by the sound sensor of the wearable device.
  • the electronic device can input the sound signal into the first model, the second model, and the third model to obtain the first confidence level and the second confidence level. Therefore, the electronic device determines whether the user is speaking according to the first confidence level and the second confidence level.
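The patent does not disclose the internals of the three models. As a stand-in, the sketch below uses three one-dimensional Gaussian likelihoods over a single signal feature (e.g. band energy) for the not-worn, worn-and-silent, and worn-and-speaking cases, and normalizes them into the two confidences; all means and variances are invented for illustration:

```python
import math

def gaussian_likelihood(x: float, mean: float, std: float) -> float:
    """Likelihood of x under a 1-D Gaussian."""
    return (math.exp(-((x - mean) ** 2) / (2 * std ** 2))
            / (std * math.sqrt(2 * math.pi)))

def three_model_confidences(feature: float):
    """Return (first_confidence, second_confidence) from three toy models."""
    l1 = gaussian_likelihood(feature, mean=0.0, std=1.0)  # not worn (noise)
    l2 = gaussian_likelihood(feature, mean=3.0, std=1.0)  # worn, not speaking
    l3 = gaussian_likelihood(feature, mean=6.0, std=1.0)  # worn, speaking
    total = l1 + l2 + l3
    first = (l2 + l3) / total   # confidence that the user wears the device
    second = l3 / total         # confidence that the user is speaking
    return first, second
```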
  • The wearable device includes a PPG sensor and is specifically configured to: collect a PPG signal through the PPG sensor; and input the PPG signal into the first model, the second model, and the third model to obtain the first confidence and the second confidence.
  • The first model is obtained by collecting noise signals when the user is not wearing the wearable device; the second model is obtained by collecting the PPG signal when the user is wearing the wearable device and not speaking; and the third model is obtained by collecting the PPG signal when the user is wearing the wearable device and speaking.
  • In this way, the wearable device can, by means of machine learning, input the PPG signal collected by the PPG sensor into the first model, the second model, and the third model, so as to obtain the first confidence and the second confidence.
  • the electronic device can determine whether the user is speaking based on the first confidence level and the second confidence level.
  • the use of wearable devices to assist electronic devices for voice wake-up helps to improve the accuracy of voice wake-up.
  • the first model, the second model, and the third model are stored in the electronic device.
  • the information that the user is speaking includes the PPG signal collected by the PPG sensor of the wearable device.
  • the electronic device can input the PPG signal into the first model, the second model, and the third model to obtain the first confidence level and the second confidence level. Therefore, the electronic device determines whether the user is speaking according to the first confidence level and the second confidence level.
  • the account logged in on the electronic device is associated with the account logged in on the wearable device.
  • The account logged in on the electronic device and the account logged in on the wearable device may be the same account; or, they may be accounts in the same family group; or, the account logged in on the wearable device may be an account authorized by the account logged in on the electronic device.
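Purely as an illustration of the three association cases (same account, same family group, or explicit authorization), with an invented data model that the patent does not specify:

```python
def accounts_associated(device_account: str, wearable_account: str,
                        family_groups, authorized_pairs) -> bool:
    """True if the two logged-in accounts are associated per the cases above."""
    if device_account == wearable_account:
        return True  # same account on both devices
    for group in family_groups:
        if device_account in group and wearable_account in group:
            return True  # both accounts belong to the same family group
    # wearable account explicitly authorized by the device account
    return (device_account, wearable_account) in authorized_pairs
```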
  • In a second aspect, a voice wake-up method is provided.
  • the method is applied to an electronic device that communicates with a wearable device through a short-range wireless connection.
  • The method includes: the electronic device collects voice signals in the environment; when the voice signal meets a preset condition, the electronic device sends a query request to the wearable device, where the query request is used to request information indicating whether the user is speaking; the electronic device receives the query result sent by the wearable device, where the query result includes that information; and when it determines, according to the information, that the user is speaking, the electronic device enters the awake state.
  • In this way, the electronic device can query the wearable device for information indicating whether the user is speaking, and, when the information shows that the user is speaking, perform the wake-up operation and thereby enter the awake state.
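The end-to-end flow on the electronic device can be sketched as below. The link object and its `query_speaking_info` method are hypothetical stand-ins for the short-range wireless channel, and the preset values are assumptions:

```python
class WakeController:
    """Sketch of the electronic-device side of the wake-up flow."""

    def __init__(self, link, first_preset=0.5, second_preset=0.5):
        self.link = link              # short-range link to the wearable
        self.first_preset = first_preset
        self.second_preset = second_preset
        self.awake = False

    def on_voice_signal(self, meets_preset_condition: bool) -> None:
        """Handle one collected voice signal."""
        if not meets_preset_condition:
            return  # no wake word / voiceprint match: do nothing
        # Query the wearable for the "user is speaking" information.
        first_conf, second_conf = self.link.query_speaking_info()
        if (first_conf >= self.first_preset
                and second_conf >= self.second_preset):
            self.awake = True  # enter the awake state


class FakeLink:
    """Test double standing in for the wearable end of the link."""

    def __init__(self, first_conf, second_conf):
        self._result = (first_conf, second_conf)

    def query_speaking_info(self):
        return self._result
```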
  • The preset condition is that the voice signal contains a wake-up word; or, the preset condition is that the voiceprint information of the voice signal matches preset voiceprint information; or, the preset condition is that the voice signal contains a wake-up word and the voiceprint information of the voice signal matches the preset voiceprint information.
  • the information that the user is speaking includes data detected by the sensor of the wearable device.
  • The electronic device determining, according to the information, that the user is speaking includes: the electronic device determining, based on the data detected by the sensor of the wearable device, that the user is speaking.
  • the information that the user is speaking is used to indicate that the user is speaking.
  • The information indicating whether the user is speaking is used to indicate the first confidence and the second confidence, where the first confidence is the confidence that the user is wearing the wearable device and the second confidence is the confidence that the user is speaking.
  • the electronic device may determine whether to perform the wake-up operation according to the first confidence level and the second confidence level indicated in the information that the user is speaking.
  • the electronic device can determine that the user is wearing the wearable device and the user is speaking, and perform a wake-up operation, thereby entering the wake-up state.
  • the use of wearable devices to assist electronic devices for voice wake-up helps to improve the accuracy of voice wake-up.
  • The information indicating whether the user is speaking may include the frequency of the human heartbeat or breathing sound signal detected by the sound sensor of the wearable device within a preset period, and the strength of the sound signal within the preset frequency range.
  • The method further includes: the electronic device determines the first confidence according to the frequency of the human heartbeat or breathing sound signal detected by the acoustic sensor in a preset detection period; and the electronic device determines the second confidence according to the strength of the sound signal within the preset frequency range.
  • the information that the user is speaking may include the frequency of the PPG signal detected by the PPG sensor of the wearable device during the preset detection period and the strength of the PPG signal within the preset frequency range.
  • The method further includes: the electronic device determines the first confidence according to the frequency of the PPG signal detected by the PPG sensor in the preset detection period; and the electronic device determines the second confidence according to the strength of the PPG signal within the preset frequency range.
  • the first model, the second model, and the third model are stored in the electronic device.
  • the first model is obtained by collecting the noise signal when the user is not wearing the wearable device
  • the second model is obtained by collecting the sound signal when the user is wearing the wearable device and not speaking
  • the third model is obtained by collecting the sound signal when the user is wearing the wearable device and speaking.
  • the information that the user is speaking includes the human heartbeat or breathing sound signal collected by the sound sensor of the wearable device.
  • the method further includes: the electronic device inputs the sound signal into the first model, the second model, and the third model to obtain the first confidence level and the second confidence level.
  • the first model, the second model, and the third model are stored in the electronic device.
  • the first model is obtained by collecting the noise signal when the user is not wearing the wearable device
  • the second model is obtained by collecting the PPG signal when the user is wearing the wearable device and not speaking
  • the third model is obtained by collecting the PPG signal when the user is wearing the wearable device and speaking.
  • the information that the user is speaking includes the PPG signal collected by the PPG sensor of the wearable device.
  • the method further includes: the electronic device inputs the PPG signal into the first model, the second model, and the third model to obtain the first confidence level and the second confidence level.
  • the account logged in on the electronic device is associated with the account logged in on the wearable device.
  • The account logged in on the electronic device and the account logged in on the wearable device may be the same account; or, they may be accounts in the same family group; or, the account logged in on the wearable device may be an account authorized by the account logged in on the electronic device.
  • In a third aspect, a voice wake-up method is provided.
  • the method is applied to a wearable device that communicates with an electronic device through a short-range wireless connection.
  • The method includes: receiving a query request sent by the electronic device, where the query request is used to request information indicating whether the user is speaking; and sending the query result to the electronic device, where the query result includes that information.
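On the wearable side, the handler reduces to: read (or compute) the two confidences from the sensors and package them as the query result. The dictionary keys and the sensor callback below are invented for illustration; the patent does not define a message format:

```python
def handle_query_request(read_sensor_confidences):
    """Build the query result for an incoming query request.

    read_sensor_confidences: a callable returning (first_conf, second_conf),
    standing in for the acoustic- or PPG-based estimation described above.
    """
    first_conf, second_conf = read_sensor_confidences()
    return {
        "user_wearing_confidence": first_conf,    # first confidence
        "user_speaking_confidence": second_conf,  # second confidence
    }
```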
  • The information indicating whether the user is speaking is used to indicate the first confidence and the second confidence, where the first confidence is the confidence that the user is wearing the wearable device, and the second confidence is the confidence that the user is speaking.
  • The wearable device includes an acoustic sensor, and the method further includes: determining the first confidence according to the frequency of the human heartbeat or breathing sound signal detected by the acoustic sensor in a preset detection period; and determining the second confidence according to the strength of the sound signal within the preset frequency range.
  • The wearable device includes a PPG sensor, and the method further includes: determining the first confidence according to the frequency of the PPG signal detected by the PPG sensor in a preset detection period; and determining the second confidence according to the strength of the PPG signal within the preset frequency range.
  • The wearable device includes an acoustic sensor, and the method further includes: collecting the human heartbeat or breathing sound signal through the acoustic sensor; and inputting the sound signal into the first model, the second model, and the third model to obtain the first confidence and the second confidence.
  • The first model is obtained by collecting noise signals when the user is not wearing the wearable device; the second model is obtained by collecting sound signals when the user is wearing the wearable device and not speaking; and the third model is obtained by collecting sound signals when the user is wearing the wearable device and speaking.
  • The wearable device includes a PPG sensor, and the method further includes: collecting a PPG signal through the PPG sensor; and inputting the PPG signal into the first model, the second model, and the third model to obtain the first confidence and the second confidence.
  • The first model is obtained by collecting noise signals when the user is not wearing the wearable device; the second model is obtained by collecting the PPG signal when the user is wearing the wearable device and not speaking; and the third model is obtained by collecting the PPG signal when the user is wearing the wearable device and speaking.
  • the account logged in on the electronic device is associated with the account logged in on the wearable device.
  • A voice wake-up apparatus is provided; the apparatus is included in an electronic device and has the functions of the electronic device in the foregoing second aspect and its possible implementations.
  • The functions can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules or units corresponding to the above-mentioned functions.
  • A voice wake-up apparatus is provided; the apparatus is included in a wearable device and has the functions of the wearable device in the foregoing third aspect and its possible implementations.
  • The functions can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules or units corresponding to the above-mentioned functions.
  • an electronic device including: one or more processors; a memory; and one or more computer programs.
  • one or more computer programs are stored in the memory, and the one or more computer programs include instructions.
  • When the instructions are executed by the one or more processors, the electronic device is caused to execute the voice wake-up method in any one of the possible implementations of the second aspect described above.
  • a wearable device including: one or more processors; a memory; and one or more computer programs.
  • one or more computer programs are stored in the memory, and the one or more computer programs include instructions.
  • When the instructions are executed by the one or more processors, the wearable device is caused to execute the voice wake-up method in any one of the possible implementations of the third aspect.
  • a chip system is provided.
  • the chip system is located in an electronic device.
  • The chip system includes a system-on-chip (SOC). The SOC is used to control a microphone to collect voice signals in the environment in which the electronic device is located; the SOC is further used to, when it determines that the voice signal meets the preset condition, control a wireless communication module to send a query request to the wearable device, where the query request is used to request information indicating whether the user is speaking; the SOC is further used to control the wireless communication module to receive the query result sent by the wearable device, where the query result includes that information; and the SOC is further used to enter the awake state when it determines, according to the information, that the user is speaking.
  • A chip system is provided, located in a wearable device. The chip system includes a SOC, which is used to control a wireless communication module to receive a query request sent by an electronic device, where the query request is used to request information indicating whether the user is speaking; the SOC is further used to control the wireless communication module to send the query result to the electronic device, where the query result includes that information.
  • A computer storage medium is provided, including computer instructions that, when run on an electronic device, cause the electronic device to execute the voice wake-up method in any one of the possible implementations of the second aspect; or, when run on a wearable device, cause the wearable device to execute the voice wake-up method in any one of the possible implementations of the third aspect.
  • A computer program product is provided that, when run on an electronic device, causes the electronic device to execute the voice wake-up method in any one of the possible implementations of the second aspect; or, when run on a wearable device, causes the wearable device to execute the voice wake-up method in any one of the possible implementations of the third aspect.
  • FIG. 1 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
  • Figure 2 is a set of graphical user interfaces provided by an embodiment of the present application.
  • Fig. 3 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • Fig. 4 is a schematic diagram of a wearable device provided by an embodiment of the present application.
  • Fig. 5 is another schematic diagram of a wearable device provided by an embodiment of the present application.
  • Fig. 6 is another schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • Fig. 7 is a schematic block diagram of a wearable device provided by an embodiment of the present application.
  • The electronic device may be a portable electronic device that also includes other functions, such as a mobile phone, a tablet computer, and the like. Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running various operating systems.
  • The aforementioned portable electronic device may also be another portable electronic device, such as a laptop computer. It should also be understood that, in some other embodiments, the above-mentioned electronic device may not be a portable electronic device but a desktop computer.
  • the electronic device may be a smart home appliance, such as a smart speaker, a smart home device, and so on.
  • FIG. 1 shows a schematic structural diagram of an electronic device 100.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, antenna 1, antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
• the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and so on.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
• the memory can store instructions or data that the processor 110 has just used or used cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and improves system efficiency.
  • the processor 110 may include the wake-up processing module and the voiceprint processing module described in the following embodiments.
  • the wake-up processing module in the processor 110 may analyze whether the voice signal in the environment contains a wake-up word, so as to determine whether it is a false wake-up.
  • the voiceprint processing module in the processor 110 may analyze the similarity between the voiceprint information in the voice signal and the voiceprint preset by the user.
  • the processor 110 may include one or more interfaces.
• the interface can include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and so on.
  • the I2C interface is a bidirectional synchronous serial bus, which includes a serial data line (SDA) and a serial clock line (SCL).
  • the I2S interface can be used for audio communication.
  • the processor 110 may include multiple sets of I2S buses.
  • the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the interface connection relationship between the modules illustrated in the embodiment of the present application is merely a schematic description, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
  • the charging management module 140 is used to receive charging input from the charger.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
• the wireless communication module 160 can provide solutions for wireless communication applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and so on.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic wave radiation via the antenna 2.
• the electronic device 100 can send a query request to the wearable device through the wireless communication module 160, where the query request is used to request the wearable device to determine whether the user is speaking; the electronic device 100 can also receive the query result of the wearable device through the wireless communication module 160.
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the display screen 194 is used to display images, videos, and the like.
  • the electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • the ISP is used to process the data fed back from the camera 193.
  • the camera 193 is used to capture still images or videos.
  • NPU is a neural-network (NN) computing processor.
  • the NPU can realize applications such as intelligent cognition of the electronic device 100, such as image recognition, face recognition, voice recognition, text understanding, and so on.
  • the aforementioned wake-up processing module and voiceprint processing module may also be included in the NPU computing processor.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, at least one application program (such as a sound playback function, an image playback function, etc.) required by at least one function.
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
• the microphone 170C, also called a "mike" or "sound transducer", is used to convert a sound signal into an electrical signal.
• the user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the electronic device 100 may receive the voice signal in the environment through the microphone 170C.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
• the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyroscope sensor 180B.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of the flip holster.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and be used in applications such as horizontal and vertical screen switching, pedometers and so on.
• the distance sensor 180F is used to measure distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
• the proximity light sensor 180G can also be used in the leather case mode and the pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of the ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy.
• the touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be provided on the display screen 194, and the touch screen is composed of the touch sensor 180K and the display screen 194, which is also called a “touch screen”.
  • the bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human voice.
• Wake-up word: a string used to wake up an electronic device.
• for example, the wake-up word may be "Xiaoyi Xiaoyi", and so on.
  • Voice wake-up operation includes two parts: wake-up operation and recognition operation.
• the wake-up operation means that the user speaks a wake-up word to wake up the electronic device, so that the electronic device is in a state of waiting for a voice command; or, the wake-up operation means that the user speaks a wake-up word, so that the electronic device enters the awake state; or, the wake-up operation may mean that the voiceprint information of the voice signal received by the electronic device matches the preset voiceprint information, so that the electronic device enters the awake state.
• Voice instruction: an instruction used to control the electronic device by voice to perform a corresponding voice operation.
  • voice operations can be "book me a flight from Beijing to Shanghai tomorrow morning”, “navigate home”, “play music” and so on.
  • the audio of the playback device may interfere with the wake-up device, causing the wake-up device to be woken up by mistake or fail to wake up.
• for example, when a smart device (for example, a mobile phone) is near a playback device, if the phrase "Xiaoyi Xiaoyi" is spoken in a TV series, it is homophonic with the wake-up word of the smart device, which will cause the smart device to be woken up by mistake.
• since the human body is also a sound conductor, a sensor on the side of the wearable device close to the skin of the human body can detect the sound signal inside the human body to determine whether the user is speaking, which helps to improve the accuracy of voice wake-up.
  • Fig. 2 shows a set of graphical user interfaces (GUI) provided by an embodiment of the present application.
  • the user sends a voice command "Xiaoyi Xiaoyi" containing a wake-up word to the mobile phone.
• after the mobile phone receives the user's voice command, it can determine whether the voice command contains a wake-up word and whether the voiceprint information in the voice command matches the voiceprint information preset in the mobile phone.
• the mobile phone can send a query request to the user's wearable device, which is used to query whether the user is speaking; the wearable device can use the sensor close to the human body to determine whether the user is speaking; the wearable device can send the query result to the mobile phone, where the query result includes the information that the user is speaking; after the mobile phone determines, based on that information, that the user is speaking, it performs the wake-up operation and enters the awake state.
  • the mobile phone can reply to the user "I heard a moving voice calling me". In the embodiment of this application, the mobile phone replies to the user "I heard a moving voice calling me", which can indicate that the mobile phone has entered the awake state.
• in the embodiment of the present application, when performing a wake-up operation on the electronic device, after the electronic device determines that the voice command contains the wake-up word and that the voiceprint information of the voice command matches the voiceprint information preset in the electronic device, the electronic device may further send a query request to the wearable device; the wearable device can assist the electronic device in determining whether the user is speaking, and when the electronic device determines from the query result that the user is speaking, the electronic device performs the wake-up operation.
  • the method of detecting the sound in the human body through the wearable device helps to improve the accuracy of the voice wake-up of the electronic device.
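The cooperation described above can be sketched in a few lines of code. This is an illustrative sketch only; the function names and the lambda-based query callback are assumptions, not part of the patent:

```python
def should_wake(contains_wake_word: bool,
                voiceprint_matches: bool,
                query_wearable) -> bool:
    """Sketch of the wake-up decision flow described above.

    The electronic device first checks the wake-up word and the
    voiceprint locally; only when both checks pass does it query the
    wearable device, which reports whether the user is speaking.
    """
    if not (contains_wake_word and voiceprint_matches):
        return False  # local checks failed: no query is sent, no wake-up
    user_is_speaking = query_wearable()  # query result from the wearable
    return user_is_speaking

# Example: local checks pass, but the wearable reports the user is silent,
# so a wake-up word heard from a TV is treated as a false wake-up.
print(should_wake(True, True, lambda: False))
```

Querying the wearable only after the local checks pass keeps the radio traffic (and the wearable's power draw) limited to genuine wake-up candidates.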
  • FIG. 3 shows a schematic flowchart of a voice wake-up method 300 provided by an embodiment of the present application. As shown in FIG. 3, the method 300 may be executed by an electronic device and a wearable device, and the method 300 includes:
  • S301 The electronic device receives a voice instruction in the environment.
  • the mobile phone can receive the voice command "Xiaoyi Xiaoyi" in the environment.
  • S302 The electronic device judges whether the voice command includes a wake-up word.
  • S302 may be executed by the wake-up processing module in the processor 110 in FIG. 1.
• the wake-up processing module may be a digital signal processor (DSP).
  • the DSP can process the voice command, so that it can analyze whether the voice command contains a wake-up word.
  • the wake-up processing module may include a speech recognition (automatic speech recognition, ASR) module and a semantic understanding (natural language understanding, NLU) module.
• the main function of the ASR module is to recognize the user's speech as text content.
  • the main function of the NLU module is to understand the user's intent and perform slot analysis.
  • the user sends out the voice command "Xiaoyi Xiaoyi”.
• the mobile phone can send the voice command to the ASR module, and the ASR module converts the speech into text information (for example, "Xiaoyi Xiaoyi"). Therefore, the electronic device can determine that the voice command contains the wake-up word.
  • S303 The electronic device judges whether the voiceprint information of the voice command matches the voiceprint information preset in the electronic device.
  • S303 may be executed by the voiceprint processing module in the processor 110 in FIG. 1.
  • the user can save a recording of the wake-up word in the electronic device in advance, and the electronic device can analyze the user's voiceprint information after acquiring the recording.
  • the electronic device can compare the voice command with the voiceprint of the recording, so as to determine the similarity between the voiceprint information in the voice command and the voiceprint information in the recording. If the similarity is greater than the preset similarity, the electronic device can determine that the voiceprint information of the voice command matches the voiceprint information preset by the electronic device.
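The similarity comparison in S303 can be illustrated with a minimal sketch. The patent does not specify the similarity measure, so cosine similarity over fixed-length voiceprint feature vectors is an assumption here, as is the threshold value:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two voiceprint feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def voiceprint_matches(command_vec, enrolled_vec, threshold=0.8):
    """Declare a match when the similarity exceeds a preset threshold,
    mirroring the 'greater than the preset similarity' rule above.
    The 0.8 default is an illustrative assumption."""
    return cosine_similarity(command_vec, enrolled_vec) > threshold

# Identical feature vectors have similarity 1.0, so they always match.
print(voiceprint_matches([0.2, 0.5, 0.1], [0.2, 0.5, 0.1]))
```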
  • S302 may be executed first and then S303 may be executed, or S303 may be executed first and then S302 may be executed.
• for the foregoing S302 and S303, reference may be made to the prior-art process of judging whether a voice command contains a wake-up word and whether the voiceprint information of the voice command matches the preset voiceprint information, which is not limited in this embodiment of the present application.
  • the electronic device may generate the voice feature vector corresponding to the voice instruction according to the voice instruction; the electronic device may match the voice feature vector with the user feature vector. In the case of a successful match, the electronic device sends a query request to the wearable device.
  • the electronic device may send a query request to the wearable device after determining that the voice signal meets the preset condition.
  • the preset condition may be that the voice instructions in S302 and S303 contain a wake-up word and the voiceprint information of the voice instruction matches the preset voiceprint information; or, the preset condition may also be that the voice instruction contains a wake-up word; or The preset condition may also be that the voiceprint information of the voice command matches the preset voiceprint information.
• for example, when the mobile phone is in the locked-screen state with the screen off, the mobile phone detects the user's voice command "turn on the camera". At this time, although the voice command does not include the wake-up word, the mobile phone can determine that the voiceprint information of the voice command matches the preset voiceprint information, and can determine that the user wants to wake up the mobile phone first and then turn on the camera. Then the mobile phone may also send a query request to the wearable device after determining that the voiceprint information of the voice command matches the preset voiceprint information.
• the electronic device may send a query request to the wearable device, and the wearable device may receive the query request sent by the electronic device, where the query request is used to request information about whether the user is speaking.
• short-range communication includes, but is not limited to, Wi-Fi, Bluetooth (BT), near field communication (NFC), and other short-range communication technologies.
  • the manner of sending the query request is not specifically limited.
  • the query request may be a message newly defined by the Wi-Fi protocol or the Bluetooth protocol, and the message may carry a field, which is used to request information that the user is speaking.
  • the query request can also be carried in an existing Wi-Fi protocol or Bluetooth protocol message.
  • the query request may be carried in a Bluetooth low energy (BLE) data packet.
  • the BLE data packet can be a directional broadcast packet, and the electronic device can learn the media access control (media access control, MAC) address of the wearable device in advance. Then, when the electronic device determines that the voice command contains the wake-up word and the voiceprint information of the voice command matches the preset voiceprint information, the electronic device can send the BLE data packet to the wearable device through the MAC address of the wearable device.
  • the BLE data packet may carry a field, which is used to request information that the user is speaking.
  • the electronic device and the wearable device may be devices under the same account. For example, Huawei account A is logged in to the electronic device, and Huawei account A is also logged in to the wearable device. Then the electronic device can learn the address information of the wearable device in advance, so that the electronic device can send the query request to the wearable device. Alternatively, the electronic device can also send the query request to the cloud server, and the cloud server forwards it to the wearable device.
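A minimal sketch of carrying the query in a BLE data packet addressed by the wearable's MAC address follows. The field layout (6-byte MAC followed by a one-byte query flag) is purely an illustrative assumption; the patent only says the query request may be carried in a BLE packet and does not define the payload format:

```python
def build_query_packet(dest_mac: str, query_flag: int = 0x01) -> bytes:
    """Build a hypothetical directed-broadcast payload carrying a
    one-byte 'is the user speaking?' query field.

    The layout (6-byte destination MAC + flag byte) is an assumed
    format for illustration, not the actual protocol of the patent.
    """
    mac_bytes = bytes(int(part, 16) for part in dest_mac.split(":"))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must have 6 octets")
    return mac_bytes + bytes([query_flag])

packet = build_query_packet("a4:c1:38:01:02:03")
print(len(packet))  # 6-byte MAC + 1 query byte
```

In practice the field would ride inside a standard BLE advertising PDU or a new message defined on top of the Bluetooth protocol, as the text notes.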
  • S305 The wearable device determines the information that the user is speaking according to the query request.
  • the information that the user is speaking may be the confidence that the user is speaking.
  • the confidence that the user is speaking can be determined by sensors on the wearable device. Before judging the confidence that the user is speaking, the confidence of the voice signal can be judged first, where the confidence of the voice signal can also be understood as the confidence that the user is wearing a wearable device.
  • FIG. 4 shows a schematic diagram of a wearable device.
  • An acoustic sensor can be arranged on the side of the wearable device close to the skin, and the acoustic sensor can detect the sound in the body from the skin.
  • the principle of the acoustic sensor is to use the characteristic that the human body is also a sound conductor, and the parameters corresponding to the human heartbeat or breathing sound are determined by the acoustic sensor, for example, the frequency of the heartbeat.
• a method for determining the confidence of the sound signal may be to use the frequency and amplitude range of the human heartbeat and breathing sound as the detection rule. For example, the resting heart rate of an adult is 60-100 beats/minute, and the highest during exercise is about 200 beats/minute, so the periodic frequency of the heartbeat sound is 1 Hz-3.3 Hz; the sound frequency of the heartbeat sound itself is between 20 Hz-500 Hz. That is to say, if a 20 Hz-500 Hz sound with a period of 1 Hz-3.3 Hz is detected, the wearable device can determine that the confidence of the sound signal is 1. This method is relatively simple.
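The rule above can be sketched directly in code, using the thresholds stated in the text (the function name and the simplified two-parameter input are illustrative assumptions; a real implementation would extract these two quantities from the raw sensor signal):

```python
def sound_signal_confidence(sound_freq_hz: float, beat_period_hz: float) -> int:
    """Rule-based confidence that the wearable is being worn:
    a 20-500 Hz sound repeating with a 1-3.3 Hz period (a heartbeat)
    yields confidence 1, anything else yields 0."""
    in_band = 20.0 <= sound_freq_hz <= 500.0
    periodic = 1.0 <= beat_period_hz <= 3.3
    return 1 if (in_band and periodic) else 0

# A 75 bpm heartbeat (1.25 Hz) whose sound energy sits around 120 Hz:
print(sound_signal_confidence(120.0, 1.25))
```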
• another method to determine the confidence of the sound signal is machine learning. Based on a sound detection model, using heartbeat or breathing sounds collected by the acoustic sensor on the wearable device as training data, a model that can detect heartbeat or breathing sounds can be trained. When the input sound signal contains a heartbeat or breathing sound, the model can output the confidence that a human heartbeat or breathing sound is present in the input sound signal (this confidence is the confidence of the sound signal). This method has high detection accuracy and strong anti-interference ability.
• after judging the confidence of the sound signal, the wearable device can continue to judge the confidence that the user is speaking.
• one method to determine the confidence that the user is speaking is to extract the voice signal from the human background sound (heartbeat or breathing sound, etc.) and calculate the confidence that the wearer is speaking at that time.
• the fundamental frequency of an adult female voice is 350 Hz to 3 kHz;
• the fundamental frequency of an adult male voice is 100 Hz to 900 Hz.
  • the wearable device may set the sound detection frequency range to [a, b]. Then the confidence that the user is speaking can be determined by the following formula (1):
  • S is the confidence that the user is speaking
  • P is the average intensity of the detected sound signal in the range of [a, b] during the detection period
  • P 0 is the preset basic sound intensity.
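Formula (1) itself is not reproduced in this excerpt; only S, P, and P0 are defined. A plausible form consistent with those definitions is the ratio of P to P0 clipped to [0, 1], but this is an assumption, not the patent's actual formula:

```python
def speaking_confidence(p_avg: float, p0: float) -> float:
    """Hedged sketch of formula (1): the text defines the confidence S,
    the average detected intensity P in the band [a, b], and a preset
    base intensity P0, but the formula itself is missing from this
    excerpt. The ratio P / P0 clipped to [0, 1] is an assumed form.
    """
    if p0 <= 0:
        raise ValueError("base intensity P0 must be positive")
    return max(0.0, min(1.0, p_avg / p0))

print(speaking_confidence(0.5, 1.0))
```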
• another method of determining the confidence that the user is speaking is a machine learning method, in which the confidence of the sound signal (S1) and the confidence that the user is speaking (S2) are calculated together.
• the model can output S1 and S2 at the same time.
• the above introduces the process in which the wearable device uses the acoustic sensor arranged on the side close to the skin to determine the confidence that the user is speaking; the following introduces the process in which the wearable device uses a photoplethysmography (PPG) sensor.
  • FIG. 5 shows a schematic diagram of a PPG sensor.
  • the PPG sensor includes a light emitting component 501 and a light receiving component 502.
• the green light-emitting diode (LED) in the light emitting component 501 of the PPG sensor of the wearable device works with a photosensitive photodiode to illuminate the blood. Different volumes of blood in the blood vessel absorb different amounts of green light: when the heart beats, the blood flow increases and the amount of green light absorbed increases accordingly; between heartbeats, the blood flow decreases and the absorbed green light decreases as well. Therefore, the heart rate can be measured based on the absorbance of the blood.
• when light shines on the skin and is absorbed by the blood, the intensity of the light detected by the light receiving component 502 is weakened. The human body's skin, bones, muscle, fat, etc. reflect light at a fixed level, while the capillaries continuously become larger and smaller with the pulse under the action of the heart. When the heart contracts, the peripheral blood volume is the largest, the light absorption is also the largest, and the light intensity detected by the light receiving component 502 is the smallest; when the heart is diastolic, the opposite is true, and the detected light intensity is the largest. In this way, the light intensity received by the light receiving component 502 exhibits a pulsating change.
• although PPG sensors have been widely used to detect heart rate, blood cells also vibrate when the user speaks, resulting in a difference between the frequency and amplitude of blood cell vibration detected when the user is speaking and when the user is not speaking, so the wearable device can determine the confidence that the user is speaking.
  • the confidence of the voice signal may be determined first.
• a method for determining the confidence of the sound signal may be to use the vibration frequency and amplitude range of human blood cells as the detection rule; for the specific process, refer to the above-mentioned process of using the acoustic sensor to determine the confidence of the sound signal, which will not be described in detail here.
  • Another way to determine the confidence level of a sound signal is a machine learning method.
• using the frequency and amplitude of the PPG signal collected by the PPG sensor on an actual wearable device (or the blood cell vibration frequency and amplitude) as training data, a model that can detect the frequency and amplitude of the PPG signal can be trained.
  • the model can output the confidence level of the sound signal.
  • This model has high detection accuracy and strong anti-interference ability.
• after judging the confidence of the sound signal, the wearable device can then judge the confidence that the user is speaking.
  • One way to determine the confidence of speaking is to extract the speech signal from the PPG signal and calculate and output the confidence that the wearer is speaking at the time.
• for the process of determining the confidence that the user is speaking through the frequency of the voice signal, reference may be made to the above-mentioned process of extracting the voice signal from the human background sound (heartbeat or breathing sound, etc.) to calculate and output the confidence that the wearer is speaking at that time; for brevity, the details are not repeated here.
  • Another user is speaking confidence determination method is a machine learning method, the confidence level of the sound signal (S 1) and the user is speaking confidence (S 2) calculated combination, i.e., the wearable device can be used on PPG
  • The above describes how the wearable device determines the confidence of the sound signal and the confidence that the user is speaking.
  • In other embodiments, the confidence of the sound signal and the confidence that the user is speaking may instead be determined by the electronic device.
  • the query result sent by the wearable device to the electronic device may include data collected by a sensor (for example, an acoustic sensor or a PPG sensor) of the wearable device.
  • the query result may include the sound signal collected by the sound sensor.
  • After the electronic device receives the sound signal, it can input the sound signal into the models trained from training data set A, training data set B, and training data set C stored on the electronic device, to obtain the confidence of the sound signal and the confidence that the user is speaking.
  • the query result may include the PPG signal collected by the PPG sensor.
  • The electronic device can input the PPG signal into the models trained from training data set D, training data set E, and training data set F stored on the electronic device, to obtain the confidence of the sound signal and the confidence that the user is speaking.
  • The process by which the electronic device determines the confidence of the sound signal and the confidence that the user is speaking from the data collected by the wearable device's sensor can also refer to the above-mentioned process by which the wearable device determines these two confidence levels; for brevity, the details are not repeated here.
  • S306: The wearable device sends a query result to the electronic device, and the electronic device receives the query result sent by the wearable device, where the query result includes the information that the user is speaking.
  • the information that the user is speaking includes the confidence level of the aforementioned sound signal and the confidence level that the user is speaking.
  • In one embodiment, if the wearable device determines that the confidence of the sound signal is lower than the first preset value, the wearable device may directly indicate in the query result that the user is not wearing the wearable device; the electronic device can then determine, according to the prior art, whether to enter the awake state. Alternatively, the wearable device may indicate in the query result that the confidence that the user is speaking is unknown, and the electronic device determines, according to the prior art, whether to enter the awake state.
  • Alternatively, the wearable device may not send the query result to the electronic device. If the electronic device does not receive the query result within the preset time period, it can determine that the user is not wearing the wearable device, and can then determine, according to the prior art, whether to enter the awake state.
  • In one embodiment, the wearable device may indicate in the query result that the user is speaking, or indicate in the query result that the user is not speaking. Alternatively, the wearable device can carry the confidence of the sound signal and the confidence that the user is speaking in the query result and send them to the electronic device, and the electronic device determines whether the user is speaking. For example, if the electronic device determines that the confidence of the sound signal is greater than or equal to the first preset value and the confidence that the user is speaking is greater than or equal to the second preset value, the electronic device may determine that the user is speaking, and thereby enter the awake state.
  • In this embodiment of the application, the manner of sending the query result is not specifically limited.
  • the query result may be a message newly defined by the Wi-Fi protocol or the Bluetooth protocol, and the message may carry a field, which is used to indicate information that the user is speaking.
  • the query result can also be carried in an existing Wi-Fi protocol or Bluetooth protocol message.
  • the query result may be carried in a BLE data packet.
  • the BLE data packet may be a directional broadcast packet, and the wearable device can learn the MAC address of the electronic device in advance. Then the wearable device can send a BLE data packet to the electronic device based on the MAC address of the electronic device.
  • The BLE data packet may carry a field, which is used to indicate the information that the user is speaking.
  • For example, the information that the user is speaking can be indicated by 2 bits: "11" means that the user is wearing the wearable device and is speaking; "10" means that the user is wearing the wearable device and is not speaking; "00" means that the user is not wearing the wearable device.
  • Alternatively, the information that the user is speaking can be indicated by 1 bit: "1" means that the user is speaking; "0" means that the user is not speaking.
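The 2-bit field layout above can be sketched as follows. The bit values come from the text; the function names and the idea of packing/unpacking them in host code are illustrative assumptions, not part of any real BLE stack.

```python
# 2-bit "user is speaking" status field from the query result.
WEARING_SPEAKING = 0b11   # wearing the device, user is speaking
WEARING_SILENT   = 0b10   # wearing the device, user is not speaking
NOT_WEARING      = 0b00   # device is not being worn

def encode_status(wearing: bool, speaking: bool) -> int:
    """Pack the two flags into the 2-bit field carried in the query result."""
    if not wearing:
        return NOT_WEARING
    return WEARING_SPEAKING if speaking else WEARING_SILENT

def decode_status(field: int) -> tuple:
    """Return (wearing, speaking) decoded from the 2-bit field."""
    wearing = bool(field & 0b10)
    speaking = field == WEARING_SPEAKING
    return (wearing, speaking)
```

The high bit carries the wearing state, so the 1-bit variant described above corresponds to sending only the low (speaking) bit when wearing is known by other means.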
  • the information that the user is speaking includes data collected by a sensor (for example, an acoustic sensor or a PPG sensor) of the wearable device.
  • S307: The electronic device can perform a wake-up operation when it determines that the user is speaking; performing the wake-up operation causes the electronic device to enter the awake state from the non-awake state.
  • For example, if the query result received by the electronic device indicates that the user is speaking, or if the electronic device determines that the confidence of the sound signal is greater than or equal to the first preset value and the confidence that the user is speaking is greater than or equal to the second preset value, the electronic device can directly perform a wake-up operation to enter the awake state.
  • If the query result received by the electronic device includes the confidence of the sound signal and the confidence that the user is speaking sent by the wearable device, and the confidence of the sound signal is less than the first preset value, the electronic device may not enter the awake state.
  • Alternatively, the electronic device may not enter the awake state if the confidence of the sound signal is greater than or equal to the first preset value but the confidence that the user is speaking is less than the second preset value.
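The wake decision described in the bullets above can be sketched as a small function. The threshold values and the three-way return value are illustrative assumptions; the text only fixes the comparisons against the first and second preset values.

```python
FIRST_PRESET = 0.8   # threshold for the sound-signal confidence (illustrative)
SECOND_PRESET = 0.7  # threshold for the speaking confidence (illustrative)

def wake_decision(sound_conf: float, speaking_conf: float) -> str:
    """Return "wake", "stay_asleep", or "fallback" per the rules above."""
    if sound_conf < FIRST_PRESET:
        # Sound-signal confidence too low: the user is likely not wearing the
        # device, so fall back to the prior-art check (wake word + voiceprint).
        return "fallback"
    if speaking_conf >= SECOND_PRESET:
        return "wake"
    return "stay_asleep"
```

The "fallback" branch models the case where the electronic device reverts to its existing wake-up logic rather than simply refusing to wake.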
  • In this case, the electronic device can learn that the user is not wearing the wearable device, and the electronic device can determine, according to the prior art, whether to enter the awake state.
  • the information that the user is speaking includes data collected by a sensor (for example, an acoustic sensor or a PPG sensor) of the wearable device.
  • After the electronic device receives the data collected by the wearable device's sensor, it can determine the confidence of the sound signal and the confidence that the user is speaking from that data, and can then determine, according to the two confidence levels, whether the user is speaking.
  • In this embodiment of the application, the wearable device can send the information that the user is speaking to the electronic device, and the electronic device determines, based on that information, whether the user is speaking, which helps improve the accuracy of voice wake-up.
  • FIG. 6 shows a schematic flowchart of another voice wake-up method 600 according to an embodiment of the present application. As shown in FIG. 6, the method 600 may be executed by an electronic device, and the method 600 includes:
  • S601: Receive a voice command in the environment.
  • S602: When the electronic device determines that the voice command meets the preset condition, it sends a query request to the wearable device, where the query request is used to request information that the user is speaking.
  • the process for the electronic device to determine that the voice command satisfies the preset condition can refer to the process of S302-303 in the above method 300. For the sake of brevity, details are not repeated here.
  • S603: The electronic device determines whether the query result is received within a preset time period.
  • If the electronic device does not receive the query result within the preset time period, it can determine that the wearable device is far away from the electronic device, or that the user is not wearing the wearable device; the electronic device can then judge, according to the prior art, whether to enter the awake state. For example, the electronic device enters the awake state when it determines that the voice command contains the wake-up word and the voiceprint information of the voice command matches the preset voiceprint information.
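The timeout-then-fallback behavior above can be sketched with a blocking wait. The queue-based transport and the timeout value are illustrative assumptions standing in for the real BLE/Wi-Fi receive path.

```python
import queue

def await_query_result(result_queue, timeout_s=0.5):
    """Block up to timeout_s seconds for the wearable's query result.

    Returns the result, or None if nothing arrived in time. On None the caller
    treats the wearable as absent (far away or not worn) and falls back to the
    prior-art check (wake word + voiceprint match).
    """
    try:
        return result_queue.get(timeout=timeout_s)
    except queue.Empty:
        return None
```

In a real implementation the preset time period would be tuned against the round-trip latency of the short-range wireless link.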
  • S604: If the electronic device receives the query result within the preset time period and the query result contains the information that the user is speaking, the electronic device determines, according to that information, whether the user is speaking.
  • In one embodiment, the information that the user is speaking includes the confidence of the sound signal and the confidence that the user is speaking. The electronic device may then enter the awake state when it determines that the confidence of the sound signal is greater than or equal to the first preset value and the confidence that the user is speaking is greater than or equal to the second preset value.
  • the information that the user is speaking directly indicates whether the user is speaking.
  • For example, when the information indicates that the user is speaking (for example, the value of the field carried in the BLE data packet is "1"), the electronic device can enter the awake state; when the information indicates that the user is not speaking (for example, the value of the field carried in the BLE data packet is "0"), the electronic device may not enter the awake state.
  • the information that the user is speaking may also be data collected by a sensor (for example, an acoustic sensor or a PPG sensor) of the wearable device. Then the electronic device can determine whether the user is speaking according to the data collected by the sensor of the wearable device. For example, the electronic device may determine the confidence of the sound signal and the confidence that the user is speaking according to the data collected by the sensor of the wearable device, so as to determine whether the user is speaking according to the two confidences.
  • the process of the electronic device judging whether the user is speaking according to the information that the user is speaking can refer to the process of S307 in the above method 300. For the sake of brevity, it will not be repeated here.
  • S605: If the electronic device determines, according to the information that the user is speaking, that the user is not speaking, the electronic device may not enter the awake state.
  • In this embodiment of the application, the wearable device can send the information that the user is speaking to the electronic device, and the electronic device determines, based on that information, whether the user is speaking, which helps improve the accuracy of voice wake-up.
  • An embodiment of the present application also provides an electronic device.
  • the electronic device may include the processor 110 and the wireless communication module 160 as shown in FIG. 1.
  • The wireless communication module 160 can be used to perform the steps of sending the query request to the wearable device in S602 and receiving the query result sent by the wearable device in S604; the processor 110 can be used to perform the step of determining, according to the information that the user is speaking, whether the user is speaking in S604, and the step of S605.
  • Fig. 7 shows a schematic block diagram of a wearable device provided by an embodiment of the present application.
  • the wearable device may include a processor 710 and a wireless communication module 720, and the wireless communication module 720 may be used to perform the steps of receiving the query request sent by the electronic device in S304 and sending the query result to the electronic device in S306.
  • the processor 710 may be used to perform the step of determining the information that the user is speaking in S305.
  • It should be understood that, in the several embodiments provided in this application, the disclosed system, device, and method can be implemented in other ways.
  • The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The technical solution of this application in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, a network device, or the like) execute all or part of the steps of the methods described in the various embodiments of this application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.


Abstract

A voice wake-up method, electronic device, wearable device, and system. The system includes an electronic device and a wearable device, where the electronic device communicates with the wearable device over a short-range wireless connection. The electronic device is configured to collect voice signals in its environment and, when a voice signal meets a preset condition, send a query request to the wearable device, the query request being used to request information that the user is speaking. The wearable device is configured to send a query result to the electronic device, the query result including the information that the user is speaking. The electronic device is further configured to enter the awake state when it determines, according to the information that the user is speaking, that the user is speaking. The system helps improve the accuracy of the electronic device's voice wake-up.

Description

Voice wake-up method, electronic device, wearable device, and system
This application claims priority to Chinese Patent Application No. 202010550402.4, filed with the China National Intellectual Property Administration on June 16, 2020 and entitled "Voice wake-up method, electronic device, wearable device, and system", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the terminal field, and more specifically, to a voice wake-up method, an electronic device, and a wearable device.
Background
Although current smart voice devices have made great progress in the accuracy of voice wake-up, noise reduction, and recognition, human-voice recognition remains relatively poor in the presence of background noise. In particular, when the user is far from the device to be awakened and there is background noise, the wake-up rate is even lower and false wake-ups are more frequent.
Summary
This application provides a voice wake-up method, an electronic device, a wearable device, and a system, which help improve the accuracy of the electronic device's voice wake-up.
According to a first aspect, a system is provided, including an electronic device and a wearable device, where the electronic device communicates with the wearable device over a short-range wireless connection. The electronic device is configured to collect voice signals in its environment, and is further configured to send a query request to the wearable device when a voice signal meets a preset condition, where the query request is used to request information that the user is speaking. The wearable device is configured to send a query result to the electronic device, where the query result includes the information that the user is speaking. The electronic device is further configured to enter the awake state when it determines, according to the information that the user is speaking, that the user is speaking.
In this embodiment of the application, after determining that the voice signal contains the wake-up word and that its voiceprint information matches the preset voiceprint information, the electronic device can query the wearable device for the information that the user is speaking. The electronic device performs the wake-up operation when it determines that the user is speaking, and thereby enters the awake state. Having the wearable device assist the electronic device in voice wake-up helps improve the accuracy of voice wake-up.
In some possible implementations, the preset condition is that the voice signal contains the wake-up word; or that the voiceprint information of the voice signal matches the preset voiceprint information; or that the voice signal contains the wake-up word and its voiceprint information matches the preset voiceprint information.
In some possible implementations, the information that the user is speaking includes data detected by a sensor of the wearable device, and the electronic device is further configured to determine, according to the data detected by the sensor of the wearable device, that the user is speaking.
In this embodiment of the application, the stronger computing power of the electronic device can be exploited: the wearable device only needs to send the data detected by its sensor to the electronic device, and the electronic device performs the judgment. This helps reduce the electronic device's latency during voice wake-up and also helps improve the accuracy of voice wake-up.
With reference to the first aspect, in some implementations of the first aspect, the information that the user is speaking is used to indicate a first confidence and a second confidence, where the first confidence is the confidence that the user is wearing the wearable device, and the second confidence is the confidence that the user is speaking. The electronic device is specifically configured to enter the awake state when the first confidence is greater than or equal to a first preset value and the second confidence is greater than or equal to a second preset value.
In this embodiment of the application, the electronic device can determine whether to perform the wake-up operation based on the first confidence and the second confidence indicated in the information that the user is speaking. The electronic device can perform the wake-up operation, and thereby enter the awake state, when it determines that the user is wearing the wearable device and is speaking. Having the wearable device assist the electronic device in voice wake-up helps improve the accuracy of voice wake-up.
With reference to the first aspect, in some implementations of the first aspect, the wearable device includes an acoustic sensor and is specifically configured to: determine the first confidence according to the frequency of the human heartbeat or breathing sound signal detected by the acoustic sensor within a preset detection period; and determine the second confidence according to the strength of the sound signal within a preset frequency range.
In this embodiment of the application, the wearable device can use the acoustic sensor to detect the sound signal of the human heartbeat or breathing and determine the first confidence and the second confidence, so that the electronic device can judge from the two confidences whether the user is speaking. Having the wearable device assist the electronic device in voice wake-up helps improve the accuracy of voice wake-up.
In some possible implementations, the information that the user is speaking may include the frequency of the human heartbeat or breathing sound signal detected by the wearable device's acoustic sensor within a preset period, and the strength of that sound signal within a preset frequency range. The electronic device can determine the first confidence and the second confidence according to the information that the user is speaking, and thereby determine whether the user is speaking.
With reference to the first aspect, in some implementations of the first aspect, the wearable device includes a photoplethysmography (PPG) sensor and is specifically configured to: determine the first confidence according to the frequency of the PPG signal detected by the PPG sensor within a preset detection period; and determine the second confidence according to the strength of the PPG signal within a preset frequency range.
In this embodiment of the application, the wearable device can use the PPG signal detected by the PPG sensor to determine the first confidence and the second confidence, so that the electronic device can judge from the two confidences whether the user is speaking. Having the wearable device assist the electronic device in voice wake-up helps improve the accuracy of voice wake-up.
In some possible implementations, the information that the user is speaking may include the frequency of the PPG signal detected by the wearable device's PPG sensor within a preset detection period and the strength of the PPG signal within a preset frequency range. The electronic device can determine the first confidence and the second confidence according to the information that the user is speaking, and thereby determine whether the user is speaking.
With reference to the first aspect, in some implementations of the first aspect, the wearable device includes an acoustic sensor and is specifically configured to: collect the sound signal of the human heartbeat or breathing through the acoustic sensor; and input the sound signal into a first model, a second model, and a third model to obtain the first confidence and the second confidence, where the first model is obtained by collecting noise signals while the user is not wearing the wearable device, the second model is obtained by collecting sound signals while the user is wearing the wearable device and not speaking, and the third model is obtained by collecting sound signals while the user is wearing the wearable device and speaking.
In this embodiment of the application, through machine learning, the wearable device can input the sound signal collected by the acoustic sensor into the first, second, or third model to obtain the first confidence and the second confidence, so that the electronic device can judge from the two confidences whether the user is speaking. Having the wearable device assist the electronic device in voice wake-up helps improve the accuracy of voice wake-up.
In some possible implementations, the first model, the second model, and the third model are stored on the electronic device. The information that the user is speaking includes the human heartbeat or breathing sound signal collected by the wearable device's acoustic sensor. The electronic device can input the sound signal into the first, second, and third models to obtain the first confidence and the second confidence, and thereby determine whether the user is speaking.
With reference to the first aspect, in some implementations of the first aspect, the wearable device includes a PPG sensor and is specifically configured to: collect a PPG signal through the PPG sensor; and input the PPG signal into a first model, a second model, and a third model to obtain the first confidence and the second confidence, where the first model is obtained by collecting noise signals while the user is not wearing the wearable device, the second model is obtained by collecting PPG signals while the user is wearing the wearable device and not speaking, and the third model is obtained by collecting PPG signals while the user is wearing the wearable device and speaking.
In this embodiment of the application, through machine learning, the wearable device can input the PPG signal collected by the PPG sensor into the first, second, or third model to obtain the first confidence and the second confidence, so that the electronic device can judge from the two confidences whether the user is speaking. Having the wearable device assist the electronic device in voice wake-up helps improve the accuracy of voice wake-up.
In some possible implementations, the first model, the second model, and the third model are stored on the electronic device. The information that the user is speaking includes the PPG signal collected by the wearable device's PPG sensor. The electronic device can input the PPG signal into the first, second, and third models to obtain the first confidence and the second confidence, and thereby determine whether the user is speaking.
With reference to the first aspect, in some implementations of the first aspect, the account logged in on the electronic device is associated with the account logged in on the wearable device.
In some possible implementations, the account logged in on the electronic device and the account logged in on the wearable device may be the same account; or they may be accounts in the same family group; or the account logged in on the wearable device may be an account authorized by the account logged in on the electronic device.
According to a second aspect, a voice wake-up method is provided, applied to an electronic device that communicates with a wearable device over a short-range wireless connection. The method includes: the electronic device collects voice signals in its environment; when a voice signal meets a preset condition, the electronic device sends a query request to the wearable device, where the query request is used to request information that the user is speaking; the electronic device receives the query result sent by the wearable device, where the query result includes the information that the user is speaking; and when determining, according to the information that the user is speaking, that the user is speaking, the electronic device enters the awake state.
In this embodiment of the application, after determining that the voice signal contains the wake-up word and that its voiceprint information matches the preset voiceprint information, the electronic device can query the wearable device for the information that the user is speaking. The electronic device performs the wake-up operation when it determines that the user is speaking, and thereby enters the awake state. Having the wearable device assist the electronic device in voice wake-up helps improve the accuracy of voice wake-up.
In some possible implementations, the preset condition is that the voice signal contains the wake-up word; or that the voiceprint information of the voice signal matches the preset voiceprint information; or that the voice signal contains the wake-up word and its voiceprint information matches the preset voiceprint information.
In some possible implementations, the information that the user is speaking includes data detected by a sensor of the wearable device, and determining, by the electronic device according to the information that the user is speaking, that the user is speaking includes: the electronic device determines, according to the data detected by the sensor of the wearable device, that the user is speaking.
In some possible implementations, the information that the user is speaking is used to indicate that the user is speaking.
With reference to the second aspect, in some implementations of the second aspect, the information that the user is speaking is used to indicate a first confidence and a second confidence, where the first confidence is the confidence that the user is wearing the wearable device, and the second confidence is the confidence that the user is speaking. Entering the awake state when determining, according to the information that the user is speaking, that the user is speaking includes: entering the awake state when the first confidence is greater than or equal to a first preset value and the second confidence is greater than or equal to a second preset value.
In this embodiment of the application, the electronic device can determine whether to perform the wake-up operation based on the first confidence and the second confidence indicated in the information that the user is speaking, and can perform the wake-up operation and enter the awake state when it determines that the user is wearing the wearable device and is speaking. Having the wearable device assist the electronic device in voice wake-up helps improve the accuracy of voice wake-up.
In some possible implementations, the information that the user is speaking may include the frequency of the human heartbeat or breathing sound signal detected by the wearable device's acoustic sensor within a preset period, and the strength of that sound signal within a preset frequency range. The method further includes: the electronic device determines the first confidence according to the frequency of the human heartbeat or breathing sound signal detected by the acoustic sensor within the preset detection period; and the electronic device determines the second confidence according to the strength of the sound signal within the preset frequency range.
In some possible implementations, the information that the user is speaking may include the frequency of the PPG signal detected by the wearable device's PPG sensor within a preset detection period and the strength of the PPG signal within a preset frequency range. The method further includes: the electronic device determines the first confidence according to the frequency of the PPG signal detected by the PPG sensor within the preset detection period; and the electronic device determines the second confidence according to the strength of the PPG signal within the preset frequency range.
In some possible implementations, the first model, the second model, and the third model are stored on the electronic device, where the first model is obtained by collecting noise signals while the user is not wearing the wearable device, the second model is obtained by collecting sound signals while the user is wearing the wearable device and not speaking, and the third model is obtained by collecting sound signals while the user is wearing the wearable device and speaking. The information that the user is speaking includes the human heartbeat or breathing sound signal collected by the wearable device's acoustic sensor. The method further includes: the electronic device inputs the sound signal into the first, second, and third models to obtain the first confidence and the second confidence.
In some possible implementations, the first model, the second model, and the third model are stored on the electronic device, where the first model is obtained by collecting noise signals while the user is not wearing the wearable device, the second model is obtained by collecting PPG signals while the user is wearing the wearable device and not speaking, and the third model is obtained by collecting PPG signals while the user is wearing the wearable device and speaking. The information that the user is speaking includes the PPG signal collected by the wearable device's PPG sensor. The method further includes: the electronic device inputs the PPG signal into the first, second, and third models to obtain the first confidence and the second confidence.
With reference to the second aspect, in some implementations of the second aspect, the account logged in on the electronic device is associated with the account logged in on the wearable device.
In some possible implementations, the account logged in on the electronic device and the account logged in on the wearable device may be the same account; or they may be accounts in the same family group; or the account logged in on the wearable device may be an account authorized by the account logged in on the electronic device.
According to a third aspect, a voice wake-up method is provided, applied to a wearable device that communicates with an electronic device over a short-range wireless connection. The method includes: receiving a query request sent by the electronic device, where the query request is used to request information that the user is speaking; and sending a query result to the electronic device, where the query result includes the information that the user is speaking.
With reference to the third aspect, in some implementations of the third aspect, the information that the user is speaking is used to indicate a first confidence and a second confidence, where the first confidence is the confidence that the user is wearing the wearable device, and the second confidence is the confidence that the user is speaking.
With reference to the third aspect, in some implementations of the third aspect, the wearable device includes an acoustic sensor, and before sending the query result to the electronic device, the method further includes: determining the first confidence according to the frequency of the human heartbeat or breathing sound signal detected by the acoustic sensor within a preset detection period; and determining the second confidence according to the strength of the sound signal within a preset frequency range.
With reference to the third aspect, in some implementations of the third aspect, the wearable device includes a PPG sensor, and before sending the query result to the electronic device, the method further includes: determining the first confidence according to the frequency of the PPG signal detected by the PPG sensor within a preset detection period; and determining the second confidence according to the strength of the PPG signal within a preset frequency range.
With reference to the third aspect, in some implementations of the third aspect, the wearable device includes an acoustic sensor, and before sending the query result to the electronic device, the method further includes: collecting the sound signal of the human heartbeat or breathing through the acoustic sensor; and inputting the sound signal into a first model, a second model, and a third model to obtain the first confidence and the second confidence, where the first model is obtained by collecting noise signals while the user is not wearing the wearable device, the second model is obtained by collecting sound signals while the user is wearing the wearable device and not speaking, and the third model is obtained by collecting sound signals while the user is wearing the wearable device and speaking.
With reference to the third aspect, in some implementations of the third aspect, the wearable device includes a PPG sensor, and before sending the query result to the electronic device, the method further includes: collecting a PPG signal through the PPG sensor; and inputting the PPG signal into a first model, a second model, and a third model to obtain the first confidence and the second confidence, where the first model is obtained by collecting noise signals while the user is not wearing the wearable device, the second model is obtained by collecting PPG signals while the user is wearing the wearable device and not speaking, and the third model is obtained by collecting PPG signals while the user is wearing the wearable device and speaking.
With reference to the third aspect, in some implementations of the third aspect, the account logged in on the electronic device is associated with the account logged in on the wearable device.
According to a fourth aspect, a voice wake-up apparatus is provided, included in an electronic device, the apparatus having the functions of implementing the electronic device in the second aspect and its possible implementations. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above functions.
According to a fifth aspect, a voice wake-up apparatus is provided, included in a wearable device, the apparatus having the functions of implementing the wearable device in the third aspect and its possible implementations. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above functions.
According to a sixth aspect, an electronic device is provided, including: one or more processors; a memory; and one or more computer programs. The one or more computer programs are stored in the memory, and the one or more computer programs include instructions. When the instructions are executed by the electronic device, the electronic device is caused to execute the voice wake-up method in any possible implementation of the second aspect.
According to a seventh aspect, a wearable device is provided, including: one or more processors; a memory; and one or more computer programs. The one or more computer programs are stored in the memory, and the one or more computer programs include instructions. When the instructions are executed by the wearable device, the wearable device is caused to execute the voice wake-up method in any possible implementation of the third aspect.
According to an eighth aspect, a chip system is provided, located in an electronic device, where the chip system includes a system-on-chip (SOC). The SOC is configured to control a microphone to collect voice signals in the environment of the electronic device; the SOC is further configured to control a wireless communication module to send a query request to the wearable device when it determines that a voice signal meets the preset condition, where the query request is used to request information that the user is speaking; the SOC is further configured to control the wireless communication module to receive the query result sent by the wearable device, where the query result includes the information that the user is speaking; and the SOC is further configured to enter the awake state when it determines, according to the information that the user is speaking, that the user is speaking.
According to a ninth aspect, a chip system is provided, located in a wearable device, where the wearable device includes an SOC. The SOC is configured to control a wireless communication module to receive a query request sent by an electronic device, where the query request is used to request information that the user is speaking; and the SOC is further configured to control the wireless communication module to send a query result to the electronic device, where the query result includes the information that the user is speaking.
According to a tenth aspect, a computer storage medium is provided, including computer instructions. When the computer instructions run on an electronic device, the electronic device is caused to execute the voice wake-up method in any possible implementation of the second aspect; or a wearable device is caused to execute the voice wake-up method in any possible implementation of the third aspect.
According to an eleventh aspect, a computer program product is provided. When the computer program product runs on an electronic device, the electronic device is caused to execute the voice wake-up method in any possible implementation of the second aspect; or a wearable device is caused to execute the voice wake-up method in any possible implementation of the third aspect.
Brief Description of Drawings
FIG. 1 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of this application.
FIG. 2 shows a set of graphical user interfaces provided by an embodiment of this application.
FIG. 3 is a schematic flowchart of a voice wake-up method provided by an embodiment of this application.
FIG. 4 is a schematic diagram of a wearable device provided by an embodiment of this application.
FIG. 5 is another schematic diagram of a wearable device provided by an embodiment of this application.
FIG. 6 is another schematic flowchart of a voice wake-up method provided by an embodiment of this application.
FIG. 7 is a schematic block diagram of a wearable device provided by an embodiment of this application.
Detailed Description
The terms used in the following embodiments are only for the purpose of describing particular embodiments and are not intended to limit this application. As used in the specification and the appended claims of this application, the singular expressions "a", "an", "the", "the above", "said", and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of this application, "at least one" and "one or more" mean one, two, or more. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
Reference to "one embodiment" or "some embodiments" and the like in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of this application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
The following describes the electronic devices provided by the embodiments of this application, user interfaces for such electronic devices, and embodiments for using such electronic devices. In some embodiments, the electronic device may be a portable electronic device that also contains other functions, such as a mobile phone or a tablet computer. Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running the operating systems shown in the original figure (Figure PCTCN2021097124-appb-000001, image not reproduced here) or other operating systems. The above portable electronic device may also be another portable electronic device, such as a laptop computer. It should also be understood that in some other embodiments, the electronic device may not be a portable electronic device but a desktop computer. In some embodiments, the electronic device may be a smart home appliance, such as a smart speaker or a smart home device.
Exemplarily, FIG. 1 shows a schematic structural diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of the application does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. Different processing units may be independent devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction operation code and timing signals, and complete the control of instruction fetching and instruction execution.
A memory may also be arranged in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache, which can save instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. This avoids repeated access, reduces the waiting time of the processor 110, and thereby improves the efficiency of the system.
In this embodiment of the application, the processor 110 may include the wake-up processing module and the voiceprint processing module described in the following embodiments.
Exemplarily, the wake-up processing module in the processor 110 can analyze whether the voice signal in the environment contains the wake-up word, so as to determine whether a wake-up is a false wake-up.
Exemplarily, the voiceprint processing module in the processor 110 can analyze the similarity between the voiceprint information in the voice signal and the voiceprint preset by the user.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). The I2S interface can be used for audio communication; in some embodiments, the processor 110 may contain multiple groups of I2S buses. The PCM interface can also be used for audio communication, sampling, quantizing, and encoding analog signals. The UART interface is a universal serial data bus used for asynchronous communication. The MIPI interface can be used to connect the processor 110 with peripheral devices such as the display 194 and the camera 193. The GPIO interface can be configured by software, either as a control signal or as a data signal. The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like.
It can be understood that the interface connection relationships among the modules illustrated in this embodiment of the application are merely schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may also use interface connection manners different from those in the foregoing embodiment, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger.
The power management module 141 is configured to connect the battery 142 and the charging management module 140 with the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive signals to be sent from the processor 110, perform frequency modulation and amplification on them, and convert them into electromagnetic waves for radiation via the antenna 2.
In this embodiment of the application, the electronic device 100 can send a query request to the wearable device through the wireless communication module 160, where the query request is used to request the wearable device to judge whether the user is speaking; it can also receive the query result from the wearable device through the wireless communication module 160.
In some embodiments, the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies.
The electronic device 100 implements the display function through the GPU, the display 194, the application processor, and the like. The display 194 is configured to display images, videos, and the like. The electronic device 100 can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like. The ISP is configured to process the data fed back by the camera 193. The camera 193 is configured to capture still images or videos.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it processes input information quickly and can also learn continuously by itself. Applications such as intelligent cognition of the electronic device 100, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
Exemplarily, the above wake-up processing module and voiceprint processing module may also be included in the NPU computing processor.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving files such as music and videos in the external memory card.
The internal memory 121 can be used to store computer-executable program code, where the executable program code includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and applications required by at least one function (such as the sound playback function and the image playback function); the data storage area can store data created during use of the electronic device 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), or the like.
The electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The microphone 170C, also called a "mic" or "mike", is configured to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C to implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and the like.
In this embodiment of the application, the electronic device 100 can receive voice signals in the environment through the microphone 170C.
The pressure sensor 180A is configured to sense pressure signals and can convert pressure signals into electrical signals.
The gyroscope sensor 180B can be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 around three axes (that is, the x, y, and z axes) can be determined through the gyroscope sensor 180B.
The barometric pressure sensor 180C is configured to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude from the pressure value measured by the barometric pressure sensor 180C, to assist positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip cover.
The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes), and can detect the magnitude and direction of gravity when the electronic device 100 is stationary. It can also be used to recognize the posture of the electronic device and applied to landscape/portrait switching, pedometers, and similar applications.
The distance sensor 180F is configured to measure distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, in a shooting scenario, the electronic device 100 can use the distance sensor 180F to measure distance for fast focusing.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared LED. The electronic device 100 emits infrared light outward through the LED and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the device close to the ear during a call, so as to automatically turn off the screen to save power. The proximity light sensor 180G can also be used in holster mode and pocket mode for automatic unlocking and screen locking.
The ambient light sensor 180L is configured to sense the ambient light brightness. The electronic device 100 can adaptively adjust the brightness of the display 194 according to the sensed ambient light brightness.
The fingerprint sensor 180H is configured to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics for fingerprint unlocking, accessing application locks, fingerprint photographing, fingerprint call answering, and the like.
The temperature sensor 180J is configured to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing policy using the temperature detected by the temperature sensor 180J.
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be arranged on the display 194, and the touch sensor 180K and the display 194 form a touchscreen, also called a "touch screen".
The bone conduction sensor 180M can obtain vibration signals. In some embodiments, the bone conduction sensor 180M can obtain the vibration signal of the vibrating bone mass of the human vocal part.
Before describing the embodiments of this application, several concepts in voice wake-up are first introduced.
Wake-up word: a character string used to wake up the electronic device, for example, the wake-up word "Xiaoyi Xiaoyi".
Voice wake-up operation: the voice wake-up operation includes two parts, a wake-up operation and a recognition operation.
The wake-up operation means that the user speaks the wake-up word to wake the electronic device, so that the electronic device is in a state of waiting for voice commands; or that the user speaks the wake-up word so that the electronic device enters the awake state; or the wake-up operation may be that the voiceprint information of the voice signal received by the electronic device matches the preset voiceprint information, so that the electronic device enters the awake state.
Voice command: an instruction that controls the electronic device by voice to perform a corresponding voice operation. For example, the voice operation may be "Book me a flight from Beijing to Shanghai tomorrow morning", "Navigate home", "Play music", and so on.
Recognition operation: after the electronic device is awakened, the user speaks a voice command to control the electronic device to perform the corresponding voice operation.
At present, device wake-up has the following problem: if the user wakes the device while a playback device is playing audio, the audio from the playback device may interfere with the device to be awakened, causing the device to be falsely awakened or to fail to wake. For example, when a smart device (for example, a mobile phone) is placed beside a television, a TV drama saying "xiao yi xiao yi" ("auntie"), which sounds the same as the smart device's wake-up word "Xiaoyi Xiaoyi", will cause a false wake-up of the smart device.
In the embodiments of this application, the fact that the human body is also a sound conductor is exploited: a sensor on the side of the wearable device close to the skin detects voice signals inside the human body to judge whether the user is speaking, which helps improve the accuracy of the electronic device's voice wake-up.
FIG. 2 shows a set of graphical user interfaces (GUI) provided by an embodiment of this application.
As shown in FIG. 2, the user issues the voice command "Xiaoyi Xiaoyi" containing the wake-up word to the mobile phone. After receiving the user's voice command, the phone can determine whether the voice command contains the wake-up word and whether the voiceprint information in the voice command matches the voiceprint information preset in the phone. If the phone determines that the voice command contains the wake-up word and the voiceprint information matches the preset voiceprint information, the phone can send a query request to the user's wearable device to query whether the user is speaking; the wearable device can use the sensor on the side close to the body to determine whether the user is speaking, and can send the query result, which includes the information that the user is speaking, to the phone; after determining from the information that the user is speaking, the phone performs the wake-up operation and enters the awake state. As shown in FIG. 2, the phone can reply to the user, "I heard a lovely voice calling me." In this embodiment of the application, the phone replying "I heard a lovely voice calling me" indicates that the phone has entered the awake state.
In this embodiment of the application, when performing the wake-up operation on the electronic device, the electronic device can, after determining that the voice command contains the wake-up word and that the voiceprint information of the voice command matches the preset voiceprint information, further send a query request to the wearable device, and the wearable device can assist the electronic device in judging whether the user is speaking. When the electronic device determines from the query result that the user is speaking, it can perform the wake-up operation. Detecting the sound inside the human body through the wearable device helps improve the accuracy of the electronic device's voice wake-up.
FIG. 3 shows a schematic flowchart of a voice wake-up method 300 provided by an embodiment of this application. As shown in FIG. 3, the method 300 may be executed by an electronic device and a wearable device, and the method 300 includes:
S301: The electronic device receives a voice command in the environment.
Exemplarily, as shown in FIG. 2, the mobile phone can receive the voice command "Xiaoyi Xiaoyi" in the environment.
S302: The electronic device judges whether the voice command contains the wake-up word.
Exemplarily, S302 may be executed by the wake-up processing module in the processor 110 in FIG. 1.
In one embodiment, the wake-up processing module may be a digital signal processor (DSP). The DSP can process the voice command and thereby analyze whether the voice command contains the wake-up word.
In one embodiment, the wake-up processing module may include an automatic speech recognition (ASR) module and a natural language understanding (NLU) module.
The main function of the ASR module is to recognize the user's speech as text; the main function of the NLU module is to understand the user's intent and perform slot parsing.
Exemplarily, as shown in FIG. 2, the user issues the voice command "Xiaoyi Xiaoyi". After receiving the voice command, the mobile phone can send it to the ASR module, and the ASR module converts the speech into text information (for example, "Xiaoyi Xiaoyi"). The electronic device can thereby determine that the voice command contains the wake-up word.
S303: The electronic device judges whether the voiceprint information of the voice command matches the voiceprint information preset in the electronic device.
Exemplarily, S303 may be executed by the voiceprint processing module in the processor 110 in FIG. 1.
The user can save a recording containing the wake-up word in the electronic device in advance, and the electronic device can analyze the user's voiceprint information after obtaining the recording. After the electronic device receives a voice command containing the wake-up word, it can compare the voice command with the voiceprint of the recording to determine the similarity between the voiceprint information in the voice command and the voiceprint information in the recording. If the similarity is greater than a preset similarity, the electronic device can determine that the voiceprint information of the voice command matches the voiceprint information preset in the electronic device.
It should be understood that there is no actual order between S302 and S303: S302 may be executed before S303, or S303 may be executed before S302.
It should also be understood that S302 and S303 can refer to the prior-art processes of judging whether a voice command contains the wake-up word and whether the voiceprint information of the voice command matches the preset voiceprint information, which are not limited in this embodiment of the application. Exemplarily, the electronic device can generate a voice feature vector corresponding to the voice command and match the voice feature vector against the user feature vector; if the match succeeds, the electronic device sends the query request to the wearable device.
It should also be understood that in this embodiment of the application, the electronic device can send the query request to the wearable device after determining that the voice signal meets the preset condition. The preset condition may be that the voice command contains the wake-up word and the voiceprint information of the voice command matches the preset voiceprint information, as in S302 and S303; or the preset condition may be that the voice command contains the wake-up word; or the preset condition may be that the voiceprint information of the voice command matches the preset voiceprint information.
Exemplarily, when the mobile phone is locked with the screen off and detects the user's voice command "Open the camera", the voice command does not contain the wake-up word, but the phone can determine that the voiceprint information of the voice command matches the preset voiceprint information and that the user wants to wake the phone first and then open the camera. The phone can therefore also send the query request to the wearable device after determining that the voiceprint information of the voice command matches the preset voiceprint information.
S304: If the electronic device determines that the voice command contains the wake-up word and that the voiceprint information of the voice command matches the preset voiceprint information, the electronic device can send a query request to the wearable device, and the wearable device receives the query request sent by the electronic device, where the query request is used to request the information that the user is speaking.
In one embodiment, before the electronic device sends the query request to the wearable device, the electronic device and the wearable device establish a connection through near-field communication, including but not limited to Wi-Fi, Bluetooth (BT), near field communication (NFC), and other near-field communication technologies.
In this embodiment of the application, the manner of sending the query request is not specifically limited. For example, the query request may be a newly defined message of the Wi-Fi protocol or the Bluetooth protocol, and the message may carry a field used to request the information that the user is speaking.
Alternatively, the query request may be carried in an existing Wi-Fi protocol or Bluetooth protocol message.
For example, the query request may be carried in a Bluetooth low energy (BLE) data packet. The BLE data packet may be a directed advertising packet, and the electronic device can learn the media access control (MAC) address of the wearable device in advance. Then, when the electronic device determines that the voice command contains the wake-up word and that the voiceprint information of the voice command matches the preset voiceprint information, the electronic device can send a BLE data packet to the wearable device based on the wearable device's MAC address. The BLE data packet may carry a field used to request the information that the user is speaking.
In one embodiment, the electronic device and the wearable device may be devices under the same account. For example, Huawei account A is logged in on the electronic device, and Huawei account A is also logged in on the wearable device. The electronic device can then learn the address information of the wearable device in advance and send the query request to the wearable device; alternatively, the electronic device can send the query request to a cloud server, which forwards it to the wearable device.
S305: The wearable device determines, according to the query request, the information that the user is speaking.
In this embodiment of the application, the information that the user is speaking may be the confidence that the user is speaking, which can be determined through the sensors on the wearable device. Before judging the confidence that the user is speaking, the confidence of the sound signal may be judged first, where the confidence of the sound signal can also be understood as the confidence that the user is wearing the wearable device.
In one embodiment, FIG. 4 shows a schematic diagram of a wearable device. An acoustic sensor can be arranged on the side of the wearable device close to the skin, and the acoustic sensor can detect sounds inside the body through the skin.
In this embodiment of the application, the principle of the acoustic sensor is to exploit the fact that the human body is also a sound conductor, and to determine, through the acoustic sensor, the parameters corresponding to the human heartbeat or breathing sound, for example, the heartbeat frequency.
One method of determining the confidence of the sound signal is to use the frequency and amplitude ranges of the human heartbeat and breathing sounds as the detection rule. For example, an adult's resting heart rate is 60-100 beats per minute, reaching up to 200 beats per minute during exercise, so the periodic frequency at which heartbeat sounds occur is 1 Hz to 3.3 Hz, and the heartbeat sound itself has a frequency between 20 Hz and 500 Hz. That is, if a 20-500 Hz sound is detected with a period of 1-3.3 Hz, the electronic device can determine that the confidence of the sound signal is 1. This method is relatively simple.
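The rule above (20-500 Hz heartbeat sounds recurring with a 1-3.3 Hz period) can be sketched as follows. The input representation, a list of detected burst onset times plus a dominant frequency, is an illustrative assumption; the original does not specify how the sensor front end reports its measurements.

```python
def heartbeat_confidence(beat_times, freq_hz):
    """Rule-based confidence of the sound signal.

    beat_times: onset times (seconds) of band-limited sound bursts detected
    in one window; freq_hz: dominant frequency of those bursts.
    Returns 1.0 when the rule holds, else 0.0.
    """
    # Heartbeat sound itself must lie in the 20-500 Hz band.
    if not (20.0 <= freq_hz <= 500.0):
        return 0.0
    if len(beat_times) < 2:
        return 0.0
    # Average beat-to-beat rate over the window must be 1-3.3 Hz
    # (60-200 beats/min).
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    rate = 1.0 / (sum(intervals) / len(intervals))
    return 1.0 if 1.0 <= rate <= 3.3 else 0.0
```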
另一种确定声音信号的置信度的方法是机器学习的方法,以一个声音检测模型为基础,用可穿戴设备上声传感器采集到的心跳或者呼吸的声音作为训练数据,能够训练得到一个可检测心跳或者呼吸声音的模型。当输入的声音信号中存在心跳或者呼吸声音时,该模型可以输出人体心跳或者呼吸声音是否在输入声音信号中的置信度(该置信度即为声音信号的置信度)。该模型检测精度较高,抗干扰能力较强。
在判断完声音信号的置信度后,可穿戴设备可以继续判断用户正在说话的置信度。
一种确定正在说话的置信度的方法是将语音信号从人体背景声音(心跳或者呼吸声音等)提取出来,计算穿戴者当时正在说话的置信度。
例如，成年女性语音的基础频率为350Hz至3KHz，成年男性语音的基础频率为100Hz至900Hz。类似的，也可以使用声音检测规则，根据用户的性别，设置不同的声音检测频率范围。
一个实施例中,可穿戴设备可以设置声音检测频率范围为[a,b]。那么用户正在说话的置信度可以由以下公式(1)确定:
S = P/P0       (1)
其中，S为用户正在说话的置信度，P为检测周期内探测到的声音信号在[a,b]范围内的平均强度，P0为预置的基础声音强度。
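示例性的，公式(1)的计算可以用如下Python代码示意，其中输入的（频率，强度）样本形式以及各参数取值均为假设：

```python
def speaking_confidence(samples, a, b, p0):
    # samples: (频率Hz, 强度)元组列表; a、b为声音检测频率范围的上下限; p0为预置的基础声音强度
    # 按公式(1)计算: S = P / P0, 其中P为[a,b]范围内声音信号的平均强度
    in_band = [p for f, p in samples if a <= f <= b]
    if not in_band:
        return 0.0
    p = sum(in_band) / len(in_band)
    return p / p0
```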
另一种确定正在说话的置信度的方法是机器学习方法，把声音信号的置信度(S1)和用户正在说话的置信度(S2)的计算结合起来。可以使用可穿戴设备上声传感器采集到的用户佩戴着可穿戴设备且正在说话时的声音信号(包括心跳、呼吸声和说话声音)作为训练数据集A(S1=1，S2=1)、用户佩戴着可穿戴设备且未说话时采集到的声音信号(包括心跳、呼吸声)作为训练数据集B(S1=1，S2=0)、用户未佩戴可穿戴设备时采集到的噪声数据(不包含心跳、呼吸和说话声音)作为训练数据集C(S1=0，S2=0)，从而训练得到一个可同时检测心跳、呼吸声音和说话声音的模型。当在该模型中输入声传感器检测到的声音信号或者噪声信号时，该模型可同时输出S1和S2。
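示例性的，上述机器学习模型可以用一个极简的最近质心分类器来示意（实际实现通常采用更复杂的声学模型），以三类训练数据的(S1，S2)标签作为分类输出。以下代码中的特征向量均为假设的示例数据：

```python
def train_centroids(datasets):
    # datasets: {(s1, s2): [特征向量, ...]}，例如数据集A对应(1,1)、B对应(1,0)、C对应(0,0)
    centroids = {}
    for label, vecs in datasets.items():
        n = len(vecs)
        dim = len(vecs[0])
        # 每个标签取其训练样本的均值作为质心
        centroids[label] = [sum(v[i] for v in vecs) / n for i in range(dim)]
    return centroids

def predict(centroids, x):
    # 返回与输入特征最近的质心对应的(S1, S2)标签
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda lbl: dist2(centroids[lbl]))
```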
以上介绍了可穿戴设备通过在贴近皮肤的一侧设置声传感器的方式来确定用户正在说话的置信度的过程；下面介绍通过可穿戴设备中的光电容积描记(photoplethysmograph，PPG)传感器来确定用户正在说话的置信度的过程。
图5示出了PPG传感器的示意图,如图5所示,PPG传感器包括发射光组件501和接收光组件502。可穿戴设备的PPG传感器中的发射光组件501中绿色发光二极管(light emitting diode,LED)灯搭配感光光电二极管照射血液,由于血管内不同容积的血液对绿光吸收不同,在心脏跳动时,血液流速增多,绿光的吸收量会随之变大;处于心脏跳动的间隙时血流会减少,吸收的绿光也会随之降低。因此,根据血液的吸光度可以测量心率。
具体而言,当一定波长的光束照射到皮肤表面时,光束将穿过皮肤传送到接收光组件502,在此过程中由于受到皮肤肌肉和血液吸收的衰减作用,接收光组件502检测到光的强度将减弱。其中,人体的皮肤、骨骼、肉、脂肪等对光的反射是固定值,而毛细血管在心脏的作用下随着脉搏容积不停地变大变小。当心脏收缩时,外周血容量最多,光吸收量也最大,接收光组件502检测到的光强度最小;而在心脏舒张时,正好相反,检测到的光强度最大,使接收光组件502接收到的光强度随之呈脉动性变化。
由于PPG传感器已经广泛应用于探测心率，而用户说话时也会引起血液细胞震动，使得说话时探测到的血液细胞震动频率和幅度与用户不说话时有差别，因此可穿戴设备可以据此确定用户正在说话的置信度。
一个实施例中,在利用PPG传感器确定用户正在说话的置信度之前,可以先确定声音信号的置信度。
一种确定声音信号的置信度的方法可以是将人体的血液细胞震动频率和幅度范围作为检测规则,具体过程可以参考上述利用声传感器确定声音信号的置信度的过程,为了简洁,在此不再赘述。
另一种确定声音信号的置信度的方法是机器学习的方法,以一个PPG信号检测模型为基础,用实际可穿戴设备上PPG传感器采集到的PPG信号的频率和幅度(或者,血液细胞震动频率和幅度)作为训练数据,能够训练得到一个可检测PPG信号的频率和幅度的模型。当在该模型中输入PPG传感器检测到的PPG信号的频率和幅度时,该模型可以输出声音信号的置信度。该模型检测精度较高,抗干扰能力较强。
在判断完声音信号的置信度后,可穿戴设备可以接着判断用户正在说话的置信度。
一种确定正在说话的置信度的方法是将语音信号从PPG信号中提取出来，计算出穿戴者当时正在说话的置信度。
通过语音信号的频率来确定用户正在说话的置信度的过程，可以参考上述将语音信号从人体背景声音(心跳或者呼吸声音等)中提取出来、计算出穿戴者当时正在说话的置信度的过程，为了简洁，在此不再赘述。
另一种确定用户正在说话的置信度的方法是机器学习方法，把声音信号的置信度(S1)和用户正在说话的置信度(S2)的计算结合起来，即可以使用可穿戴设备上PPG传感器采集到的用户佩戴着可穿戴设备且正在说话时的PPG信号作为训练数据集D(S1=1，S2=1)、用户佩戴可穿戴设备且未说话时采集到的PPG信号作为训练数据集E(S1=1，S2=0)、用户未佩戴可穿戴设备时采集到的噪声数据(不包含PPG信号)作为训练数据集F(S1=0，S2=0)，从而训练得到一个可同时检测PPG信号的模型。当在该模型中输入可穿戴设备检测到的PPG信号时，该模型可同时输出S1和S2。
以上是通过可穿戴设备确定声音信号的置信度以及用户正在说话的置信度为例进行说明的。本申请实施例中,也可以由电子设备来确定声音信号的置信度以及用户正在说话 的置信度。
示例性的，可穿戴设备发送给电子设备的查询结果中可以包括可穿戴设备的传感器(例如，声传感器或者PPG传感器)采集的数据。例如，查询结果中可以包括声传感器采集的声音信号。当电子设备接收到该声音信号后，电子设备可以将该声音信号输入电子设备中保存的、由训练数据集A、训练数据集B和训练数据集C训练得到的模型中，从而电子设备可以获得声音信号的置信度以及用户正在说话的置信度。
又例如，查询结果中可以包括PPG传感器采集的PPG信号。当电子设备接收到该PPG信号后，电子设备可以将该PPG信号输入电子设备中保存的、由训练数据集D、训练数据集E和训练数据集F训练得到的模型中，从而电子设备可以获得声音信号的置信度以及用户正在说话的置信度。
应理解,电子设备通过可穿戴设备的传感器采集的数据确定声音信号的置信度以及用户正在说话的置信度的过程还可以参考上述可穿戴设备确定声音信号的置信度以及用户正在说话的置信度的过程,为了简洁,在此不再赘述。
S306,可穿戴设备向电子设备发送查询结果,电子设备接收可穿戴设备发送的查询结果,该查询结果中包括用户正在说话的信息。
一个实施例中,用户正在说话的信息中包括上述声音信号的置信度以及用户正在说话的置信度。
一个实施例中,如果可穿戴设备判断声音信号的置信度低于第一预设值时,可穿戴设备可以直接在查询结果中指示用户未佩戴可穿戴设备。此时电子设备可以按照现有技术来判断是否进入唤醒状态。或者,如果可穿戴设备判断声音信号的置信度低于第一预设值时,可穿戴设备可以在查询结果中指示用户正在说话的置信度为不可知,电子设备可以按照现有技术来判断是否进入唤醒状态。
或者,如果可穿戴设备确定声音信号的置信度低于第一预设值时,可穿戴设备可以不向电子设备发送该查询结果。如果电子设备在预设时长内未收到该查询结果,那么电子设备可以确定用户未佩戴可穿戴设备。此时电子设备可以按照现有技术来判断是否进入唤醒状态。
本申请实施例中,可穿戴设备在判断声音信号的置信度大于或者等于第一预设值且用户正在说话的置信度大于或者等于第二预设值后,可以在查询结果中指示用户正在说话;或者,可穿戴设备也可以将声音信号的置信度和用户正在说话的置信度携带在查询结果中发送给电子设备,由电子设备判断用户是否正在说话。例如,如果电子设备确定声音信号的置信度大于或者等于第一预设值且用户正在说话的置信度大于或者等于第二预设值,那么电子设备可以确定用户正在说话,从而进入唤醒状态。
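示例性的，上述由电子设备判断是否进入唤醒状态的逻辑可以用如下Python代码示意，其中第一预设值和第二预设值的具体取值均为假设：

```python
FIRST_THRESHOLD = 0.5   # 第一预设值(假设取值)
SECOND_THRESHOLD = 0.5  # 第二预设值(假设取值)

def should_wake(s1, s2):
    # s1: 声音信号的置信度(即用户佩戴着可穿戴设备的置信度)
    # s2: 用户正在说话的置信度
    # 两个置信度均达到各自预设值时，确定用户正在说话，进入唤醒状态
    return s1 >= FIRST_THRESHOLD and s2 >= SECOND_THRESHOLD
```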
本申请实施例中,对查询结果发送的方式并不作具体限定。例如,该查询结果可以是Wi-Fi协议或者蓝牙协议新定义的一条消息,该消息中可以携带字段,该字段用于指示用户正在说话的信息。
或者,该查询结果也可以携带在现有的Wi-Fi协议或者蓝牙协议的消息中。
示例性的，该查询结果可以携带在BLE数据包中。该BLE数据包可以为定向广播包，可穿戴设备可以提前获知电子设备的MAC地址。那么可穿戴设备可以针对电子设备的MAC地址，向电子设备发送BLE数据包。该BLE数据包中可以携带字段，该字段用于指示用户正在说话的信息。
例如,用户正在说话的信息可以用2比特指示,“11”表示用户正在佩戴可穿戴设备且用户正在说话;“10”表示用户正在佩戴可穿戴设备且用户未说话;“00”表示用户未佩戴可穿戴设备。
又例如,用户正在说话的信息也可以用1比特指示,“1”表示用户正在说话;“0”表示用户未说话。
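示例性的，上述2比特指示方式可以用如下Python代码示意编码和解码的过程：

```python
def encode_speaking_info(wearing, speaking):
    # 2比特指示: 0b11表示佩戴且正在说话; 0b10表示佩戴且未说话; 0b00表示未佩戴
    if not wearing:
        return 0b00
    return 0b11 if speaking else 0b10

def decode_speaking_info(bits):
    # 高位比特指示是否佩戴，低位比特指示是否正在说话
    wearing = bool(bits & 0b10)
    speaking = bool(bits & 0b01)
    return wearing, speaking
```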
一个实施例中,用户正在说话的信息中包括可穿戴设备的传感器(例如,声传感器或者PPG传感器)采集的数据。
S307,若电子设备根据用户正在说话的信息确定用户正在说话,则电子设备进入唤醒状态。
应理解,本申请实施例中,电子设备在确定用户正在说话时,可以执行唤醒操作,执行唤醒操作可以使得电子设备从非唤醒状态进入唤醒状态。
一个实施例中,若电子设备接收到的查询结果中包括可穿戴设备发送的声音信号的置信度以及用户正在说话的置信度,那么电子设备可以在确定声音信号的置信度大于或者等于第一预设值且用户正在说话的置信度大于或者等于第二预设值时,进行唤醒操作,从而进入唤醒状态。
一个实施例中,若电子设备接收到的查询结果中指示用户正在说话,那么电子设备可以直接进行唤醒操作,从而进入唤醒状态。
应理解,若电子设备接收到的查询结果中包括可穿戴设备发送的声音信号的置信度以及用户正在说话的置信度,且声音信号的置信度小于第一预设值时,电子设备可以不进入唤醒状态。
或者,若声音信号的置信度大于或者等于第一预设值且用户正在说话的置信度小于第二预设值时,电子设备也可以不进入唤醒状态。
还应理解,若电子设备在预设时长内未接收到查询结果,那么电子设备可以获知用户未佩戴可穿戴设备,那么电子设备可以按照现有技术来判断是否进入唤醒状态。
一个实施例中,用户正在说话的信息中包括可穿戴设备的传感器(例如,声传感器或者PPG传感器)采集的数据。电子设备在接收到可穿戴设备的传感器采集的数据后,可以通过可穿戴设备的传感器采集的数据来确定声音信号的置信度以及用户正在说话的置信度。进而可以根据这两个置信度来确定用户是否正在说话。
本申请实施例中，通过探测人体声音的方式，可穿戴设备可以向电子设备发送用户正在说话的信息，由电子设备通过用户正在说话的信息来判断用户是否正在说话，这样有助于提升语音唤醒准确率。
图6示出了本申请实施例提供的另一语音唤醒的方法600的示意性流程图。如图6所示,该方法600可以由电子设备执行,该方法600包括:
S601,接收环境中的语音指令。
S602,若电子设备确定语音指令满足预设条件时,向可穿戴设备发送查询请求,该查询请求用于请求用户正在说话的信息。
应理解,电子设备确定语音指令满足预设条件的过程可以参考上述方法300中S302-303的过程,为了简洁,在此不再赘述。
还应理解,电子设备向可穿戴设备发送查询请求的过程可以参考上述方法300中的S304,为了简洁,在此不再赘述。
S603,电子设备判断在预设时长内是否接收到查询结果。
一个实施例中,若电子设备在预设时长内没有接收到查询结果,那么电子设备可以确定可穿戴设备距离电子设备较远,或者电子设备也可以确定用户未佩戴可穿戴设备,那么电子设备可以按照现有技术来判断是否进入唤醒状态。例如,电子设备在确定语音指令中包含唤醒词且语音指令的声纹信息与预置的声纹信息匹配时,进入唤醒状态。
S604,若电子设备在预设时长内接收到了查询结果且该查询结果中包含用户正在说话的信息,那么电子设备根据该用户正在说话的信息判断用户是否正在说话。
一个实施例中，用户正在说话的信息包括声音信号的置信度以及用户正在说话的置信度。那么在确定声音信号的置信度大于或者等于第一预设值且用户正在说话的置信度大于或者等于第二预设值时，电子设备可以进入唤醒状态。
一个实施例中,用户正在说话的信息直接指示用户是否正在说话。例如,用户正在说话的信息(例如,BLE数据包中携带的字段取值为“1”)指示用户正在说话时,电子设备可以进入唤醒状态;用户正在说话的信息(例如,BLE数据包中携带的字段取值为“0”)指示用户未说话时,电子设备可以不进入唤醒状态。
一个实施例中,用户正在说话的信息也可以为可穿戴设备的传感器(例如,声传感器或者PPG传感器)采集的数据。那么电子设备可以根据可穿戴设备的传感器采集的数据来确定用户是否正在说话。例如,电子设备可以根据可穿戴设备的传感器采集的数据来确定声音信号的置信度以及用户正在说话的置信度,从而根据这两个置信度来确定用户是否正在说话。
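示例性的，方法600中“在预设时长内等待查询结果、超时则按现有技术判断”的流程可以用如下Python代码示意，其中队列、超时时长和回调函数均为假设的实现方式：

```python
import queue

FIRST_THRESHOLD = 0.5   # 假设的第一预设值
SECOND_THRESHOLD = 0.5  # 假设的第二预设值

def wake_decision(result_queue, timeout_s, fallback_check):
    # result_queue: 接收可穿戴设备查询结果(s1, s2)的队列(示意)
    # timeout_s: 预设时长; fallback_check: 未收到查询结果时按现有技术判断是否唤醒的回调
    try:
        s1, s2 = result_queue.get(timeout=timeout_s)
    except queue.Empty:
        # 预设时长内未收到查询结果: 用户可能未佩戴可穿戴设备，按现有技术判断
        return fallback_check()
    return s1 >= FIRST_THRESHOLD and s2 >= SECOND_THRESHOLD
```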
应理解,电子设备根据用户正在说话的信息判断用户是否正在说话的过程可以参考上述方法300中的S307的过程,为了简洁,在此不再赘述。
S605,若电子设备确定用户正在说话,那么电子设备进入唤醒状态。
一个实施例中,若电子设备根据用户正在说话的信息确定用户未说话,那么电子设备可以不进入唤醒状态。
本申请实施例中，通过探测人体声音的方式，可穿戴设备可以向电子设备发送用户正在说话的信息，由电子设备通过用户正在说话的信息来判断用户是否正在说话，这样有助于提升语音唤醒准确率。
本申请实施例中还提供了一种电子设备,该电子设备可以包括如图1所示的处理器110以及无线通信模块160。其中,无线通信模块160可以用于上述S602中向可穿戴设备发送查询请求、S604中接收可穿戴设备发送的查询结果的步骤;处理器110可以用于执行S603、S604中根据该用户正在说话的信息判断用户是否正在说话以及S605的步骤。
图7示出了本申请实施例提供的可穿戴设备的示意性框图。该可穿戴设备可以包括处理器710和无线通信模块720,该无线通信模块720可以用于执行上述S304中接收电子设备发送的查询请求以及S306中向电子设备发送查询结果的步骤。该处理器710可以用于执行S305中确定用户正在说话的信息的步骤。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以 硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (29)

  1. 一种系统,所述系统包括电子设备和可穿戴设备,所述电子设备通过近距离无线连接与所述可穿戴设备通信,其特征在于,
    所述电子设备,用于采集所处环境中的语音信号;
    所述电子设备,还用于在所述语音信号满足预设条件时,向所述可穿戴设备发送查询请求,所述查询请求用于请求用户正在说话的信息;
    所述可穿戴设备,用于向所述电子设备发送查询结果,所述查询结果包括用户正在说话的信息;
    所述电子设备,还用于在根据所述用户正在说话的信息确定用户正在说话时,进入唤醒状态。
  2. 根据权利要求1所述的系统,其特征在于,所述用户正在说话的信息用于指示第一置信度和第二置信度,所述第一置信度为用户佩戴所述可穿戴设备的置信度,所述第二置信度为用户正在说话的置信度;
    所述电子设备具体用于:在所述第一置信度大于或者等于第一预设值且所述第二置信度大于或者等于第二预设值时,进入唤醒状态。
  3. 根据权利要求2所述的系统,其特征在于,所述可穿戴设备包括声传感器,所述可穿戴设备具体用于:
    根据所述声传感器在预设检测周期内检测到的人体心跳或者呼吸的声音信号的频率,确定所述第一置信度;
    根据所述声音信号在预设频率范围内的强度,确定所述第二置信度。
  4. 根据权利要求2所述的系统，其特征在于，所述可穿戴设备包括光电容积描记PPG传感器，所述可穿戴设备具体用于：
    根据所述PPG传感器在预设检测周期内检测到的PPG信号的频率,确定所述第一置信度;
    根据所述PPG信号在预设频率范围内的强度,确定所述第二置信度。
  5. 根据权利要求2所述的系统,其特征在于,所述可穿戴设备中包括声传感器,所述可穿戴设备具体用于:
    通过所述声传感器采集人体心跳或者呼吸的声音信号;
    将所述声音信号输入第一模型、第二模型和第三模型中,得到所述第一置信度和所述第二置信度,所述第一模型通过采集用户未佩戴所述可穿戴设备时的噪声信号得到,所述第二模型通过采集用户佩戴所述可穿戴设备且未说话时的声音信号得到,所述第三模型通过采集用户佩戴所述可穿戴设备且正在说话时的声音信号得到。
  6. 根据权利要求2所述的系统,其特征在于,所述可穿戴设备中包括PPG传感器,所述可穿戴设备具体用于:
    通过所述PPG传感器采集PPG信号;
    将所述PPG信号输入第一模型、第二模型和第三模型中，得到所述第一置信度和所述第二置信度，所述第一模型通过采集用户未佩戴所述可穿戴设备时的噪声信号得到，所述第二模型通过采集用户佩戴所述可穿戴设备且未说话时的PPG信号得到，所述第三模型通过采集用户佩戴所述可穿戴设备且正在说话时的PPG信号得到。
  7. 根据权利要求1至6中任一项所述的系统,其特征在于,所述电子设备上登录的账号和所述可穿戴设备上登录的账号相关联。
  8. 一种电子设备,所述电子设备通过近距离无线连接与可穿戴设备通信,其特征在于,所述电子设备包括:
    一个或多个处理器;
    一个或多个存储器;
    所述一个或多个存储器存储有一个或多个计算机程序,所述一个或多个计算机程序包括指令,当所述指令被所述一个或多个处理器执行时,使得所述电子设备执行以下步骤:
    采集所处环境中的语音信号;
    在所述语音信号满足预设条件时,向所述可穿戴设备发送查询请求,所述查询请求用于请求用户正在说话的信息;
    接收所述可穿戴设备发送的查询结果,所述查询结果包括用户正在说话的信息;
    在根据所述用户正在说话的信息确定用户正在说话时,进入唤醒状态。
  9. 根据权利要求8所述的电子设备,其特征在于,所述用户正在说话的信息用于指示第一置信度和第二置信度,所述第一置信度为用户佩戴所述可穿戴设备的置信度,所述第二置信度为用户正在说话的置信度,当所述指令被所述一个或多个处理器执行时,使得所述电子设备执行以下步骤:
    在所述第一置信度大于或者等于第一预设值且所述第二置信度大于或者等于第二预设值时,进入唤醒状态。
  10. 根据权利要求8或9所述的电子设备,其特征在于,所述电子设备上登录的账号和所述可穿戴设备上登录的账号相关联。
  11. 一种可穿戴设备,所述可穿戴设备通过近距离无线连接与电子设备通信,其特征在于,所述可穿戴设备包括:
    一个或多个处理器;
    一个或多个存储器;
    所述一个或多个存储器存储有一个或多个计算机程序,所述一个或多个计算机程序包括指令,当所述指令被所述一个或多个处理器执行时,使得所述可穿戴设备执行以下步骤:
    接收所述电子设备发送的查询请求,所述查询请求用于请求用户正在说话的信息;
    向所述电子设备发送查询结果,所述查询结果包括用户正在说话的信息。
  12. 根据权利要求11所述的可穿戴设备,其特征在于,所述用户正在说话的信息用于指示第一置信度和第二置信度,所述第一置信度为用户佩戴所述可穿戴设备的置信度,所述第二置信度为用户正在说话的置信度。
  13. 根据权利要求12所述的可穿戴设备,其特征在于,所述可穿戴设备包括声传感器,当所述指令被所述一个或多个处理器执行时,使得所述可穿戴设备执行以下步骤:
    根据所述声传感器在预设检测周期内检测到的人体心跳或者呼吸的声音信号的频率,确定所述第一置信度;
    根据所述声音信号在预设频率范围内的强度,确定所述第二置信度。
  14. 根据权利要求12所述的可穿戴设备,其特征在于,所述可穿戴设备包括PPG传感器,当所述指令被所述一个或多个处理器执行时,使得所述可穿戴设备执行以下步骤:
    根据所述PPG传感器在预设检测周期内检测到的PPG信号的频率,确定所述第一置信度;
    根据所述PPG信号在预设频率范围内的强度,确定所述第二置信度。
  15. 根据权利要求12所述的可穿戴设备,其特征在于,所述可穿戴设备包括声传感器,当所述指令被所述一个或多个处理器执行时,使得所述可穿戴设备执行以下步骤:
    通过所述声传感器采集人体心跳或者呼吸的声音信号;
    将所述声音信号输入第一模型、第二模型和第三模型中,得到所述第一置信度和所述第二置信度,所述第一模型通过采集用户未佩戴所述可穿戴设备时的噪声信号得到,所述第二模型通过采集用户佩戴所述可穿戴设备且未说话时的声音信号得到,所述第三模型通过采集用户佩戴所述可穿戴设备且正在说话时的声音信号得到。
  16. 根据权利要求12所述的可穿戴设备,其特征在于,所述可穿戴设备包括PPG传感器,当所述指令被所述一个或多个处理器执行时,使得所述可穿戴设备执行以下步骤:
    通过所述PPG传感器采集PPG信号;
    将所述PPG信号输入第一模型、第二模型和第三模型中,得到所述第一置信度和所述第二置信度,所述第一模型通过采集用户未佩戴所述可穿戴设备时的噪声信号得到,所述第二模型通过采集用户佩戴所述可穿戴设备且未说话时的PPG信号得到,所述第三模型通过采集用户佩戴所述可穿戴设备且正在说话时的PPG信号得到。
  17. 根据权利要求11至16中任一项所述的可穿戴设备,其特征在于,所述电子设备上登录的账号和所述可穿戴设备上登录的账号相关联。
  18. 一种语音唤醒的方法,所述方法应用于电子设备,所述电子设备通过近距离无线连接与可穿戴设备通信,其特征在于,所述方法包括:
    采集所处环境中的语音信号;
    在所述语音信号满足预设条件时,向所述可穿戴设备发送查询请求,所述查询请求用于请求用户正在说话的信息;
    接收所述可穿戴设备发送的查询结果,所述查询结果包括用户正在说话的信息;
    在根据所述用户正在说话的信息确定用户正在说话时,进入唤醒状态。
  19. 根据权利要求18所述的方法,其特征在于,所述用户正在说话的信息用于指示第一置信度和第二置信度,所述第一置信度为用户佩戴所述可穿戴设备的置信度,所述第二置信度为用户正在说话的置信度,所述在根据所述用户正在说话的信息确定用户正在说话时,进入唤醒状态,包括:
    在所述第一置信度大于或者等于第一预设值且所述第二置信度大于或者等于第二预设值时,进入唤醒状态。
  20. 根据权利要求18或19所述的方法,其特征在于,所述电子设备上登录的账号和所述可穿戴设备上登录的账号相关联。
  21. 一种语音唤醒的方法,所述方法应用于可穿戴设备,所述可穿戴设备通过近距离无线连接与电子设备通信,其特征在于,所述方法包括:
    接收所述电子设备发送的查询请求,所述查询请求用于请求用户正在说话的信息;
    向所述电子设备发送查询结果,所述查询结果包括用户正在说话的信息。
  22. 根据权利要求21所述的方法,其特征在于,所述用户正在说话的信息用于指示第一置信度和第二置信度,所述第一置信度为用户佩戴所述可穿戴设备的置信度,所述第二置信度为用户正在说话的置信度。
  23. 根据权利要求22所述的方法,其特征在于,所述可穿戴设备包括声传感器,所述向所述电子设备发送查询结果之前,所述方法还包括:
    根据所述声传感器在预设检测周期内检测到的人体心跳或者呼吸的声音信号的频率,确定所述第一置信度;
    根据所述声音信号在预设频率范围内的强度,确定所述第二置信度。
  24. 根据权利要求22所述的方法,其特征在于,所述可穿戴设备包括PPG传感器,所述向所述电子设备发送查询结果之前,所述方法还包括:
    根据所述PPG传感器在预设检测周期内检测到的PPG信号的频率,确定所述第一置信度;
    根据所述PPG信号在预设频率范围内的强度,确定所述第二置信度。
  25. 根据权利要求22所述的方法,其特征在于,所述可穿戴设备包括声传感器,所述向所述电子设备发送查询结果之前,所述方法还包括:
    通过所述声传感器采集人体心跳或者呼吸的声音信号;
    将所述声音信号输入第一模型、第二模型和第三模型中,得到所述第一置信度和所述第二置信度,所述第一模型通过采集用户未佩戴所述可穿戴设备时的噪声信号得到,所述第二模型通过采集用户佩戴所述可穿戴设备且未说话时的声音信号得到,所述第三模型通过采集用户佩戴所述可穿戴设备且正在说话时的声音信号得到。
  26. 根据权利要求22所述的方法,其特征在于,所述可穿戴设备包括PPG传感器,所述向所述电子设备发送查询结果之前,所述方法还包括:
    通过所述PPG传感器采集PPG信号;
    将所述PPG信号输入第一模型、第二模型和第三模型中,得到所述第一置信度和所述第二置信度,所述第一模型通过采集用户未佩戴所述可穿戴设备时的噪声信号得到,所述第二模型通过采集用户佩戴所述可穿戴设备且未说话时的PPG信号得到,所述第三模型通过采集用户佩戴所述可穿戴设备且正在说话时的PPG信号得到。
  27. 根据权利要求21至26中任一项所述的方法,其特征在于,所述电子设备上登录的账号和所述可穿戴设备上登录的账号相关联。
  28. 一种计算机可读存储介质,其特征在于,包括计算机指令,
    当所述计算机指令在电子设备上运行时,使得所述电子设备执行如权利要求18至20中任一项所述的语音唤醒的方法;或者,
    当所述计算机指令在可穿戴设备上运行时,使得所述可穿戴设备执行如权利要求21至27中任一项所述的语音唤醒的方法。
  29. 一种计算机程序产品,其特征在于,
    当所述计算机程序产品在电子设备上运行时,使得所述电子设备执行如权利要求18至20中任一项所述的语音唤醒的方法;或者,
    当所述计算机程序产品在可穿戴设备上运行时，使得所述可穿戴设备执行如权利要求21至27中任一项所述的语音唤醒的方法。
PCT/CN2021/097124 2020-06-16 2021-05-31 一种语音唤醒的方法、电子设备、可穿戴设备和系统 WO2021254131A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/001,961 US20230239800A1 (en) 2020-06-16 2021-05-31 Voice Wake-Up Method, Electronic Device, Wearable Device, and System
EP21824995.1A EP4156177A4 (en) 2020-06-16 2021-05-31 VOICE AWAKENING METHOD, ELECTRONIC DEVICE, WEARABLE DEVICE AND SYSTEM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010550402.4A CN113823288A (zh) 2020-06-16 2020-06-16 一种语音唤醒的方法、电子设备、可穿戴设备和系统
CN202010550402.4 2020-06-16

Publications (1)

Publication Number Publication Date
WO2021254131A1 true WO2021254131A1 (zh) 2021-12-23

Family

ID=78924321

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097124 WO2021254131A1 (zh) 2020-06-16 2021-05-31 一种语音唤醒的方法、电子设备、可穿戴设备和系统

Country Status (4)

Country Link
US (1) US20230239800A1 (zh)
EP (1) EP4156177A4 (zh)
CN (1) CN113823288A (zh)
WO (1) WO2021254131A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115881118A (zh) * 2022-11-04 2023-03-31 荣耀终端有限公司 一种语音交互方法及相关电子设备

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
US20230230612A1 (en) * 2022-01-18 2023-07-20 Google Llc Privacy-preserving social interaction measurement

Citations (7)

Publication number Priority date Publication date Assignee Title
US20160037251A1 (en) * 2012-07-19 2016-02-04 Shavar Daniels Communication system and method
CN106714023A (zh) * 2016-12-27 2017-05-24 广东小天才科技有限公司 一种基于骨传导耳机的语音唤醒方法、系统及骨传导耳机
WO2018075170A1 (en) * 2016-10-20 2018-04-26 Qualcomm Incorporated Systems and methods for in-ear control of remote devices
CN108021349A (zh) * 2016-10-28 2018-05-11 中兴通讯股份有限公司 调整耳机工作状态的方法、装置、耳机及智能终端
CN109920451A (zh) * 2019-03-18 2019-06-21 恒玄科技(上海)有限公司 语音活动检测方法、噪声抑制方法和噪声抑制系统
CN110928583A (zh) * 2019-10-10 2020-03-27 珠海格力电器股份有限公司 一种终端唤醒方法、装置、设备和计算机可读存储介质
CN111169422A (zh) * 2019-10-10 2020-05-19 中国第一汽车股份有限公司 一种车辆控制系统及方法

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US10360910B2 (en) * 2016-08-29 2019-07-23 Garmin Switzerland Gmbh Automatic speech recognition (ASR) utilizing GPS and sensor data
US10488831B2 (en) * 2017-11-21 2019-11-26 Bose Corporation Biopotential wakeup word
US10325596B1 (en) * 2018-05-25 2019-06-18 Bao Tran Voice control of appliances
EP3790006A4 (en) * 2018-06-29 2021-06-09 Huawei Technologies Co., Ltd. VOICE COMMAND PROCESS, PORTABLE DEVICE AND TERMINAL
CN110033870A (zh) * 2019-05-30 2019-07-19 广东工业大学 一种智能医疗系统
CN110265036A (zh) * 2019-06-06 2019-09-20 湖南国声声学科技股份有限公司 语音唤醒方法、系统、电子设备及计算机可读存储介质
CN111210829A (zh) * 2020-02-19 2020-05-29 腾讯科技(深圳)有限公司 语音识别方法、装置、系统、设备和计算机可读存储介质

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN115881118A (zh) * 2022-11-04 2023-03-31 荣耀终端有限公司 一种语音交互方法及相关电子设备
CN115881118B (zh) * 2022-11-04 2023-12-22 荣耀终端有限公司 一种语音交互方法及相关电子设备

Also Published As

Publication number Publication date
EP4156177A4 (en) 2023-12-06
CN113823288A (zh) 2021-12-21
US20230239800A1 (en) 2023-07-27
EP4156177A1 (en) 2023-03-29

Similar Documents

Publication Publication Date Title
WO2021000876A1 (zh) 一种语音控制方法、电子设备及系统
JP7426470B2 (ja) 音声起動方法及び電子デバイス
WO2020151580A1 (zh) 一种屏幕控制和语音控制方法及电子设备
WO2021254131A1 (zh) 一种语音唤醒的方法、电子设备、可穿戴设备和系统
WO2021036568A1 (zh) 辅助健身的方法和电子装置
CN110070863A (zh) 一种语音控制方法及装置
WO2020207376A1 (zh) 一种去噪方法及电子设备
WO2022033556A1 (zh) 电子设备及其语音识别方法和介质
CN113347560B (zh) 蓝牙连接方法、电子设备及存储介质
CN112334977B (zh) 一种语音识别方法、可穿戴设备及系统
EP4199488A1 (en) Voice interaction method and electronic device
CN114242037A (zh) 一种虚拟人物生成方法及其装置
WO2022199405A1 (zh) 一种语音控制方法和装置
CN112651510A (zh) 模型更新方法、工作节点及模型更新系统
WO2022161077A1 (zh) 语音控制方法和电子设备
EP4070719A1 (en) Electronic device, method for controlling same to perform ppg detection, and medium
CN113742460A (zh) 生成虚拟角色的方法及装置
WO2023006033A1 (zh) 语音交互方法、电子设备及介质
WO2022156438A1 (zh) 一种唤醒方法及电子设备
CN113572798B (zh) 设备控制方法、系统、设备和存储介质
CN115731923A (zh) 命令词响应方法、控制设备及装置
CN114360206B (zh) 一种智能报警方法、耳机、终端和系统
WO2022252858A1 (zh) 一种语音控制方法及电子设备
WO2021238338A1 (zh) 语音合成方法及装置
CN115230634B (zh) 提醒佩戴安全带的方法及可穿戴设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21824995; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2021824995; Country of ref document: EP; Effective date: 20221222)