WO2022068544A1 - Voice wake-up method, electronic device and chip system - Google Patents

Voice wake-up method, electronic device and chip system

Info

Publication number
WO2022068544A1
WO2022068544A1 (PCT/CN2021/117227)
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
voice
preset
power
user
Application number
PCT/CN2021/117227
Other languages
English (en)
French (fr)
Inventor
王瑞珉
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022068544A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 - Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 - Power supply means, e.g. regulation thereof
    • G06F1/32 - Means for saving power
    • G06F1/3203 - Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 - Power saving characterised by the action undertaken
    • G06F1/3293 - Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/22 - Interactive procedures; Man-machine interfaces
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/22 - Interactive procedures; Man-machine interfaces
    • G10L17/24 - Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions

Definitions

  • the present application relates to the technical field of terminals, and in particular, to a voice wake-up method, an electronic device and a chip system.
  • Voice wake-up is the entry point of voice assistant applications, and its performance indicators and wake-up delay greatly affect user experience.
  • Voice wake-up means that a wake-up word is preset in an electronic device. When the user utters the wake-up word, the voice assistant application wakes from its sleep state and responds, thereby greatly improving the efficiency of human-computer interaction.
  • Dual systems can be used to improve the battery life of electronic devices.
  • Depending on the application scenario, the electronic device can select the corresponding system to run, thereby reducing power consumption and improving battery life.
  • For example, the smart watch can be driven by the first system and the second system.
  • The smart watch can run the first system, or switch from the first system to the second system, depending on the scenario.
  • When the smart watch is in an off-screen state, the first system is in a powered-on state and the second system is in a powered-off state.
  • The voice wake-up process of the smart watch is as follows: after the user utters the sound information for waking up the voice assistant application in the smart watch, the first system performs voice recognition on the sound information. After the first system successfully recognizes the sound information, it restores power to the second system. The second system then performs voice recognition on the sound information, and after the second system successfully recognizes it, the second system starts the voice assistant application to complete the voice wake-up.
  • the present application provides a voice wake-up method, an electronic device and a chip system, which can reduce the voice wake-up delay.
  • an embodiment of the present application provides an electronic device.
  • The electronic device includes a first system, a second system, and a power management unit.
  • When the electronic device is in an off-screen state, the first system is in a powered-on state and the second system is in a powered-off state. The first system is used to light up the screen of the electronic device; the first system is also used to detect the preset power-on condition and to perform the first voice recognition on the sound information when sound information is received.
  • The first system is further configured to power on the second system through the power management unit when the preset power-on condition is detected. The second system is configured to start the target application after the power-on is completed and the first voice recognition result satisfies the first preset condition.
  • When the first system in the above-mentioned electronic device receives sound information, it can not only perform the first voice recognition on the sound information but also detect the preset power-on condition, and when the preset power-on condition is detected, it can power on the second system. In this case, the first speech recognition process and the power-on process of the second system do not interfere with each other and are independent of each other, so they can be performed simultaneously. Therefore, compared with the related art, the voice wake-up process of this electronic device can shorten or eliminate the delay caused by powering on the second system, so the voice wake-up delay can be greatly reduced. After the user issues a voice command including the wake-up word, the electronic device can start the target application relatively quickly, reducing the user's waiting time and improving the user experience.
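The mutual independence described in this bullet can be sketched with two threads: one stands in for the first voice recognition, the other for powering on the second system. All stage durations below are invented for illustration and are not figures from this application.

```python
import threading
import time

# Hypothetical stage durations in milliseconds (not from the application).
FIRST_RECOGNITION_MS = 300  # lightweight first voice recognition
POWER_ON_MS = 200           # powering on the second system

result = {}

def first_recognition():
    time.sleep(FIRST_RECOGNITION_MS / 1000)  # stand-in for the recognizer
    result["recognized"] = True

def power_on_second_system():
    time.sleep(POWER_ON_MS / 1000)           # stand-in for the power-up
    result["powered_on"] = True

start = time.monotonic()
# The two steps do not depend on each other, so they run concurrently.
threads = [threading.Thread(target=first_recognition),
           threading.Thread(target=power_on_second_system)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed_ms = (time.monotonic() - start) * 1000
# Total latency approaches max(300, 200) instead of 300 + 200.
```

In the sequential related-art flow the power-on time adds directly to the wake-up delay; here it is hidden behind the first recognition.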
  • In one implementation, the screen of the electronic device is turned on as follows.
  • The first system may include a sensor control center processor and an acceleration gyroscope unit.
  • The acceleration gyroscope unit can be used to detect the posture of the electronic device.
  • The acceleration gyroscope unit can send the detected posture of the electronic device to the sensor control center processor.
  • When the posture is a preset posture, the sensor control center processor sends a screen-on signal to the display screen to request that the screen be lit.
  • Alternatively, the acceleration gyroscope unit may send a screen-on signal to the sensor control center processor when the posture of the electronic device is a preset posture.
  • The sensor control center processor then requests the display screen to light up the screen according to the screen-on signal.
  • Or, the acceleration gyroscope unit may send a screen-on signal directly to the display screen when the posture of the electronic device is a preset posture.
  • The display screen lights up the screen in response to the screen-on signal.
  • In another implementation, the screen of the electronic device is turned on as follows.
  • The first system may further include devices such as a proximity light sensor and an ultrasonic device.
  • The first system can detect, through devices such as the proximity light sensor or the ultrasonic device, whether the user's limb is close to the electronic device. If it is detected that the user's limb is close to the electronic device, the proximity light sensor, ultrasonic device or similar device sends a screen-on signal to the sensor control center processor, which requests the display screen to light up the screen in response. Alternatively, these devices send the screen-on signal directly to the display screen, and the display screen lights up in response.
  • In yet another implementation, the screen of the electronic device is turned on as follows.
  • the first system may further include a camera, and the camera is in a working state when the electronic device is in a screen-off state.
  • The first system can detect through the camera whether the user is looking at the screen of the electronic device. If the camera detects that the user is looking at the screen, indicating that the user is likely to use the electronic device, it can send a screen-on signal to the sensor control center processor. In response to the screen-on signal, the sensor control center processor requests the display screen to light up the screen.
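The screen-on triggers described above (preset posture, limb proximity, gaze) are independent: any one of them lights the screen. A minimal sketch of that "any trigger fires" logic, with hypothetical parameter names:

```python
# The application lists several independent screen-on triggers: a preset
# device posture (e.g. wrist raise), a limb detected near the proximity
# light sensor or ultrasonic device, and the camera detecting the user's
# gaze. The parameter names below are hypothetical.
def should_light_screen(preset_posture, limb_near, gaze_on_screen):
    """Return True if any screen-on trigger fired."""
    return preset_posture or limb_near or gaze_on_screen

# Example: wrist raised, no proximity, no gaze -> the screen lights up.
lit = should_light_screen(preset_posture=True, limb_near=False,
                          gaze_on_screen=False)
```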
  • the above-mentioned preset power-on conditions include one or more of the following: the sound information includes a human voice; there is a preset user behavior; and the posture of the electronic device satisfies the preset posture condition.
  • the first system can detect whether the posture of the electronic device satisfies the preset posture condition through the acceleration gyroscope unit.
  • the preset posture condition may be: the posture of the electronic device is the preset posture and the holding time is greater than a time threshold.
  • the first system can detect whether the holding time of the preset posture is greater than the time threshold through the acceleration gyroscope unit.
  • the first system detects whether the posture of the electronic device satisfies the preset posture condition through the acceleration gyroscope unit. If the posture of the electronic device satisfies the preset posture condition, the acceleration gyroscope unit may send a power-on signal to the power management unit. The power management unit powers on the second system in response to the power-on signal.
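The posture check above (preset posture held for longer than a time threshold) might be sketched as follows; the sampled-reading interface is an assumption, and only the hold-time condition comes from the text.

```python
def posture_condition_met(samples, preset_posture, time_threshold_ms):
    """Check whether the device stayed in the preset posture for longer
    than time_threshold_ms. `samples` is a list of (timestamp_ms, posture)
    readings from the acceleration gyroscope unit; this sampling scheme is
    an assumption, the application only states the hold-time check."""
    hold_start = None
    for ts, posture in samples:
        if posture == preset_posture:
            if hold_start is None:
                hold_start = ts
            if ts - hold_start > time_threshold_ms:
                return True  # here the unit would send the power-on signal
        else:
            hold_start = None  # posture broken; restart the hold timer
    return False

# Wrist raised at t=0 and held past a 500 ms threshold.
samples = [(0, "raised"), (200, "raised"), (400, "raised"), (700, "raised")]
held = posture_condition_met(samples, "raised", 500)
```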
  • The above-mentioned time threshold may be the duration for which the electronic device stays in the preset posture during the user behavior of checking the time on the electronic device.
  • the time threshold can be determined according to information obtained in the past.
  • The above-mentioned time threshold may be determined according to the duration between a first time and a second time obtained in the past.
  • The first time is the time when the electronic device lights up the screen.
  • The second time is the time when, after the electronic device lights up the screen, the first system performs voice activity detection on the acquired sound information and detects a human voice.
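A minimal sketch of deriving the time threshold from past first-time/second-time gaps. Averaging is just one plausible choice; the text only says the threshold is determined from information obtained in the past.

```python
def estimate_time_threshold(past_durations_ms):
    """past_durations_ms: historical gaps between the screen lighting up
    (the first time) and VAD detecting a human voice (the second time).
    Using the mean is an assumption made for illustration."""
    return sum(past_durations_ms) / len(past_durations_ms)

# Three past screen-on -> voice-detected gaps, in milliseconds.
threshold_ms = estimate_time_threshold([800, 1000, 1200])
```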
  • the preset user behavior includes one or more of the following: the user's limb approaches the electronic device; the user gazes at the screen of the electronic device.
  • The first system may detect whether the user's limb is close to the electronic device through devices such as the proximity light sensor or the ultrasonic device. The first system can detect through the camera whether the user is looking at the screen of the electronic device.
  • The second system is specifically configured to: perform the second speech recognition on the sound information when the power-on is completed and the first speech recognition result satisfies the first preset condition; and, when the second speech recognition result satisfies the second preset condition, start the target application corresponding to the second speech recognition result.
  • the second system is further configured to: generate prompt information for prompting the user to perform voice wake-up again when the second voice recognition result does not meet the second preset condition.
  • the prompt information can prompt the user that voice wake-up fails, and can prompt the user to send out the voice information containing the preset wake-up word again.
  • the first system performs first voice recognition on the voice information sent out by the user again. If the first voice recognition result satisfies the first preset condition, the second system performs second voice recognition on the voice information sent out by the user again. If the second speech recognition result satisfies the second preset condition, the second system starts the target application corresponding to the second speech recognition result.
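The retry flow in the preceding bullets (the second recognition fails, the user is prompted, and both recognitions rerun on the new utterance) can be sketched as a loop. Both recognizer callables are hypothetical stand-ins, not the application's models.

```python
def voice_wake(utterances, first_recognition, second_recognition):
    """Two-stage wake-up with retry: after a second-stage failure the user
    is prompted and may speak again, rerunning both stages. The callables
    are hypothetical stand-ins for the two recognizers."""
    for audio in utterances:
        if not first_recognition(audio):
            continue            # first stage rejected; keep listening
        if second_recognition(audio):
            return "start_target_app"
        # Second stage failed: generate the retry prompt and wait.
        print("voice wake-up failed, please say the wake-up word again")
    return "no_wake"

# "hello xiaoy" fools the small first stage but not the second; the retry
# with the real wake-up word then succeeds.
result = voice_wake(
    ["hello xiaoy", "hello xiaoyi"],
    first_recognition=lambda a: a.startswith("hello"),
    second_recognition=lambda a: a == "hello xiaoyi",
)
```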
  • an embodiment of the present application provides a voice wake-up method, which is applied to an electronic device.
  • the electronic device includes a first system, a second system, and a power management unit.
  • When the electronic device is in an off-screen state, the first system is in a powered-on state and the second system is in a powered-off state. The method includes: the first system lights up the screen of the electronic device; if the first system receives sound information, it detects the preset power-on condition and performs the first voice recognition on the sound information; if the first system detects the preset power-on condition, it powers on the second system through the power management unit; and if the second system is powered on and the first voice recognition result satisfies the first preset condition, the second system starts the target application.
  • When sound information is received, the first system can not only perform the first voice recognition on the sound information but also detect the preset power-on condition, and when the preset power-on condition is detected, it can power on the second system. Therefore, compared with the related art, the above voice wake-up method can shorten or eliminate the delay caused by powering on the second system, thereby greatly reducing the voice wake-up delay.
  • the electronic device After the user issues a voice command including the wake-up word, the electronic device can start the target application relatively quickly, reducing the waiting time of the user and improving the user experience.
  • the first system may light up the screen of the electronic device when it is detected that the posture of the electronic device is a preset posture.
  • The first system may include a sensor control center processor and an acceleration gyroscope unit.
  • The acceleration gyroscope unit can be used to detect the posture of the electronic device.
  • The acceleration gyroscope unit can send the detected posture of the electronic device to the sensor control center processor.
  • When the posture is a preset posture, the sensor control center processor sends a screen-on signal to the display screen to request that the screen be lit.
  • Alternatively, the acceleration gyroscope unit may send a screen-on signal to the sensor control center processor when the posture of the electronic device is a preset posture.
  • The sensor control center processor then requests the display screen to light up the screen according to the screen-on signal.
  • Or, the acceleration gyroscope unit may send a screen-on signal directly to the display screen when the posture of the electronic device is a preset posture.
  • The display screen lights up the screen in response to the screen-on signal.
  • the first system may light up the screen of the electronic device when it is detected that the user's limb is approaching the electronic device.
  • The first system may further include devices such as a proximity light sensor and an ultrasonic device.
  • The first system can detect, through devices such as the proximity light sensor or the ultrasonic device, whether the user's limb is close to the electronic device. If it is detected that the user's limb is close to the electronic device, the proximity light sensor, ultrasonic device or similar device sends a screen-on signal to the sensor control center processor, which requests the display screen to light up the screen in response. Alternatively, these devices send the screen-on signal directly to the display screen, and the display screen lights up in response.
  • the first system may light up the screen of the electronic device when it is detected that the user is looking at the screen of the electronic device.
  • the first system may further include a camera, and the camera is in a working state when the electronic device is in a screen-off state.
  • The first system can detect through the camera whether the user is looking at the screen of the electronic device. If the camera detects that the user is looking at the screen, indicating that the user is likely to use the electronic device, it can send a screen-on signal to the sensor control center processor. In response to the screen-on signal, the sensor control center processor requests the display screen to light up the screen.
  • the preset power-on condition is a condition in which it is determined that the user is likely to perform voice wake-up.
  • the above preset power-on conditions are one or more of the following: the sound information includes human voice; there is a preset user behavior; the posture of the electronic device satisfies the preset posture condition.
  • the first system can detect whether the posture of the electronic device satisfies the preset posture condition through the acceleration gyroscope unit.
  • The above-mentioned preset posture condition is that the holding time of the electronic device in the preset posture is greater than a time threshold.
  • the first system may detect whether the holding time of the preset posture is greater than the time threshold through the acceleration gyroscope unit.
  • The above-mentioned preset user behavior includes one or more of the following: the user's limb approaches the electronic device; the user gazes at the screen of the electronic device.
  • The first system may detect whether the user's limb is close to the electronic device through devices such as the proximity light sensor or the ultrasonic device. The first system can detect through the camera whether the user is looking at the screen of the electronic device.
  • If the second system is powered on and the first speech recognition result satisfies the first preset condition, the second system starts the target application. Specifically: if the second system is powered on and the first voice recognition result satisfies the first preset condition, the second system performs the second voice recognition on the sound information; if the second voice recognition result satisfies the second preset condition, the second system starts the target application corresponding to the second voice recognition result.
  • the above method further includes: if the second voice recognition result does not meet the second preset condition, the second system generates prompt information for prompting the user to perform voice wake-up again.
  • the prompt information can prompt the user that voice wake-up fails, and can prompt the user to send out the voice information containing the preset wake-up word again.
  • the first system performs first voice recognition on the voice information sent out by the user again. If the first voice recognition result satisfies the first preset condition, the second system performs second voice recognition on the voice information sent out by the user again. If the second speech recognition result satisfies the second preset condition, the second system starts the target application corresponding to the second speech recognition result.
  • an embodiment of the present application provides a voice wake-up device.
  • the voice wake-up device is applied to an electronic device, and the electronic device includes a first system, a second system, and a power management unit.
  • the first system is in a powered-on state
  • the second system is in a powered-off state.
  • the voice wake-up device includes: a screen lighting unit for lighting the screen of the electronic device.
  • the detection and recognition unit is configured to detect the preset power-on condition when the first system receives the sound information, and perform first voice recognition on the sound information.
  • the power-on unit is configured to power on the second system through the power management unit when the first system detects a preset power-on condition.
  • the application startup unit is configured to start the target application when the second system is powered on and the first speech recognition result satisfies the first preset condition.
  • Embodiments of the present application provide an electronic device, including: one or more processors, a memory, and a display screen. The memory and the display screen are coupled to the one or more processors. The memory is used for storing computer program code, and the computer program code comprises computer instructions; when executed by the one or more processors, the computer instructions cause the electronic device to perform the method according to any one of the first aspects.
  • An embodiment of the present application provides a chip system, where the chip system includes a processor, the processor is coupled to a memory, and the processor executes a computer program stored in the memory to implement the method according to any one of the first aspects.
  • the chip system may be a single chip, or a chip module composed of multiple chips.
  • An embodiment of the present application provides a chip system, where the chip system includes a memory and a processor, and the processor executes a computer program stored in the memory to implement the method according to any one of the first aspects.
  • the chip system may be a single chip, or a chip module composed of multiple chips.
  • An embodiment of the present application provides a computer program product that, when run on an electronic device, enables the electronic device to execute the method described in any one of the foregoing first aspects.
  • An embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method according to any one of the first aspects.
  • The voice wake-up device according to the third aspect, the electronic device according to the fourth aspect, the chip systems according to the fifth and sixth aspects, the computer program product according to the seventh aspect, and the computer-readable storage medium according to the eighth aspect are all used to execute the method provided in the second aspect. Therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
  • FIG. 1 is a schematic diagram of a voice wake-up process provided by the related art
  • FIG. 2 is a schematic diagram of the time delay of the voice wake-up process provided by the embodiment of FIG. 1;
  • FIG. 3 is a schematic diagram of a voice wake-up process provided by the related art
  • FIG. 4 is a schematic diagram of the time delay of the voice wake-up process provided by the embodiment of FIG. 3;
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a scenario in which the smart watch lifts the wrist to brighten the screen according to an embodiment of the present application
  • FIG. 7 is a schematic diagram of a time delay of a voice wake-up process provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • FIG. 14 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • FIG. 15 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a voice wake-up device provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The term “if” may, depending on the context, be interpreted as “when”, “once”, “in response to determining” or “in response to detecting”.
  • The phrases “if it is determined” or “if the [described condition or event] is detected” may, depending on the context, be interpreted to mean “once it is determined”, “in response to the determination”, “once the [described condition or event] is detected” or “in response to detection of the [described condition or event]”.
  • References in this specification to “one embodiment” or “some embodiments” and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • Appearances of the phrases “in one embodiment”, “in some embodiments”, “in other embodiments” and the like in various places in this specification do not necessarily all refer to the same embodiment, but mean “one or more but not all embodiments” unless specifically emphasized otherwise.
  • The terms “include”, “comprise”, “have” and their variants mean “including but not limited to” unless specifically emphasized otherwise.
  • The steps involved in the voice wake-up method provided in the embodiments of the present application are only examples. Not all steps must be performed, and not all information or message content must be selected; steps and content can be added or removed as needed.
  • Voice wake-up means that a wake-up word is preset in an electronic device. When the user utters the wake-up word, the voice assistant application wakes from its sleep state and responds, thereby greatly improving the efficiency of human-computer interaction.
  • Voice wake-up is the entry point of voice assistant applications, and its performance indicators and wake-up delay greatly affect user experience.
  • The voice wake-up delay generally regarded as optimal for user experience is about 650 ms (milliseconds).
  • Wake-on-voice may be performed on an electronic device with dual systems (which may be referred to as a first system and a second system).
  • For light-weight application scenarios (such as an ordinary watch-face display or local music playback), the electronic device can run the first system.
  • For heavy-duty application scenarios (such as 3D watch faces or third-party applications such as WeChat), the electronic device can switch from the first system to the second system.
  • the following describes the voice wake-up process in the related art by taking the electronic device as a smart watch, the first system as a Sensor Hub system, and the second system as an Application Processor (AP) system as an example.
  • the power-off state may be the power-off of the CPU core of the AP system, that is, the Suspend to RAM (STR) state.
  • the power-off state may be a complete power-off of the AP system, that is, a Fast Suspend Resume (FSR) state.
  • FIG. 1 is a schematic diagram of a voice wake-up process provided by the related art.
  • The sound information may include human voice information (for example, “Hello Xiaoyi”) uttered by the user to wake up the voice assistant application in the smart watch.
  • The Sensor Hub system performs Voice Activity Detection (VAD) on the sound information. If it detects that the sound information contains a human voice, it performs the first voice recognition on the sound information. After the first voice recognition result satisfies the preset condition (that is, the first voice recognition is successful), the Sensor Hub system sends the power management unit an instruction to power on the second system.
  • the power management unit responds to the command and restores power to the AP system.
  • powering on the AP system from the STR state is powering on the CPU core of the AP system
  • powering on the AP system from the FSR state is powering on the System on Chip (SOC) of the AP system.
  • Both the first speech recognition and the second speech recognition are used to recognize whether the sound information contains the preset wake-up word. If the wake-up word is included, the corresponding voice recognition succeeds, and the process proceeds to the next voice recognition or starts the voice assistant application.
  • both the first speech recognition and the second speech recognition may be implemented by a speech recognition model and/or a related speech recognition algorithm.
  • The speech recognition model or algorithm used for the second speech recognition is generally larger and more accurate than that used for the first speech recognition. For example, when the sound information contains a pseudo wake-up word similar to the preset wake-up word, the first voice recognition is likely to conclude that the sound information contains the preset wake-up word, while the second voice recognition can recognize that it does not. In this way, compared with the first speech recognition, the second speech recognition can more accurately identify whether the sound information contains the wake-up word, thereby reducing the false wake-up rate.
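A sketch of the two-stage cascade described above: a small first-stage model gates a larger, more accurate second-stage model, so a pseudo wake-up word that fools the small model is still rejected before the assistant starts. Scores and thresholds are invented for illustration; the application does not name concrete models.

```python
def cascaded_wake_check(small_score, large_score,
                        small_threshold=0.5, large_threshold=0.8):
    """Cascade: the lightweight first-stage model gates the larger
    second-stage model. Scores/thresholds are made-up illustrations."""
    if small_score < small_threshold:
        return "no_wake"                 # first recognition failed
    if large_score < large_threshold:
        return "false_wake_rejected"     # e.g. a pseudo wake-up word
    return "wake"

# A pseudo wake-up word: passes the small model, rejected by the large one.
outcome = cascaded_wake_check(small_score=0.7, large_score=0.4)
```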
  • FIG. 2 is a schematic diagram of time delay analysis of a voice wake-up process provided by the embodiment of FIG. 1 .
  • the time when the user sends out the sound information for voice wake-up is T0 to T2
  • the Sensor Hub system continues to acquire the sound information within the time T0 to T2.
  • the Sensor Hub system starts the above-mentioned VAD processing from time T0, and detects that the voice information contains human voice at time T1, that is, the Sensor Hub system completes the VAD processing within the time from T0 to T1. After the Sensor Hub system completes the VAD processing, the Sensor Hub system starts to perform the first speech recognition on the acquired sound information.
  • the Sensor Hub system completes the first speech recognition of the sound information within the time of T2 to T3.
  • the power management unit powers on the AP system within the time period of T3 to T4.
  • the power-on process of the CPU core takes about 200ms
  • the power-on process of the SOC chip takes about 1000ms.
  • After the AP system is powered on, it performs second voice recognition on the sound information within the time period T4 to T5. After the second voice recognition succeeds, the AP system starts the voice assistant application within T5 to T6 to complete the voice wake-up.
  • the related art provides another voice wake-up method.
  • the voice wake-up process in this embodiment removes the second voice recognition.
  • After the AP system is powered on at time T4, it directly starts the voice assistant application (the corresponding time is T4 to T7).
  • the voice wake-up process in this embodiment can save the time T4-T5 required for the second voice recognition, and reduce the voice wake-up delay.
  • omitting the second voice recognition will lead to an increase in the false wake-up rate of voice wake-up, which will reduce the wake-up performance of the electronic device and affect the user experience.
  • the electronic device includes a first system, a second system and a power management unit.
  • the first system When the electronic device is in an off-screen state, the first system is in a powered-on state, and the second system is in a powered-off state.
  • After lighting the screen of the electronic device, if the first system receives sound information, it detects a preset power-on condition and performs first voice recognition on the sound information. If the first system detects the preset power-on condition, the second system is powered on. If the second system is powered on and the first voice recognition result satisfies the preset condition, the second system starts the target application.
  • the first system can detect the preset power-on condition and perform the first voice recognition on the sound information at the same time, and when the preset power-on condition is detected, the second system can be powered on.
  • the first speech recognition process and the second system power-on process do not interfere with each other and are independent of each other, so they can be performed simultaneously. Therefore, compared with the related art, the electronic device and the voice wake-up method can shorten or eliminate the voice wake-up delay corresponding to the above T3-T4.
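  • the latency benefit can be illustrated with a back-of-the-envelope model; the millisecond figures below are illustrative, except that 200 ms echoes the CPU-core power-on time mentioned earlier:

```python
def sequential_latency(t_vad, t_rec1, t_power, t_rec2, t_launch):
    """Related art (FIG. 2): the second system is powered on only after
    the first speech recognition succeeds, so all stages add up."""
    return t_vad + t_rec1 + t_power + t_rec2 + t_launch

def parallel_latency(t_vad, t_rec1, t_power, t_rec2, t_launch):
    """This application: power-on runs concurrently with the first speech
    recognition, so only the longer of the two contributes to the delay."""
    return t_vad + max(t_rec1, t_power) + t_rec2 + t_launch
```

With a 200 ms CPU-core power-on fully hidden behind a 300 ms first recognition, the entire T3 to T4 delay of FIG. 2 disappears; a 400 ms power-on would still be partially hidden.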
  • the electronic devices involved in the embodiments of the present application may include, but are not limited to, smart watches, mobile phones, personal digital assistants (PDAs), tablet computers, portable devices (for example, portable computers), personal computers (PCs), and other devices that may have dual systems, which is not limited in the embodiments of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
  • the electronic device 100 may include a first system 110 , a second system 120 and a power management unit 140 .
  • the first system 110 is in a powered-on state
  • the second system 120 is in a powered-off state.
  • the first system 110 is used to light up the screen of the electronic device 100 .
  • the first system 110 detects that the posture of the electronic device 100 is a preset posture, the screen of the electronic device 100 is turned on.
  • the electronic device 100 further includes a display screen 130
  • the first system 110 may include a sensor control center processor 111 and an acceleration gyroscope unit 112 .
  • the acceleration gyroscope unit 112 may be used to detect the posture of the electronic device 100 .
  • the sensor control center processor 111 and the acceleration gyroscope unit 112 may be integrated on one chip or device.
  • the acceleration gyro unit 112 may send the detected posture of the electronic device 100 to the sensor control center processor 111 .
  • the sensor control center processor 111 sends a screen-on signal to the display screen 130 to request to light up the screen.
  • the acceleration gyroscope unit 112 may send a screen bright signal to the sensor control center processor 111 when the posture of the electronic device 100 is a preset posture.
  • the sensor control center processor 111 requests the display screen 130 to light up the screen according to the screen bright signal.
  • the acceleration gyroscope unit 112 may send a screen-on signal to the display screen 130 when the posture of the electronic device 100 is a preset posture.
  • the display screen 130 lights up the screen in response to the screen-on signal.
  • the acceleration gyroscope unit 112 may be a chip integrating the functions of the acceleration Acc and the gyroscope Gyro.
  • the chip may include an accelerometer sensor, a gyroscope sensor, and a processor.
  • the acceleration sensor and the gyro sensor are used to monitor the posture of the electronic device 100 and send the monitoring data to the processor.
  • the processor is configured to send the above-mentioned bright screen signal to the sensor control center processor 111 or the display screen 130 when the posture of the electronic device 100 is a preset posture.
  • the smart watch is in the screen-off state.
  • the Sensor Hub processor lights up the screen in response to the above-mentioned bright-screen signal to display the current time.
  • the electronic device 100 is a mobile phone or a tablet computer
  • it can be set that when the mobile phone or tablet computer is placed horizontally, its screen is in an off-screen state, and when the acceleration gyroscope unit 112 detects that the mobile phone or tablet computer is tilted or placed vertically, the phone or tablet lights up the screen.
  • the above-mentioned preset posture can be set according to experience.
  • the above-mentioned preset posture may be determined according to the posture of the electronic device 100 when the user uses the electronic device 100 .
  • the above-mentioned preset posture may be that the electronic device 100 is in a horizontal posture, or that the electronic device 100 is in an inclined posture, which is not specifically limited in this embodiment of the present application.
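  • as a concrete illustration of a preset-posture check, the sketch below treats "screen roughly face-up" as the preset posture; the pitch/roll representation and the ±25° window are assumed, experience-tuned values, not figures from this application:

```python
def is_preset_posture(pitch_deg: float, roll_deg: float) -> bool:
    """Hypothetical wrist-raise check: the screen is roughly face-up
    when both tilt angles are small."""
    return abs(pitch_deg) < 25 and abs(roll_deg) < 25

def on_sample(pitch_deg: float, roll_deg: float, screen_off: bool):
    # Mirrors the flow above: the acceleration gyroscope unit sends a
    # bright-screen signal when the preset posture is detected while
    # the screen is off; otherwise it does nothing.
    if screen_off and is_preset_posture(pitch_deg, roll_deg):
        return "BRIGHT_SCREEN_SIGNAL"
    return None
```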
  • the screen of the electronic device 100 is turned on.
  • the first system 110 may further include devices such as a proximity light sensor, an ultrasonic device, and the like.
  • the first system 110 can detect whether the user's limb is close to the electronic device 100 by using a device such as a proximity optical sensor, an ultrasonic device, or the like. If it is detected that the user's limb is close to the electronic device 100 , a device such as a proximity light sensor, an ultrasonic device, etc., sends a bright-screen signal to the sensor control center processor 111 . In response to the screen-on signal, the sensor control center processor 111 requests the display screen 130 to light up the screen.
  • a screen-on signal may also be sent to the display screen 130 .
  • the display screen 130 lights up the screen in response to the screen-on signal.
  • the first system 110 detects that the user is looking at the screen of the electronic device 100 , the screen of the electronic device 100 is turned on.
  • the first system 110 may further include a camera, and the camera is in a working state when the electronic device 100 is in a screen-off state.
  • the first system 110 may detect whether the user is looking at the screen of the electronic device 100 through the camera. If the camera detects that the user is looking at the screen of the electronic device 100 , it can send a screen-on signal to the sensor control center processor 111 .
  • the camera detects that the user is looking at the screen of the electronic device 100 , indicating that the user is likely to use the electronic device 100 .
  • the sensor control center processor 111 requests the display screen 130 to light up the screen.
  • the camera detects that the user is looking at the screen of the electronic device 100 , and can also send a screen-on signal to the display screen 130 .
  • the display screen 130 lights up the screen in response to the screen-on signal.
  • After the first system 110 lights the screen of the electronic device 100, the first system 110 is further configured to, upon receiving sound information, detect the preset power-on condition and perform first speech recognition on the sound information.
  • the preset power-on condition is a condition that determines that the user is likely to perform voice wake-up.
  • the above-mentioned preset power-on conditions may include one or more of the following: the sound information includes human voices; there is a preset user behavior; and the posture of the electronic device 100 satisfies the preset posture conditions.
  • the preset posture condition may be set based on experience, indicating that the user may need to use the electronic device 100 .
  • the preset posture condition may be: the posture of the electronic device 100 is the preset posture for a holding time greater than a time threshold.
  • the first system 110 may use the acceleration gyro unit 112 to detect whether the holding time of the preset posture is greater than a time threshold.
  • the above-mentioned time threshold may be set to the duration for which the electronic device 100 stays in the preset posture during the user behavior of checking the time on the electronic device 100.
  • the time threshold can be determined according to information obtained in the past.
  • the above-mentioned first time threshold may be determined according to the time length between the first time and the second time obtained in the past.
  • the first time is the time when the electronic device 100 lights the screen
  • the second time is the time when the first system performs voice activity detection on the acquired sound information to detect a human voice after the electronic device 100 lights the screen.
  • the user behavior can be divided into two types: checking the time and voice wake-up.
  • the user behavior of checking the time is a high-frequency scenario.
  • the user's wrist will remain in a fixed posture for a period of time (for example, T_hold1) while checking the time.
  • the user behavior of voice wake-up is a low-frequency scenario.
  • After raising the wrist to turn on the screen, the user's wrist will remain in a fixed posture for a period of time (for example, T_hold2, where T_hold2 > T_hold1), and then the user speaks the wake-up word to start the voice assistant application.
  • For example, if the electronic device 100 lights up the screen at time T00 and detects a human voice at time T0, the duration from T00 to T0 may be the above-mentioned T_hold2.
  • the electronic device 100 may record a plurality of T_hold2 values, and determine the above-mentioned first time threshold according to the plurality of T_hold2 values.
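  • one way to derive the first time threshold from the recorded T_hold2 samples is sketched below; taking a fraction of the historical mean is an assumed heuristic, as the embodiment does not fix a specific formula:

```python
import statistics

def first_time_threshold(hold2_samples, margin=0.8):
    """Derive the first time threshold from past screen-on-to-human-voice
    durations (the recorded T_hold2 values). `margin` is a hypothetical
    safety factor keeping the threshold below typical T_hold2 values."""
    return margin * statistics.mean(hold2_samples)
```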
  • the preset user behavior may be set according to experience, indicating that the user may want to use the electronic device 100 .
  • the preset user behavior may include one or more of the following: the user's limb approaches the electronic device 100 ; the user stares at the screen of the electronic device 100 .
  • the first system 110 can detect whether the user's limb is close to the electronic device 100 through a device such as a proximity optical sensor, an ultrasonic device, or the like.
  • the first system 110 may detect whether the user is looking at the screen of the electronic device 100 through the camera.
  • the sensor control center processor 111 may include one or more processing units.
  • the sensor control center processor 111 may include a control unit, a digital signal processor (DSP), a voice activity detection unit, and the like.
  • different processing units may be independent devices, or may be integrated in one or more processors.
  • the first system 110 may collect sound information through the digital microphone of the electronic device 100, and detect whether the sound information contains human voice through the voice activity detection unit. If the sound information includes a human voice, it means that the user may want to wake up the electronic device 100 through a voice command.
  • the sensor control center processor 111 may perform first voice recognition on the sound information through the DSP unit. If the sound information does not contain human voice, the voice activity detection unit continues to detect the sound information collected by the digital microphone.
  • the voice activity detection unit detects whether the voice information contains human voice
  • the voice information may not contain a complete wake-up word.
  • the first system 110 may perform the first speech recognition on the currently collected sound information, but the first speech recognition result usually cannot satisfy the preset condition, that is, the first speech recognition is unsuccessful.
  • the digital microphone collects the sound information including the complete wake-up word, and the wake-up word is consistent with the preset wake-up word
  • the first voice recognition result of the sound information by the first system 110 may satisfy the preset condition, that is, the first speech recognition succeeds.
  • the first system 110 may perform the first speech recognition on the sound information when the sound information includes a human voice.
  • the first system 110 may detect information such as frequency, energy, phase, and amplitude of the sound information through a related algorithm or model, and determine whether the sound information contains human voices.
  • if no human voice is detected, the first system 110 may not perform the first speech recognition, but continue to acquire sound information, and perform the first speech recognition only once the sound information contains a human voice. In this way, the power consumption of the first system 110 can be reduced, and the battery life of the electronic device can be increased.
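  • the power-saving gate described above can be sketched as follows; the energy-only VAD and its 0.02 threshold are simplifications of the frequency, energy, phase and amplitude features the text mentions:

```python
def contains_human_voice(frame_energies, energy_threshold=0.02):
    """Toy energy-only VAD. Real implementations also use frequency,
    phase and amplitude features; the 0.02 threshold is an arbitrary
    illustrative value."""
    return any(e > energy_threshold for e in frame_energies)

def process_audio(frame_energies, run_first_recognition):
    # Skip the more expensive first speech recognition entirely when no
    # human voice is detected, reducing first-system power consumption.
    if contains_human_voice(frame_energies):
        return run_first_recognition()
    return None
```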
  • the first system 110 is further configured to power on the second system 120 through the power management unit 140 when a preset power-on condition is detected.
  • the first system 110 may send a power-on signal to the power management unit 140 when a preset power-on condition is detected.
  • the power management unit 140 powers on the second system 120 in response to the power-on signal.
  • the second system 120 is configured to start the target application when the power-on is completed and the first speech recognition result satisfies the first preset condition.
  • the target application may be an application corresponding to the first speech recognition result.
  • the first preset condition may be: the first voice recognition result is that the voice information includes a preset wake-up word.
  • the second system 120 can start the target application corresponding to the first speech recognition result, that is, the second system 120 starts the target application corresponding to the preset wake-up word. For example, if the first voice recognition result is that the voice information contains the preset wake-up word "Hello Xiaoyi", the second system 120 starts the voice assistant application corresponding to the preset wake-up word "Hello Xiaoyi".
  • when the second system 120 needs to perform second speech recognition on the sound information, the target application may be an application corresponding to the second speech recognition result.
  • the above-mentioned second system 120 is specifically configured to: perform second voice recognition on the sound information when the power-on is completed and the first voice recognition result satisfies the first preset condition; and if the second voice recognition result satisfies the second preset condition, start the target application corresponding to the second speech recognition result.
  • the second preset condition may be: the second voice recognition result is that the voice information includes a preset wake-up word.
  • the second system 120 may start the target application corresponding to the second speech recognition result, that is, the second system 120 starts the target application corresponding to the preset wake-up word. For example, if the second voice recognition result is that the voice information contains the preset wake-up word "Hello Xiaoyi", the second system 120 starts the voice assistant application corresponding to the preset wake-up word "Hello Xiaoyi".
  • the first preset condition and the second preset condition may be the same or different, which is not limited in this embodiment of the present application.
  • both the first preset condition and the second preset condition may be that the sound information contains a preset wake-up word, but the criteria used when judging the similarity between the wake-up word extracted from the sound information and the preset wake-up word may differ.
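  • one hedged sketch of how the two conditions could share the "contains the preset wake-up word" test while differing in similarity criterion; the `difflib` scoring and the 0.7/0.95 cut-offs are illustrative assumptions, not values from this application:

```python
from difflib import SequenceMatcher

# Assumed cut-offs: the first condition is more permissive than the second.
FIRST_THRESHOLD = 0.7
SECOND_THRESHOLD = 0.95

def similarity(extracted: str, preset: str) -> float:
    """Textual similarity between the extracted and preset wake-up words."""
    return SequenceMatcher(None, extracted, preset).ratio()

def first_condition(extracted: str, preset: str = "hello xiaoyi") -> bool:
    return similarity(extracted, preset) >= FIRST_THRESHOLD

def second_condition(extracted: str, preset: str = "hello xiaoyi") -> bool:
    return similarity(extracted, preset) >= SECOND_THRESHOLD
```

With these cut-offs, a pseudo wake-up word such as "hello xiaoli" satisfies the first condition but fails the stricter second condition.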
  • the second system 120 may include an application processor 121 .
  • the application processor 121 may include one or more processing units.
  • the application processor 121 may include a central processing unit (Central Processing Unit, CPU), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), memory, Video codec, DSP, baseband processor, and/or neural-network processing unit (NPU), etc.
  • different processing units may be independent devices, or may be integrated in one or more processors.
  • when the second system 120 is powered on and the first speech recognition result satisfies the first preset condition, the application processor 121 performs second speech recognition on the sound information through the DSP unit. If the second speech recognition result satisfies the second preset condition, the application processor 121 starts the target application corresponding to the second speech recognition result.
  • the above-mentioned second system 120 is further configured to: if the second voice recognition result does not meet the second preset condition, generate prompt information for prompting the user to perform voice wake-up again.
  • for ease of description, a first speech recognition result meeting the first preset condition is called first speech recognition success, and a first speech recognition result not meeting the first preset condition is called first speech recognition failure; likewise, a second speech recognition result satisfying the second preset condition is called second speech recognition success, and a second speech recognition result not satisfying the second preset condition is called second speech recognition failure.
  • the prompt information can prompt the user that voice wake-up fails, and can prompt the user to send out the voice information containing the preset wake-up word again.
  • the first system 110 performs a first voice recognition on the voice information sent out by the user again. If the first voice recognition is successful, the second system 120 performs second voice recognition on the voice information re-sent by the user. If the second voice recognition is successful, the second system 120 starts the target application corresponding to the second voice recognition result.
  • FIG. 7 is a schematic diagram of time delay analysis of the voice wake-up process according to the embodiment of FIG. 5 .
  • the time when the user sends out the sound information including the complete preset wake-up word is T0-T2.
  • the first system 110 performs VAD processing during the period of T0 to T1 to detect whether the voice information includes human voice. If the sound information includes a human voice, the first system 110 starts to perform the first speech recognition on the sound information at time T1. Since the complete wake-up word "Hello Xiaoyi" is not included in the sound information at this time, the first speech recognition fails.
  • until the sound information including the complete wake-up word "Hello Xiaoyi" is acquired at time T2, the first speech recognition keeps failing; the first system 110 then completes the first voice recognition of the sound information within the time T2 to T3.
  • the power management unit 140 powers on the second system 120 within a time period of T3' to T4'.
  • the second system 120 completes the second voice recognition of the voice information within the time period of T4' to T5'.
  • the second system 120 starts the target application (eg, a voice assistant application) within a time period of T5' to T6'.
  • T3' can be any time between T1 and T3.
  • the position of T3' between T2 and T3 in FIG. 7 is only illustrative, and in other embodiments, T3' may also be positioned between T1 and T2.
  • the voice wake-up process shown in FIG. 7 can reduce the delay generated when the second system 120 is powered on, that is, the delay corresponding to the duration T3 to T4 shown in FIG. 2. If the time T4' falls at or before the time T3, the voice wake-up process shown in FIG. 7 completely eliminates the delay caused by powering on the second system 120; that is, the wake-up delay is reduced by the duration corresponding to T3 to T4 shown in FIG. 2. If the time T4' falls after the time T3, the voice wake-up process shown in FIG. 7 eliminates part of the delay caused by powering on the second system 120; that is, the wake-up delay is reduced by the duration corresponding to T3' to T3 shown in FIG. 7.
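  • the saving described in this delay analysis can be expressed numerically; the function below assumes the power-on duration is the same in both figures (T4 minus T3), which follows from the analysis above:

```python
def wakeup_delay_saving(t3, t4, t3_prime):
    """Delay saved versus FIG. 2, where power-on occupies T3..T4.
    In FIG. 7, power-on starts at T3' and finishes at T4' = T3' + (T4 - T3)."""
    t_power = t4 - t3
    t4_prime = t3_prime + t_power
    if t4_prime <= t3:
        return t_power        # power-on fully hidden: whole T3-T4 delay removed
    return t3 - t3_prime      # partially hidden: the saving is T3' to T3
```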
  • in embodiments where the second voice recognition is omitted, the parts T4' to T5' in FIG. 7 may be removed.
  • when the first system 110 in the above-mentioned electronic device 100 receives the sound information, it can not only perform the first voice recognition on the sound information, but also detect the preset power-on condition, and when the preset power-on condition is detected, the second system 120 can be powered on.
  • the voice wake-up process of the electronic device 100 can shorten or eliminate the time delay caused by powering on the second system 120, that is, shorten or eliminate the duration corresponding to T3 to T4 shown in FIG. 2, thereby greatly reducing the voice wake-up delay.
  • the electronic device 100 can start the target application relatively quickly, which reduces the waiting time of the user and improves the user experience.
  • the second system 120 may further include an internal memory 122 and an external memory interface 123 .
  • the application processor 121, the internal memory 122, and the external memory interface 123 may be integrated on one chip or device.
  • the application processor 121, the internal memory 122, and the external memory interface 123 can be integrated on the SOC chip.
  • the application processor 121 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a secure digital input and output (SDIO) interface, a serial peripheral interface (SPI), a mobile industry processor interface (MIPI), a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the application processor 121 can be coupled with the power management unit 140 through the SDIO interface, and can be coupled with the display screen 130 through the MIPI interface and the I2C interface.
  • the sensor control center processor 111 may also include one or more interfaces.
  • the interface may include an I2C interface, an I2S interface, an SPI interface, an SDIO interface, and/or a MIPI interface, among others.
  • the sensor control center processor 111 can be coupled with the application processor 121 through the I2S interface and the SDIO interface, with the display screen 130 through the MIPI interface and the I2C interface, and with the acceleration gyroscope unit 112 through the SPI interface.
  • the power management unit 140 is connected to the battery of the electronic device 100 , as well as the sensor control center processor 111 and the application processor 121 .
  • the power management unit 140 receives the input of the battery, and supplies power to the sensor control center processor 111 , the application processor 121 , the internal memory 122 , and the display screen 130 .
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the electronic device 100 may further include a universal serial bus (USB) interface, a battery, a mobile communication unit, an audio unit, a speaker, a receiver, a microphone, a key, a camera, a subscriber identification module (SIM) card interface, a pressure sensor, an air pressure sensor, a magnetic sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc.
  • FIG. 8 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application.
  • the above voice wake-up method is applied to an electronic device, and the electronic device includes a first system, a second system and a power management unit.
  • the first system is in a powered-on state
  • the second system is in a powered-off state.
  • the above voice wake-up method includes steps 101 to 104 .
  • Step 101: the first system lights the screen of the electronic device.
  • the first system can monitor the posture of the electronic device, and can perform voice activity detection and voice recognition on the received sound information.
  • the first system can acquire the posture of the electronic device through the acceleration gyroscope unit, can perform VAD detection on the received sound information, and can perform the first voice recognition on the sound information through the DSP unit.
  • the first system may light up the screen of the electronic device when it is detected that the posture of the electronic device is a preset posture.
  • the first system may light up the screen of the electronic device when it is detected that the user is looking at the screen of the electronic device.
  • the first system may light up the screen of the electronic device when it is detected that the user's limb is approaching the electronic device.
  • Step 102: if the first system receives sound information, it detects a preset power-on condition and performs first voice recognition on the sound information.
  • the preset power-on condition is a condition that determines that the user is likely to perform voice wake-up.
  • the above-mentioned preset power-on conditions may include one or more of the following: the sound information includes human voices; there is a preset user behavior; and the posture of the electronic device satisfies the preset posture conditions.
  • the preset posture condition may be set based on experience, indicating that the user may need to use the electronic device.
  • the preset posture condition may be: the posture of the electronic device is the preset posture for which the holding time is greater than the time threshold.
  • the preset user behavior may be set according to experience, indicating that the user may need to use the electronic device.
  • the preset user behavior may include one or more of the following: the user's limbs approach the electronic device; the user gazes at the screen of the electronic device.
  • step 102 may be: if the first system receives the sound information, detect a preset power-on condition, and perform first speech recognition on the sound information when the sound information includes a human voice.
  • the first system may detect information such as frequency, energy, phase, and amplitude of the sound information through a related algorithm or model, and determine whether the sound information contains human voices.
  • if no human voice is detected, the first system may not perform the first speech recognition, but continue to acquire sound information, and perform the first speech recognition only once the sound information contains a human voice. In this way, the power consumption of the first system can be reduced, and the battery life of the electronic device can be increased.
  • Step 103: if the first system detects the preset power-on condition, the second system is powered on through the power management unit.
  • Step 104: if the second system is powered on and the first voice recognition result satisfies the first preset condition, the second system starts the target application.
  • the target application may be an application corresponding to the first speech recognition result.
  • the first preset condition may be: the first voice recognition result is that the voice information includes a preset wake-up word.
  • the second system can start the target application corresponding to the first speech recognition result, that is, the second system starts the target application corresponding to the preset wake-up word. For example, if the first voice recognition result is that the preset wake-up word "Hello Xiaoyi" is included in the sound information, the second system starts the voice assistant application corresponding to the preset wake-up word "Hello Xiaoyi".
  • when the second system needs to perform second speech recognition on the sound information, the target application may be an application corresponding to the second speech recognition result.
  • if the second system is powered on and the first voice recognition result satisfies the first preset condition, the second system performs second voice recognition on the sound information; if the second voice recognition result satisfies the second preset condition, the second system starts the target application corresponding to the second speech recognition result.
  • the second preset condition may be: the second voice recognition result is that the voice information includes a preset wake-up word.
  • the second system may start the target application corresponding to the second speech recognition result, that is, the second system starts the target application corresponding to the preset wake-up word. For example, if the second voice recognition result is that the preset wake-up word "Hello Xiaoyi" is included in the sound information, the second system starts the voice assistant application corresponding to the preset wake-up word "Hello Xiaoyi".
  • first preset condition and the second preset condition may be the same or different, which are not limited in this embodiment of the present application.
  • both the first preset condition and the second preset condition may be that the sound information contains a preset wake-up word, while the similarity thresholds used to judge how close the wake-up word extracted from the sound information is to the preset wake-up word may differ.
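As an illustration of how the two conditions can share the same wake-up word yet differ in the similarity judgment, the following sketch uses Python's `difflib` ratio as a stand-in similarity measure; the threshold values and function names are assumptions, not taken from the embodiment:

```python
from difflib import SequenceMatcher

# Hypothetical similarity thresholds: the first (low-power) check is
# deliberately looser than the second, which reduces missed wake-ups at
# stage one and false wake-ups at stage two. The values are assumptions.
FIRST_THRESHOLD = 0.70   # first preset condition (coarse)
SECOND_THRESHOLD = 0.90  # second preset condition (fine)

def similarity(extracted, preset):
    """Stand-in similarity between an extracted and a preset wake word."""
    return SequenceMatcher(None, extracted.lower(), preset.lower()).ratio()

def meets_condition(extracted, preset, threshold):
    """True if the extracted wake word is close enough to the preset one."""
    return similarity(extracted, preset) >= threshold
```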
  • if the voice recognition result does not satisfy the corresponding preset condition, prompt information for prompting the user to perform voice wake-up again is generated.
  • the first speech recognition result meeting the first preset condition is called first speech recognition success, and the first speech recognition result not meeting the first preset condition is called first speech recognition failure;
  • similarly, if the second speech recognition result satisfies the second preset condition, it is called second speech recognition success, and if the second speech recognition result does not satisfy the second preset condition, it is called second speech recognition failure.
  • the prompt information can prompt the user that voice wake-up fails, and can make the user send out voice information containing preset wake-up words again.
  • the first system performs first speech recognition on the sound information. If the first voice recognition is successful, the second system performs second voice recognition on the voice information. If the second voice recognition is successful, the second system starts the target application.
  • when the sound information is received, the first system can not only perform the first voice recognition on the sound information but also detect the preset power-on condition, and can power on the second system when the preset power-on condition is detected. Therefore, compared with the related art, the above voice wake-up method can shorten or eliminate the time delay caused by powering on the second system, that is, shorten or eliminate the duration corresponding to T3 to T4 shown in FIG. 2, thereby greatly reducing the voice wake-up delay. After the user issues a voice command including the wake-up word, the electronic device can start the target application relatively quickly, reducing the user's waiting time and improving the user experience.
  • the preset power-on condition is: the sound information includes a human voice, and the holding time during which the posture of the electronic device remains the preset posture is greater than a time threshold. An embodiment corresponding to this preset power-on condition will be described below with reference to FIG. 9.
  • FIG. 9 is a schematic flowchart of a voice wake-up method provided by an embodiment of the present application. Referring to Figure 9, the method may include:
  • step 201 please refer to step 101, which will not be repeated here.
  • Step 202 If the first system receives the sound information, it performs voice activity detection on the sound information.
  • the first system may perform voice activity detection on the sound information through the voice activity detection unit.
  • the electronic device collects sound information through a digital microphone and sends it to the voice activity detection unit, and the voice activity detection unit performs voice activity detection on the sound information.
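A minimal sketch of what the voice activity detection unit could do, assuming a simple short-term energy criterion (real VAD units use spectral features or trained models; the frame size, threshold, and function name below are illustrative assumptions):

```python
# Minimal energy-based voice activity detection sketch.
FRAME_SIZE = 160          # e.g. 10 ms of samples at a 16 kHz sample rate
ENERGY_THRESHOLD = 0.01   # assumed normalized-energy threshold

def contains_human_voice(samples):
    """Return True if any frame's mean energy exceeds the threshold."""
    for start in range(0, len(samples), FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        if not frame:
            break
        energy = sum(s * s for s in frame) / len(frame)
        if energy > ENERGY_THRESHOLD:
            return True
    return False
```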
  • Step 203 If the sound information includes a human voice, the first system starts the voice processing unit, and detects the holding time during which the posture of the electronic device remains the preset posture after the screen is turned on.
  • the voice activity detection unit may send an activation signal to the voice processing unit of the first system, where the activation signal is used to activate the voice processing unit to prepare for the first voice recognition on the voice information.
  • the human voice may be a voice command for starting the voice assistant application, or may be a voice communication between users.
  • if the sound information contains a human voice, it means that the user may want to wake up the electronic device through a voice command. Further, if, after the screen is turned on, the holding time during which the posture of the electronic device remains the preset posture is greater than the time threshold, it indicates that the user is likely to need to use an application of the electronic device. Therefore, comprehensively predicting the power-on time of the second system according to whether the sound information contains a human voice and whether the holding time of the preset posture is greater than the time threshold can increase the accuracy of the prediction.
  • Step 204 The first system performs first speech recognition on the sound information through the speech processing unit.
  • the speech processing unit may perform first speech recognition on the sound information through a related speech recognition model or speech recognition algorithm, and the specific manner and process of the first speech recognition are not limited in this embodiment of the present application.
  • if the first speech recognition is successful, step 205 may be performed after step 204; if the first speech recognition is unsuccessful, step 204 may be performed again.
  • Step 205 If the holding time is greater than the time threshold, the first system powers up the second system through the power management unit.
  • the CPU core of the second system, or the SOC chip of the second system, may be powered on within the period T1 to T3. For example, during T0 to T1 it is detected that the sound information contains a human voice, and at time T1 it is detected whether the holding time of the above-mentioned preset posture is greater than the time threshold. If it is detected that the holding time of the preset posture is greater than the time threshold, the first system can power on the CPU core of the second system, or the SOC chip of the second system, through the power management unit.
  • otherwise, the second system may not be powered on first; after the first voice recognition succeeds, the first system powers on the CPU core of the second system, or the SOC chip of the second system, through the power management unit.
  • the holding time of the above-mentioned preset posture may be detected by an acceleration gyroscope unit.
  • the holding time may specifically be: the time from when the screen is turned on to when it is detected that the sound information contains a human voice. If the holding time is greater than the time threshold, the acceleration gyroscope unit may send a power-on signal to the power management unit. In response to the power-on signal, the power management unit powers on the CPU core of the second system or the SOC chip of the second system. If the holding time is less than or equal to the time threshold, the acceleration gyroscope unit does not send the power-on signal to the power management unit at this point; instead, it sends the power-on signal after the first voice recognition succeeds.
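The decision just described — send the power-on signal early when the hold time exceeds the threshold, otherwise defer it until the first voice recognition succeeds — can be sketched as follows (the function name and the 1-second threshold are assumptions, not values from the embodiment):

```python
TIME_THRESHOLD_S = 1.0  # assumed hold-time threshold, in seconds

def should_power_on_early(screen_on_time, voice_detected_time,
                          threshold=TIME_THRESHOLD_S):
    """Step 205 sketch: the hold time is measured from screen-on to the
    moment a human voice is detected. If it exceeds the threshold, the
    power-on signal is sent immediately; otherwise the second system is
    powered on only after the first voice recognition succeeds."""
    hold_time = voice_detected_time - screen_on_time
    return hold_time > threshold
```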
  • Step 206 The second system performs second speech recognition on the sound information.
  • the second system may perform second speech recognition on the sound information by using a related speech recognition model or speech recognition algorithm, and the specific manner and process of the second speech recognition are not limited in this embodiment of the present application.
  • the speech recognition model or speech recognition algorithm used for the second speech recognition is generally larger and more accurate.
  • for example, the first voice recognition may mistakenly recognize that the sound information contains the preset wake-up word, while the second voice recognition can recognize that the sound information does not actually contain the preset wake-up word. Therefore, the second voice recognition can more accurately identify whether the sound information contains the wake-up word, thereby reducing the false wake-up rate.
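The resulting two-stage gating can be sketched as a small pipeline, with placeholder callables standing in for the coarse and fine recognition models that the text leaves unspecified (all names here are illustrative):

```python
def two_stage_wakeup(sound, first_recognizer, second_recognizer,
                     start_app, prompt_retry):
    """Sketch of the two-stage wake-up decision: the coarse first
    recognition gates the expensive second recognition, which in turn
    gates starting the target application."""
    if not first_recognizer(sound):    # first preset condition not met
        return "no_wakeup"
    if not second_recognizer(sound):   # second preset condition not met
        prompt_retry()                 # ask the user to wake up again
        return "retry_prompted"
    start_app()
    return "app_started"
```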
  • if the second voice recognition is successful, step 207 is executed. If the second voice recognition is unsuccessful, prompt information for prompting the user to perform voice wake-up again is generated. After the user sends out sound information containing the preset wake-up word again, the first system performs the first voice recognition on the sound information sent out again by the user. If the first voice recognition is successful, the second system performs second voice recognition on the sound information re-sent by the user. If the second voice recognition is successful, the second system starts the voice assistant application.
  • Step 207 After the second voice recognition is successful, the second system starts the voice assistant application.
  • for example, the duration of the user speaking the voice command containing the wake-up word is about 800 ms: the duration corresponding to T0 to T1 is about 7 ms, the duration corresponding to T1 to T2 is about 793 ms, and the duration corresponding to T2 to T3 is about 100 ms. The time required to power on the CPU core is about 200 ms, and the time required to power on the SOC chip is about 1000 ms.
  • the voice wake-up delay in the embodiment of FIG. 2 is the duration corresponding to T2 to T6, and the wake-up delay in this embodiment is the duration corresponding to T2 to T3 plus the duration corresponding to T4 to T6. Therefore, compared with the voice wake-up delay in the embodiment of FIG. 2, the voice wake-up delay in this embodiment is reduced by the duration corresponding to T3 to T4 (ie, 200 ms).
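The delay arithmetic above can be checked directly from the quoted figures. The helper names and the T4 to T6 value passed in are illustrative; the millisecond constants come from this embodiment:

```python
# Timeline figures quoted in the text (all in milliseconds).
T0_T1 = 7      # detecting that the sound contains a human voice
T1_T2 = 793    # remainder of the spoken wake-up command
T2_T3 = 100    # first voice recognition
CPU_CORE_POWER_ON = 200  # duration of T3 to T4 when powering on the CPU core

def wakeup_delay_related_art(t4_t6):
    """Related art (FIG. 2): power-on starts only after recognition,
    so the delay spans T2 to T6."""
    return T2_T3 + CPU_CORE_POWER_ON + t4_t6

def wakeup_delay_this_embodiment(t4_t6):
    """This embodiment: the CPU core is powered on in parallel during
    T1 to T3, so the T3 to T4 power-on wait disappears."""
    return T2_T3 + t4_t6
```

The difference between the two delays is exactly the CPU-core power-on time, matching the 200 ms saving stated in the text.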
  • the preset power-on conditions are: the sound information contains human voice, and the user's body is close to the electronic device. An embodiment corresponding to the preset power-on condition will be described below with reference to FIG. 10 .
  • FIG. 10 is a schematic flowchart of another voice wake-up method provided by an embodiment of the present application. Referring to Figure 10, the method may include:
  • steps 301 to 302 refer to steps 201 to 202, which will not be repeated here.
  • Step 303 If the sound information includes a human voice, the first system activates the voice signal processing unit of the first system, and detects whether the user's limb is close to the electronic device.
  • based on the relevant content in step 203, it can be known that if the sound information includes a human voice, it means that the user may want to wake up the electronic device through a voice command. Further, if the user's limb is close to the electronic device, it indicates that the user is likely to need to use a certain application of the electronic device. Therefore, comprehensively predicting the power-on time of the second system according to whether the sound information includes a human voice and whether the user's limb is close to the electronic device can increase the accuracy of the prediction.
  • whether there is a user's limb approaching the electronic device can be detected by a proximity optical sensor or an ultrasonic device.
  • a smart watch can detect whether a user's arm not wearing the smart watch is approaching the electronic device through a proximity light sensor or ultrasonic device.
  • Step 304 refer to step 204, which is not repeated here.
  • Step 305 If it is detected that the user's limb is close to the electronic device, the first system powers on the second system through the power management unit.
  • the CPU core of the second system or the SOC chip of the second system may be powered on within the time range of T1 to T3. For example, at time T0 to T1, it is detected that the sound information contains human voice, and at time T1, it is detected whether the user's limb is close to the electronic device. If it is detected that the user's limb is close to the electronic device, the first system can power on the CPU core of the second system or power on the SOC chip of the second system through the power management unit.
  • otherwise, the second system may not be powered on first; after the above-mentioned first voice recognition succeeds, the first system then powers on the CPU core of the second system, or the SOC chip of the second system, through the power management unit.
  • if it is detected that the user's limb is close to the electronic device, a power-on signal may be sent to the power management unit. In response to the power-on signal, the power management unit powers on the CPU core of the second system or the SOC chip of the second system.
  • Steps 306 to 307 refer to steps 206 to 207, which are not repeated here.
  • this embodiment does not detect whether the holding time of the above-mentioned preset posture is greater than the time threshold, but detects whether the user's limb is close to the electronic device.
  • the time spent in the above two detections is almost the same, so the voice wake-up delay in this embodiment is basically the same as the voice wake-up delay in Embodiment 1, and details are not repeated here.
  • the preset power-on conditions are: the sound information includes human voice, and the user is looking at the screen of the electronic device. An embodiment corresponding to the preset power-on condition will be described below with reference to FIG. 11 .
  • FIG. 11 is a schematic flowchart of another voice wake-up method provided by an embodiment of the present application. Referring to Figure 11, the method includes:
  • steps 401 to 402 refer to steps 201 to 202, which will not be repeated here.
  • Step 403 If the sound information includes a human voice, the first system activates the voice signal processing unit of the first system, and detects whether the user is looking at the screen of the electronic device.
  • based on the relevant content in step 203, it can be known that if the sound information includes a human voice, it means that the user may want to wake up the electronic device through a voice command. Further, if the user looks at the screen of the electronic device after the screen is turned on, it indicates that the user is likely to need to use an application of the electronic device. Therefore, comprehensively predicting the power-on time of the second system according to whether the sound information contains a human voice and whether the user is looking at the screen of the electronic device can increase the accuracy of the prediction.
  • whether the user is looking at the screen of the electronic device can be detected through a camera.
  • a smartwatch can use a camera to detect whether a user is looking at the smartwatch's screen.
  • Step 404 refer to step 204, which will not be repeated here.
  • Step 405 If it is detected that the user is looking at the screen of the electronic device, the first system powers on the second system through the power management unit.
  • the CPU core of the second system or the SOC chip of the second system may be powered on within the time range of T1 to T3. For example, at time T0-T1, it is detected that the sound information contains human voice, and at time T1, it is detected whether the user is looking at the screen of the electronic device. If it is detected that the user is looking at the screen of the electronic device, the first system can power on the CPU core of the second system or the SOC chip of the second system through the power management unit.
  • otherwise, the second system may not be powered on first; after the above-mentioned first voice recognition succeeds, the first system powers on the CPU core of the second system, or the SOC chip of the second system, through the power management unit.
  • the camera can send a power-on signal to the power management unit.
  • the power management unit powers on the CPU core of the second system or powers on the SOC chip of the second system.
  • Steps 406 to 407 refer to steps 206 to 207, which will not be repeated here.
  • the voice wake-up delay in this embodiment is basically the same as the voice wake-up delay in Embodiment 1, and details are not repeated here.
  • the preset power-on condition is: the sound information contains human voice. An embodiment corresponding to the preset power-on condition will be described below with reference to FIG. 12 .
  • FIG. 12 is a schematic flowchart of another voice wake-up method provided by an embodiment of the present application. Referring to Figure 12, the method includes:
  • steps 501 to 502 refer to steps 201 to 202, which will not be repeated here.
  • Step 503 If the sound information includes a human voice, the first system starts the voice processing unit of the first system, and powers on the second system through the power management unit.
  • if the voice activity detection unit detects a human voice in the sound information, it means that the user may want to wake up the electronic device through a voice command.
  • the voice activity detection unit may send an activation signal to the voice processing unit of the first system.
  • the voice processing unit prepares to perform first voice recognition on the voice information in response to the activation signal.
  • therefore, if the sound information includes a human voice, the first system can power on the second system through the power management unit.
  • step 504 refer to step 204, which will not be repeated here.
  • steps 505-506, refer to steps 206-207, and details are not repeated here.
  • the voice wake-up delay of this embodiment is basically the same as that of Embodiment 1, which is not repeated here.
  • the preset power-on condition is: the holding time of the posture of the electronic device being the preset posture is greater than the time threshold. An embodiment corresponding to the preset power-on condition will be described below with reference to FIG. 13 .
  • FIG. 13 is a schematic flowchart of another voice wake-up method provided by an embodiment of the present application. Referring to Figure 13, the method includes:
  • step 601 please refer to step 201, which will not be repeated here.
  • Step 602 The first system starts the speech processing unit, and detects the holding time during which the posture of the electronic device remains the preset posture.
  • after the screen of the electronic device is lit, the voice processing unit of the first system can be activated, and the holding time during which the posture of the electronic device remains the preset posture can be detected.
  • step 603 please refer to step 204, which will not be repeated here.
  • if sound information is received, the speech processing unit performs the first speech recognition on the sound information; if no sound information is received, the voice processing unit may be in a waiting state.
  • Step 604 If the holding time is greater than the time threshold, the first system powers up the second system through the power management unit.
  • if the holding time is greater than the time threshold, it means that the user may want to activate the voice assistant of the electronic device through a voice command.
  • the first system can power on the CPU core of the second system or power on the SOC chip of the second system in advance through the power management unit.
  • if the holding time is less than or equal to the time threshold, then after the first speech recognition succeeds, the first system powers on the CPU core of the second system or the SOC chip of the second system through the power management unit.
  • steps 605-606 please refer to steps 206-207, which will not be repeated here.
  • the first system may not detect whether the voice information contains human voice.
  • the time it takes for the first system to detect whether the sound information contains human voice is about 7ms, which is negligible. Therefore, the voice wake-up delay in this embodiment is basically the same as the voice wake-up delay in Embodiment 1, and details are not repeated here.
  • the preset power-on condition is: the user's body is close to the electronic device. An embodiment corresponding to the preset power-on condition will be described below with reference to FIG. 14 .
  • FIG. 14 is a schematic flowchart of another voice wake-up method provided by an embodiment of the present application. Referring to Figure 14, the method includes:
  • step 701 please refer to step 201, which will not be repeated here.
  • Step 702 The first system starts the voice processing unit, and detects whether the user's limb is close to the electronic device.
  • whether the user's limb is close to the electronic device can be detected by a proximity optical sensor or an ultrasonic device.
  • a smart watch can detect whether a user's arm not wearing the smart watch is approaching the electronic device through a proximity light sensor or ultrasonic device.
  • step 703 please refer to step 204, which will not be repeated here.
  • Step 704 If it is detected that the user's limb is close to the electronic device, the first system powers on the second system through the power management unit.
  • if the user's limb is close to the electronic device, it indicates that the user may want to activate the voice assistant of the electronic device through a voice command.
  • the first system can power on the CPU core of the second system or power on the SOC chip of the second system in advance through the power management unit.
  • steps 705 to 706 please refer to steps 206 to 207, which will not be repeated here.
  • the first system detects whether the user's limb is close to the electronic device, instead of detecting whether the holding time is greater than the time threshold. The time spent for the two detections is almost the same and can be ignored.
  • the first system may not detect whether the sound information includes a human voice, and the time it takes for the first system to detect whether the sound information includes a human voice is about 7 ms. Therefore, the voice wake-up delay in this embodiment is basically the same as the voice wake-up delay in Embodiment 1, and details are not repeated here.
  • the preset power-on condition is: the user looks at the screen of the electronic device.
  • the embodiment corresponding to the preset power-on condition will be described below with reference to FIG. 15 .
  • FIG. 15 is a schematic flowchart of another voice wake-up method provided by an embodiment of the present application. Referring to Figure 15, the method includes:
  • step 801 please refer to step 201, which will not be repeated here.
  • Step 802 The first system starts the voice processing unit, and detects whether the user is looking at the screen of the electronic device.
  • whether the user is looking at the screen of the electronic device can be detected through the camera.
  • a smartwatch can use a camera to detect whether the user is looking at the smartwatch's screen.
  • step 803 please refer to step 204, which will not be repeated here.
  • Step 804 If it is detected that the user is looking at the screen of the electronic device, the first system powers up the second system through the power management unit.
  • if the user is looking at the screen of the electronic device, it indicates that the user may want to activate the voice assistant of the electronic device through a voice command.
  • the first system can power on the CPU core of the second system or power on the SOC chip of the second system in advance through the power management unit.
  • steps 805-806 please refer to steps 206-207, which will not be repeated here.
  • the first system detects whether the user is looking at the screen of the electronic device, instead of detecting whether the holding time is greater than the time threshold. The time spent for the two detections is almost the same and can be ignored.
  • the first system may not detect whether the sound information includes a human voice, and the time it takes for the first system to detect whether the sound information includes a human voice is about 7 ms. Therefore, the voice wake-up delay in this embodiment is basically the same as the voice wake-up delay in Embodiment 1, and details are not repeated here.
  • the second voice recognition needs to be performed on the sound information, but the embodiments of the present application are not limited to this.
  • the voice assistant application corresponding to the first voice recognition result can be directly started.
  • the preset power-on condition of this embodiment is the same as the preset power-on condition of the first embodiment.
  • the process of this embodiment differs from the process of Embodiment 1 in that: after the second system is powered on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information, but directly starts the voice assistant application corresponding to the first voice recognition result.
  • the preset power-on conditions of this embodiment are the same as the preset power-on conditions of the second embodiment.
  • the process of this embodiment differs from the process of Embodiment 2 in that: after the second system is powered on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information, but directly starts the voice assistant application corresponding to the first voice recognition result.
  • the preset power-on condition of this embodiment is the same as the preset power-on condition of the third embodiment.
  • the process of this embodiment differs from the process of Embodiment 3 in that: after the second system is powered on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information, but directly starts the voice assistant application corresponding to the first voice recognition result.
  • the preset power-on condition of this embodiment is the same as the preset power-on condition of the fourth embodiment.
  • the process of this embodiment differs from the process of Embodiment 4 in that: after the second system is powered on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information, but directly starts the voice assistant application corresponding to the first voice recognition result.
  • the preset power-on condition of this embodiment is the same as the preset power-on condition of the fifth embodiment.
  • the process of this embodiment differs from the process of Embodiment 5 in that: after the second system is powered on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information, but directly starts the voice assistant application corresponding to the first voice recognition result.
  • the preset power-on condition of this embodiment is the same as the preset power-on condition of the sixth embodiment.
  • the process of this embodiment differs from the process of Embodiment 6 in that: after the second system is powered on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information, but directly starts the voice assistant application corresponding to the first voice recognition result.
  • the preset power-on condition of this embodiment is the same as the preset power-on condition of the seventh embodiment.
  • the process of this embodiment differs from the process of Embodiment 7 in that: after the second system is powered on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information, but directly starts the voice assistant application corresponding to the first voice recognition result.
  • FIG. 16 shows a structural block diagram of the voice wake-up apparatus 900 provided by the embodiment of the present application.
  • the above voice wake-up apparatus 900 is applied to the electronic device shown in FIG. 5 , and the electronic device includes a first system, a second system and a power management unit.
  • the first system is in a powered-on state
  • the second system is in a powered-off state.
  • the voice wake-up device 900 in this embodiment of the present application may include a screen lighting unit 901 , a detection and identification unit 902 , a power-on unit 903 , and an application startup unit 904 .
  • the screen lighting unit 901 is used for lighting the screen of the electronic device.
  • the detection and recognition unit 902 is configured to detect a preset power-on condition when the sound information is received by the first system, and perform first voice recognition on the sound information.
  • the power-on unit 903 is configured to power on the second system through the power management unit when the first system detects a preset power-on condition.
  • the application starting unit 904 is configured to start the target application when the second system is powered on and the first voice recognition result satisfies the first preset condition.
  • the first system can simultaneously detect the preset power-on condition and perform the first voice recognition on the sound information, and can power on the second system when the preset power-on condition is detected. Therefore, compared with the related art, the above-mentioned voice wake-up device can shorten or eliminate the time corresponding to the above-mentioned T3 to T4, and greatly shorten the voice wake-up time delay. After the user issues a voice command including the wake-up word, the above-mentioned voice wake-up device can start the target application relatively quickly, thereby reducing the user's waiting time and improving the user's experience.
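A sketch of how the four units of apparatus 900 could cooperate. The class name, method names, and the placeholder checks inside unit 902 are assumptions for illustration; only the unit roles (901 to 904) come from the description above:

```python
class VoiceWakeupApparatus:
    """Sketch of apparatus 900: screen lighting unit (901), detection and
    identification unit (902), power-on unit (903), application startup
    unit (904)."""

    def __init__(self):
        self.screen_on = False
        self.second_system_powered = False
        self.started_app = None

    def light_screen(self):                    # unit 901
        self.screen_on = True

    def detect_and_recognize(self, sound):     # unit 902
        # Placeholder checks standing in for the preset power-on
        # condition and the first voice recognition.
        power_on_condition = bool(sound)
        recognized = "Hello Xiaoyi" in sound
        return power_on_condition, recognized

    def power_on_second_system(self):          # unit 903
        self.second_system_powered = True

    def start_target_app(self, app):           # unit 904
        if self.second_system_powered:
            self.started_app = app

    def wake(self, sound, app="voice_assistant"):
        self.light_screen()
        condition, recognized = self.detect_and_recognize(sound)
        if condition:
            self.power_on_second_system()
        if recognized:
            self.start_target_app(app)
        return self.started_app
```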
  • the electronic device 1000 may include: at least one processor 1010, a memory 1020, and a computer program stored in the memory 1020 and running on the at least one processor 1010. When the processor 1010 executes the computer program, the steps in any of the foregoing method embodiments are implemented, for example, steps 101 to 104 in the embodiment shown in FIG. 8.
  • alternatively, when the processor 1010 executes the computer program, the functions of the units in the foregoing device embodiments, for example the functions of the units 901 to 904 shown in FIG. 16, are implemented.
  • the computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 1020 and executed by the processor 1010 to complete the present application.
  • One or more modules/units may be a series of computer program segments capable of accomplishing specific functions, and the program segments are used to describe the execution process of the computer program in the electronic device 1000 .
  • FIG. 17 is only an example of an electronic device, and does not constitute a limitation on the electronic device. It may include more or fewer components than shown in the figure, or combine some components, or have different components, such as input and output devices, network access devices, buses, etc.
  • the processor 1010 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 1020 may be an internal storage unit of the electronic device, or may be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card)
  • the memory 1020 is used to store computer programs and other programs and data required by the electronic device.
  • the memory 1020 may also be used to temporarily store data that has been output or will be output.
  • the bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like
  • the bus may be divided into an address bus, a data bus, a control bus, and so on
  • for ease of representation, the buses in the drawings of the present application are not limited to only one bus or one type of bus
  • Embodiments of the present application also provide a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer or a processor, the computer or the processor is caused to execute one or more steps in any of the above methods.
  • Embodiments of the present application also provide a computer program product containing instructions, and when the computer program product runs on a computer or a processor, the computer or the processor is caused to execute one or more steps in any of the above methods.
  • An embodiment of the present application further provides a chip system, which may include a memory and a processor, and the processor executes a computer program stored in the memory to implement one or more steps in any of the above methods.
  • the chip system may be a single chip, or a chip module composed of multiple chips.
  • An embodiment of the present application further provides a chip system, where the chip system may include a processor, the processor is coupled with a memory, and the processor executes a computer program stored in the memory, so as to implement one or more steps in any of the above methods.
  • the chip system may be a single chip, or a chip module composed of multiple chips.
  • a computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • Computer instructions may be stored in or transmitted over a computer-readable storage medium.
  • Computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) manner.
  • a computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that includes an integration of one or more available media.
  • The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks (SSDs)), and the like.
  • all or part of the processes of the foregoing method embodiments may be completed by a computer program instructing relevant hardware
  • the program may be stored in a computer-readable storage medium, and when the program is executed, the processes of the foregoing method embodiments may be included
  • the aforementioned storage medium includes: a ROM, a random access memory (RAM), a magnetic disk, an optical disc, or another medium capable of storing program code


Abstract

An electronic device includes a first system, a second system, and a power management unit. When the electronic device is in a screen-off state, the first system is in a powered-on state and the second system is in a powered-off state. The first system is configured to light up the screen of the electronic device. The first system is further configured to, upon receiving sound information, detect a preset power-on condition and perform first voice recognition on the sound information. The first system is further configured to, when the preset power-on condition is detected, power on the second system through the power management unit. The second system is configured to start a target application when power-on is complete and the first voice recognition result satisfies a first preset condition. This can shorten or eliminate the delay caused by powering on the second system, thereby reducing the voice wake-up latency, quickly starting the target application, and reducing the user's waiting time.

Description

Voice Wake-up Method, Electronic Device, and Chip System
This application claims priority to Chinese Patent Application No. 202011056925.X, filed with the China National Intellectual Property Administration on September 29, 2020 and entitled "Voice Wake-up Method, Electronic Device, and Chip System", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminal technologies, and in particular, to a voice wake-up method, an electronic device, and a chip system.
Background
As a typical application of artificial intelligence, voice assistant applications are increasingly widely used in electronic devices. Voice wake-up is the entry point of a voice assistant application, and both its performance indicators and its wake-up latency greatly affect user experience. Voice wake-up means that a wake word is preset in the electronic device, and when the user issues a voice command containing the wake word, the voice assistant application is woken from a dormant state and responds, which greatly improves human-computer interaction efficiency.
At present, for electronic devices with strict power-consumption requirements, a dual-system design can be used to extend the battery life of the electronic device. For different scenarios, the electronic device can select the corresponding system to work, so as to reduce power consumption and extend battery life. Taking a smart watch as an example, the smart watch may be driven by a first system and a second system. For lightweight application scenarios (for example, ordinary watch-face display or local music playback), the smart watch can run the first system. For heavy-load application scenarios (for example, a 3D watch face or third-party applications such as WeChat), the smart watch can switch from the first system to the second system.
Usually, when the smart watch is in the screen-off state, the first system is in a powered-on state and the second system is in a powered-off state. The voice wake-up process of the smart watch is as follows: after the user issues sound information intended to wake the voice assistant application in the smart watch, the first system performs voice recognition on the sound information. After the first system successfully recognizes the sound information, it restores power to the second system. The second system then performs voice recognition on the sound information, and after the second system successfully recognizes the sound information, it launches the voice assistant application, completing the voice wake-up.
However, in the above voice wake-up process, restoring power to the second system takes a relatively long time, generally about 200 ms to 1000 ms. Thus, although the dual-system design can reduce the power consumption of the electronic device, launching the voice assistant application involves restoring power to the powered-off second system, and the time required to restore power to the second system increases the latency of the electronic device's voice wake-up function, resulting in an excessive wake-up latency.
Summary
This application provides a voice wake-up method, an electronic device, and a chip system, which can reduce voice wake-up latency.
To achieve the above objective, this application adopts the following technical solutions:
According to a first aspect, an embodiment of this application provides an electronic device. The electronic device includes a first system, a second system, and a power management unit. When the electronic device is in a screen-off state, the first system is in a powered-on state and the second system is in a powered-off state. The first system is configured to light up the screen of the electronic device. The first system is further configured to, upon receiving sound information, detect a preset power-on condition and perform first voice recognition on the sound information. The first system is further configured to, when the preset power-on condition is detected, power on the second system through the power management unit. The second system is configured to start a target application when power-on is complete and the first voice recognition result satisfies a first preset condition.
When receiving sound information, the first system in the above electronic device can not only perform first voice recognition on the sound information, but also detect the preset power-on condition, and can power on the second system as soon as the preset power-on condition is detected. In this case, the first voice recognition process and the power-on process of the second system do not interfere with each other and are independent of each other, so they can proceed simultaneously. Therefore, compared with the related art, the voice wake-up process of the electronic device can shorten or eliminate the delay caused by powering on the second system, thereby greatly reducing the voice wake-up latency. After the user issues a voice command containing the wake word, the electronic device can start the target application relatively quickly, reducing the user's waiting time and improving user experience.
In some embodiments, if the first system detects that the posture of the electronic device is a preset posture, it lights up the screen of the electronic device.
For example, the first system may include a sensor hub processor and an accelerometer-gyroscope unit. The accelerometer-gyroscope unit may be configured to detect the posture of the electronic device and may send the detected posture of the electronic device to the sensor hub processor. When the posture of the electronic device is the preset posture, the sensor hub processor sends a screen-on signal to the display to request that the screen be lit.
For another example, when the posture of the electronic device is the preset posture, the accelerometer-gyroscope unit may send a screen-on signal to the sensor hub processor, and the sensor hub processor requests the display to light the screen according to the screen-on signal.
For another example, when the posture of the electronic device is the preset posture, the accelerometer-gyroscope unit may send a screen-on signal to the display, and the display lights the screen in response to the screen-on signal.
In some embodiments, if the first system detects that a user's limb is approaching the electronic device, it lights up the screen of the electronic device.
For example, the first system may further include devices such as a proximity light sensor and an ultrasonic device. Through devices such as the proximity light sensor and the ultrasonic device, the first system can detect whether a user's limb is approaching the electronic device. If it is detected that a user's limb is approaching the electronic device, the proximity light sensor, the ultrasonic device, or the like sends a screen-on signal to the sensor hub processor, and the sensor hub processor responds to the screen-on signal by requesting the display to light the screen. Alternatively, the proximity light sensor, the ultrasonic device, or the like sends the screen-on signal to the display, and the display lights the screen in response to the screen-on signal.
In some embodiments, if the first system detects that the user is gazing at the screen of the electronic device, it lights up the screen of the electronic device.
For example, the first system may further include a camera that remains in a working state while the electronic device is in the screen-off state. The first system can use the camera to detect whether the user is gazing at the screen of the electronic device. If the camera detects that the user is gazing at the screen of the electronic device, which indicates that the user is likely to use the electronic device, it may send a screen-on signal to the sensor hub processor, and the sensor hub processor responds to the screen-on signal by requesting the display to light the screen.
With reference to the first aspect, in some embodiments, the preset power-on condition includes one or more of the following: the sound information contains a human voice; a preset user behavior exists; the posture of the electronic device satisfies a preset posture condition.
The first system may detect, through the accelerometer-gyroscope unit, whether the posture of the electronic device satisfies the preset posture condition. Exemplarily, the preset posture condition may be: the hold time for which the posture of the electronic device remains the preset posture is greater than a time threshold. Correspondingly, the first system may detect, through the accelerometer-gyroscope unit, whether the hold time of the preset posture is greater than the time threshold.
For example, when the preset power-on condition includes that the posture of the electronic device satisfies the preset posture condition, the first system detects, through the accelerometer-gyroscope unit, whether the posture of the electronic device satisfies the preset posture condition. If it does, the accelerometer-gyroscope unit may send a power-on signal to the power management unit, and the power management unit powers on the second system in response to the power-on signal.
The time threshold may be the duration for which the electronic device stays in the preset posture corresponding to the user behavior of checking the time on the electronic device. The time threshold may be determined from previously acquired information. For example, the time threshold may be determined from the previously acquired interval between a first time and a second time, where the first time is the time at which the electronic device lights the screen, and the second time is the time at which, after the electronic device lights the screen, the first system performs voice activity detection on the acquired sound information and detects a human voice.
Exemplarily, the preset user behavior includes one or more of the following: a user's limb approaching the electronic device; the user gazing at the screen of the electronic device. For example, the first system may detect whether a user's limb is approaching the electronic device through a proximity light sensor, an ultrasonic device, or the like, and may detect whether the user is gazing at the screen of the electronic device through a camera.
With reference to the first aspect, in some embodiments, the second system is specifically configured to: when power-on is complete and the first voice recognition result satisfies the first preset condition, perform second voice recognition on the sound information; and when the second voice recognition result satisfies a second preset condition, start the target application corresponding to the second voice recognition result.
With reference to the first aspect, in some embodiments, the second system is further configured to: when the second voice recognition result does not satisfy the second preset condition, generate prompt information for prompting the user to perform voice wake-up again.
The prompt information can indicate to the user that the voice wake-up failed and remind the user to issue sound information containing the preset wake word again. The first system then performs first voice recognition on the sound information issued again by the user. If the first voice recognition result satisfies the first preset condition, the second system performs second voice recognition on the sound information issued again by the user. If the second voice recognition result satisfies the second preset condition, the second system starts the target application corresponding to the second voice recognition result.
According to a second aspect, an embodiment of this application provides a voice wake-up method, applied to an electronic device. The electronic device includes a first system, a second system, and a power management unit. When the electronic device is in a screen-off state, the first system is in a powered-on state and the second system is in a powered-off state. The method includes: the first system lights up the screen of the electronic device; if the first system receives sound information, it detects a preset power-on condition and performs first voice recognition on the sound information; if the first system detects the preset power-on condition, it powers on the second system through the power management unit; and if power-on of the second system is complete and the first voice recognition result satisfies a first preset condition, the second system starts a target application.
With the above voice wake-up method, upon receiving sound information, the first system can not only perform first voice recognition on the sound information but also detect the preset power-on condition, and can power on the second system as soon as the preset power-on condition is detected. Therefore, compared with the related art, the above voice wake-up method can shorten or eliminate the delay caused by powering on the second system, thereby greatly reducing the voice wake-up latency. After the user issues a voice command containing the wake word, the electronic device can start the target application relatively quickly, reducing the user's waiting time and improving user experience.
In one application scenario, the first system may light up the screen of the electronic device when it detects that the posture of the electronic device is a preset posture.
For example, the first system may include a sensor hub processor and an accelerometer-gyroscope unit. The accelerometer-gyroscope unit may be configured to detect the posture of the electronic device and may send the detected posture of the electronic device to the sensor hub processor. When the posture of the electronic device is the preset posture, the sensor hub processor sends a screen-on signal to the display to request that the screen be lit.
For another example, when the posture of the electronic device is the preset posture, the accelerometer-gyroscope unit may send a screen-on signal to the sensor hub processor, and the sensor hub processor requests the display to light the screen according to the screen-on signal.
For another example, when the posture of the electronic device is the preset posture, the accelerometer-gyroscope unit may send a screen-on signal to the display, and the display lights the screen in response to the screen-on signal.
In another application scenario, the first system may light up the screen of the electronic device when it detects that a user's limb is approaching the electronic device.
For example, the first system may further include devices such as a proximity light sensor and an ultrasonic device, through which the first system can detect whether a user's limb is approaching the electronic device. If it is detected that a user's limb is approaching the electronic device, the proximity light sensor, the ultrasonic device, or the like sends a screen-on signal to the sensor hub processor, and the sensor hub processor responds to the screen-on signal by requesting the display to light the screen. Alternatively, the proximity light sensor, the ultrasonic device, or the like sends the screen-on signal to the display, and the display lights the screen in response to the screen-on signal.
In another application scenario, the first system may light up the screen of the electronic device when it detects that the user is gazing at the screen of the electronic device.
For example, the first system may further include a camera that remains in a working state while the electronic device is in the screen-off state. The first system can use the camera to detect whether the user is gazing at the screen of the electronic device. If the camera detects that the user is gazing at the screen, which indicates that the user is likely to use the electronic device, it may send a screen-on signal to the sensor hub processor, and the sensor hub processor responds to the screen-on signal by requesting the display to light the screen.
With reference to the second aspect, in some embodiments, the preset power-on condition is a condition for determining that the user is very likely to perform voice wake-up. The preset power-on condition includes one or more of the following: the sound information contains a human voice; a preset user behavior exists; the posture of the electronic device satisfies a preset posture condition.
The first system may detect, through the accelerometer-gyroscope unit, whether the posture of the electronic device satisfies the preset posture condition. Exemplarily, the preset posture condition is: the hold time for which the posture of the electronic device remains the preset posture is greater than a time threshold. Correspondingly, the first system may detect, through the accelerometer-gyroscope unit, whether the hold time of the preset posture is greater than the time threshold.
Exemplarily, the preset user behavior includes one or more of the following: a user's limb approaching the electronic device; the user gazing at the screen of the electronic device. For example, the first system may detect whether a user's limb is approaching the electronic device through a proximity light sensor, an ultrasonic device, or the like, and may detect whether the user is gazing at the screen of the electronic device through a camera.
With reference to the second aspect, in some embodiments, the step in which the second system starts the target application if power-on of the second system is complete and the first voice recognition result satisfies the first preset condition includes: if power-on of the second system is complete and the first voice recognition result satisfies the first preset condition, the second system performs second voice recognition on the sound information; and if the second voice recognition result satisfies a second preset condition, the second system starts the target application corresponding to the second voice recognition result.
With reference to the second aspect, in some embodiments, the method further includes: if the second voice recognition result does not satisfy the second preset condition, the second system generates prompt information for prompting the user to perform voice wake-up again.
The prompt information can indicate to the user that the voice wake-up failed and remind the user to issue sound information containing the preset wake word again. The first system then performs first voice recognition on the sound information issued again by the user. If the first voice recognition result satisfies the first preset condition, the second system performs second voice recognition on the sound information issued again by the user. If the second voice recognition result satisfies the second preset condition, the second system starts the target application corresponding to the second voice recognition result.
According to a third aspect, an embodiment of this application provides a voice wake-up apparatus, applied to an electronic device. The electronic device includes a first system, a second system, and a power management unit. When the electronic device is in a screen-off state, the first system is in a powered-on state and the second system is in a powered-off state. The voice wake-up apparatus includes: a screen lighting unit, configured to light up the screen of the electronic device; a detection and recognition unit, configured to, when the first system receives sound information, detect a preset power-on condition and perform first voice recognition on the sound information; a power-on unit, configured to, when the first system detects the preset power-on condition, power on the second system through the power management unit; and an application starting unit, configured to start a target application when power-on of the second system is complete and the first voice recognition result satisfies a first preset condition.
According to a fourth aspect, an embodiment of this application provides an electronic device, including: one or more processors, a memory, and a display, where the memory and the display are coupled to the one or more processors, the memory is configured to store computer program code, and the computer program code includes computer instructions; when the one or more processors execute the computer instructions, the electronic device is caused to perform the method according to any one of the first aspect.
According to a fifth aspect, an embodiment of this application provides a chip system, where the chip system includes a processor, the processor is coupled to a memory, and the processor executes a computer program stored in the memory to implement the method according to any one of the first aspect. The chip system may be a single chip, or a chip module composed of multiple chips.
According to a sixth aspect, an embodiment of this application provides a chip system, where the chip system includes a memory and a processor, and the processor executes a computer program stored in the memory to implement the method according to any one of the first aspect. The chip system may be a single chip, or a chip module composed of multiple chips.
According to a seventh aspect, an embodiment of this application provides a computer program product which, when run on a terminal device, causes the electronic device to perform the method according to any one of the first aspect.
According to an eighth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of the first aspect.
It can be understood that the voice wake-up apparatus of the third aspect, the electronic device of the fourth aspect, the chip systems of the fifth and sixth aspects, the computer program product of the seventh aspect, and the computer-readable storage medium of the eighth aspect are all configured to perform the method provided in the second aspect. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method; details are not repeated here.
Brief Description of Drawings
FIG. 1 is a schematic diagram of a voice wake-up process provided in the related art;
FIG. 2 is a schematic latency diagram of the voice wake-up process of the embodiment of FIG. 1;
FIG. 3 is a schematic diagram of another voice wake-up process provided in the related art;
FIG. 4 is a schematic latency diagram of the voice wake-up process of the embodiment of FIG. 3;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 6 is a schematic diagram of a scenario in which a smart watch lights its screen on wrist raising according to an embodiment of this application;
FIG. 7 is a schematic latency diagram of a voice wake-up process according to an embodiment of this application;
FIG. 8 is a schematic flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 9 is a schematic flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 10 is a schematic flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 11 is a schematic flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 12 is a schematic flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 13 is a schematic flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 14 is a schematic flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 15 is a schematic flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 16 is a schematic structural diagram of a voice wake-up apparatus according to an embodiment of this application;
FIG. 17 is a schematic structural diagram of an electronic device according to an embodiment of this application.
Detailed Description of Embodiments
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of this application. However, it should be clear to those skilled in the art that this application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted, so that unnecessary detail does not obscure the description of this application.
It should be understood that, when used in the specification and the appended claims of this application, the term "include" indicates the presence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the term "and/or" used in the specification and the appended claims of this application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in the specification and the appended claims of this application, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as meaning "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In addition, in the descriptions of the specification and the appended claims of this application, the terms "first", "second", "third", and the like are used only to distinguish the descriptions and shall not be understood as indicating or implying relative importance.
Reference in this specification to "one embodiment", "some embodiments", or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of this application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
Furthermore, "multiple" mentioned in the embodiments of this application should be interpreted as two or more.
The steps involved in the voice wake-up method provided in the embodiments of this application are merely examples; not all steps are mandatory, and not all content in each piece of information or message is mandatory; during use, steps or content may be added or removed as needed.
The same step, or steps or messages with the same function, in the embodiments of this application may be cross-referenced between different embodiments.
The service scenarios described in the embodiments of this application are intended to explain the technical solutions of the embodiments of this application more clearly and do not constitute a limitation on the technical solutions provided in the embodiments of this application. Those of ordinary skill in the art will appreciate that, with the evolution of network architectures and the emergence of new service scenarios, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
Before the embodiments of this application are explained in detail, the application scenarios involved in the embodiments of this application are described.
Voice wake-up means that a wake word is preset in the electronic device, and when the user issues a voice command containing the wake word, the voice assistant application is woken from a dormant state and responds, greatly improving human-computer interaction efficiency. As the entry point of a voice assistant application, voice wake-up has performance indicators and a wake-up latency that both greatly affect user experience. The optimal voice wake-up latency for user experience is generally 650 ms (milliseconds).
Voice wake-up can be performed on an electronic device with dual systems (which may be called a first system and a second system). For lightweight application scenarios (for example, ordinary watch-face display or local music playback), the electronic device can run the first system. For heavy-load application scenarios (for example, a 3D watch face or third-party applications such as WeChat), the electronic device can switch from the first system to the second system.
The voice wake-up process in the related art is described below using an example in which the electronic device is a smart watch, the first system is a sensor hub (Sensor Hub) system, and the second system is an application processor (Application Processor, AP) system.
When the smart watch is in the screen-off state, the Sensor Hub system is in a powered-on state and the AP system is in a powered-off state. For example, the powered-off state may be a state in which the CPU core of the AP system is powered off, i.e., a suspend-to-RAM (Suspend to RAM, STR) state. For another example, the powered-off state may be a state in which the AP system is completely powered off, i.e., a fast suspend resume (Fast Suspend Resume, FSR) state.
FIG. 1 is a schematic diagram of a voice wake-up process provided in the related art. Referring to FIG. 1, after the digital microphone (Digital Mic) collects sound information, the sound information is sent to the Sensor Hub system. The sound information may include human voice information issued by the user to wake the voice assistant application in the smart watch (for example, "Hello Xiaoyi" (你好小艺)). The Sensor Hub system performs voice activity detection (Voice Activity Detection, VAD) on the sound information, and if it detects that the sound information contains a human voice, it performs first voice recognition on the sound information. After the first voice recognition result satisfies a preset condition (i.e., the first voice recognition succeeds), the Sensor Hub system sends an instruction to the power management unit to power on the second system. In response to the instruction, the power management unit restores power to the AP system.
Restoring the AP system from the STR state means powering on the CPU core of the AP system; restoring the AP system from the FSR state means powering on the system on chip (System on Chip, SOC) of the AP system. After the AP system is restored from the STR/FSR state, the AP system performs second voice recognition on the sound information. After the second voice recognition result satisfies a preset condition (i.e., the second voice recognition succeeds), the AP system launches the voice assistant application, completing the voice wake-up.
Exemplarily, both the first voice recognition and the second voice recognition are used to recognize whether the sound information contains the preset wake word. If the sound information contains the wake word, the corresponding voice recognition succeeds, and the process proceeds to the next voice recognition or launches the voice assistant application.
Both the first voice recognition and the second voice recognition may be implemented by a voice recognition model and/or a related voice recognition algorithm. Moreover, compared with the first voice recognition, the voice recognition model or algorithm used for the second voice recognition is generally larger and more accurate. For example, when the sound information contains a pseudo wake word similar to the preset wake word, the first voice recognition is likely to conclude that the sound information contains the preset wake word, whereas the second voice recognition can recognize that the sound information does not contain the preset wake word. In this way, compared with the first voice recognition, the second voice recognition can more accurately recognize whether the sound information contains the wake word, thereby reducing the false wake-up rate.
FIG. 2 is a schematic latency analysis diagram of the voice wake-up process of the embodiment of FIG. 1. Referring to FIG. 2, the user issues the sound information for voice wake-up during T0 to T2, and the Sensor Hub system continuously acquires sound information during T0 to T2. The Sensor Hub system starts the above VAD processing at T0 and detects at T1 that the sound information contains a human voice; that is, the Sensor Hub system completes VAD processing during T0 to T1. After completing VAD processing, the Sensor Hub system starts performing first voice recognition on the acquired sound information. Because before T2 the user has not yet issued sound information containing the complete wake word "Hello Xiaoyi", the first voice recognition cannot succeed during T1 to T2. Only after the user issues the sound information containing the complete wake word "Hello Xiaoyi" at T2 does the Sensor Hub system complete the first voice recognition of the sound information during T2 to T3. Then the power management unit powers on the AP system during T3 to T4, where powering on the CPU core takes about 200 ms and powering on the SOC chip takes about 1000 ms. After power-on of the AP system is complete, the AP system performs second voice recognition on the sound information during T4 to T5. After the second voice recognition succeeds, the AP system launches the voice assistant application during T5 to T6, completing the voice wake-up.
It can be seen that powering on the AP system takes about 200 ms to 1000 ms, which adds an extra 200 ms to 1000 ms of latency to the voice wake-up process. After the user speaks the voice command containing the complete wake word, a relatively long time must pass before the smart watch can launch the voice assistant application. This time is far greater than the optimal voice wake-up latency for user experience, resulting in a poor user experience.
To address the above problem of long voice wake-up latency, the related art provides another voice wake-up method. Referring to FIG. 3 and FIG. 4, compared with the voice wake-up process of the embodiment of FIG. 1, the voice wake-up process of this embodiment removes the second voice recognition: after the AP system is powered on at T4, the AP system directly launches the voice assistant application (the corresponding time being T4 to T7). Thus, the voice wake-up process of this embodiment can save the time T4 to T5 required for the second voice recognition, reducing the voice wake-up latency. However, omitting the second voice recognition raises the false wake-up rate, degrading the wake-up performance of the electronic device and also harming user experience.
In view of the above problems, embodiments of this application provide an electronic device and a voice wake-up method applied to the electronic device. The electronic device includes a first system, a second system, and a power management unit. When the electronic device is in a screen-off state, the first system is in a powered-on state and the second system is in a powered-off state. After the first system lights up the screen of the electronic device, if it receives sound information, it detects a preset power-on condition and performs first voice recognition on the sound information. If the first system detects the preset power-on condition, it powers on the second system. If power-on of the second system is complete and the first voice recognition result satisfies a preset condition, the second system starts a target application.
In the electronic device and voice wake-up method provided in the embodiments of this application, the first system can simultaneously detect the preset power-on condition and perform first voice recognition on the sound information, and can power on the second system as soon as the preset power-on condition is detected. The first voice recognition process and the power-on process of the second system do not interfere with each other and are independent of each other, so they can proceed simultaneously. Therefore, compared with the related art, the above electronic device and voice wake-up method can shorten or eliminate the voice wake-up latency corresponding to T3 to T4 above.
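The parallelism described above can be sketched in a few lines. This is a minimal illustrative sketch and not the patented implementation: the function names, the use of Python threads, and the 100 ms/200 ms timings (roughly the first-recognition and CPU-core power-on durations mentioned in this description) are assumptions made for the example.

```python
import threading
import time

def power_on_second_system(state):
    # Hypothetical stand-in: restoring the second system (e.g. resuming an AP
    # CPU core from STR) is modeled as taking about 200 ms.
    time.sleep(0.2)
    state["powered_on"] = True

def first_voice_recognition(state):
    # Hypothetical stand-in: the lightweight first wake-word recognition is
    # modeled as taking about 100 ms.
    time.sleep(0.1)
    state["first_recognition_ok"] = True

def wake_up(preset_condition_detected):
    """Run first recognition and, when the preset power-on condition holds,
    power the second system on concurrently rather than afterwards."""
    state = {"powered_on": False, "first_recognition_ok": False}
    threads = [threading.Thread(target=first_voice_recognition, args=(state,))]
    if preset_condition_detected:
        # Power-on starts as soon as the preset condition is detected,
        # without waiting for the first recognition to finish.
        threads.append(threading.Thread(target=power_on_second_system, args=(state,)))
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state, time.monotonic() - start

state, elapsed = wake_up(preset_condition_detected=True)
# Because the two stages overlap, the total time is close to
# max(power-on, recognition), not their sum.
```

The point of the sketch is only the overlap: serializing the two stages would cost roughly their sum, while running them concurrently hides the shorter one inside the longer one, which is the latency saving the text attributes to T3 to T4.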
The electronic device provided in the embodiments of this application is described in detail below.
The electronic device involved in the embodiments of this application may include, but is not limited to, devices that can have dual systems, such as a smart watch, a mobile phone, a personal digital assistant (Personal Digital Assistant, PDA), a tablet computer, a portable device (for example, a portable computer), or a personal computer (personal computer, PC); the embodiments of this application impose no limitation on this.
FIG. 5 is a schematic structural diagram of an electronic device 100 according to an embodiment of this application. Referring to FIG. 5, the electronic device 100 may include a first system 110, a second system 120, and a power management unit 140. When the electronic device 100 is in the screen-off state, the first system 110 is in a powered-on state and the second system 120 is in a powered-off state.
The first system 110 is configured to light up the screen of the electronic device 100.
In one possible manner, if the first system 110 detects that the posture of the electronic device 100 is a preset posture, it lights up the screen of the electronic device 100.
Referring to FIG. 5, the electronic device 100 further includes a display 130, and the first system 110 may include a sensor hub processor 111 and an accelerometer-gyroscope unit 112. The accelerometer-gyroscope unit 112 may be configured to detect the posture of the electronic device 100. The sensor hub processor 111 and the accelerometer-gyroscope unit 112 may be integrated on one chip or device.
In one scenario, the accelerometer-gyroscope unit 112 may send the detected posture of the electronic device 100 to the sensor hub processor 111. When the posture of the electronic device 100 is the preset posture, the sensor hub processor 111 sends a screen-on signal to the display 130 to request that the screen be lit.
In another scenario, when the posture of the electronic device 100 is the preset posture, the accelerometer-gyroscope unit 112 may send a screen-on signal to the sensor hub processor 111, and the sensor hub processor 111 requests the display 130 to light the screen according to the screen-on signal.
In another scenario, when the posture of the electronic device 100 is the preset posture, the accelerometer-gyroscope unit 112 may send a screen-on signal to the display 130, and the display 130 lights the screen in response to the screen-on signal.
Exemplarily, the accelerometer-gyroscope unit 112 may be a chip that combines accelerometer (Acc) and gyroscope (Gyro) functions. The chip may include an acceleration sensor, a gyroscope sensor, and a processor. The acceleration sensor and the gyroscope sensor monitor the posture of the electronic device 100 and send the monitoring data to the processor. The processor is configured to send the above screen-on signal to the sensor hub processor 111 or the display 130 when the posture of the electronic device 100 is the preset posture.
As shown in FIG. 6, the smart watch is in the screen-off state; after the user raises the wrist, the Sensor Hub processor lights the screen in response to the above screen-on signal and displays the current time. When the electronic device 100 is a mobile phone or a tablet computer, it may be configured such that the screen is off when the phone or tablet lies flat, and the screen is lit when the accelerometer-gyroscope unit 112 detects that the phone or tablet is tilted or placed vertically.
The preset posture may be set based on experience. Exemplarily, the preset posture may be determined from the posture of the electronic device 100 when the user uses the electronic device 100. For example, the preset posture may be the electronic device 100 being in a horizontal posture, or the electronic device 100 being in a tilted posture; the embodiments of this application impose no sole limitation on this.
In another possible manner, if the first system 110 detects that a user's limb is approaching the electronic device 100, it lights up the screen of the electronic device 100.
Exemplarily, the first system 110 may further include devices such as a proximity light sensor and an ultrasonic device, through which the first system 110 can detect whether a user's limb is approaching the electronic device 100. If it is detected that a user's limb is approaching the electronic device 100, the proximity light sensor, the ultrasonic device, or the like sends a screen-on signal to the sensor hub processor 111, and the sensor hub processor 111 responds to the screen-on signal by requesting the display 130 to light the screen.
Exemplarily, if the proximity light sensor, the ultrasonic device, or the like detects that a user's limb is approaching the electronic device 100, it may also send the screen-on signal to the display 130, and the display 130 lights the screen in response to the screen-on signal.
In another possible manner, if the first system 110 detects that the user is gazing at the screen of the electronic device 100, it lights up the screen of the electronic device 100.
Exemplarily, the first system 110 may further include a camera that remains in a working state while the electronic device 100 is in the screen-off state. The first system 110 can use the camera to detect whether the user is gazing at the screen of the electronic device 100. If the camera detects that the user is gazing at the screen of the electronic device 100, which indicates that the user is likely to use the electronic device 100, it may send a screen-on signal to the sensor hub processor 111, and the sensor hub processor 111 responds to the screen-on signal by requesting the display 130 to light the screen.
Exemplarily, when the camera detects that the user is gazing at the screen of the electronic device 100, it may also send the screen-on signal to the display 130, and the display 130 lights the screen in response to the screen-on signal.
After the first system 110 lights up the screen of the electronic device 100, if sound information is received, the first system 110 is further configured to detect a preset power-on condition and perform first voice recognition on the sound information.
The preset power-on condition is a condition for determining that the user is very likely to perform voice wake-up. In some embodiments, the preset power-on condition may include one or more of the following: the sound information contains a human voice; a preset user behavior exists; the posture of the electronic device 100 satisfies a preset posture condition.
Exemplarily, the preset posture condition may be set based on experience and indicates that the user may need to use the electronic device 100. For example, the preset posture condition may be: the hold time for which the posture of the electronic device 100 remains the preset posture is greater than a time threshold. In some embodiments, the first system 110 may detect, through the accelerometer-gyroscope unit 112, whether the hold time of the preset posture is greater than the time threshold.
The time threshold may be the duration for which the electronic device 100 stays in the preset posture corresponding to the user behavior of checking the time on the electronic device 100. The time threshold may be determined from previously acquired information. For example, the time threshold may be determined from the previously acquired interval between a first time and a second time, where the first time is the time at which the electronic device 100 lights the screen, and the second time is the time at which, after the electronic device 100 lights the screen, the first system performs voice activity detection on the acquired sound information and detects a human voice.
Taking a smart watch as an example, from the perspective of voice wake-up, after the user raises the wrist so that the smart watch lights its screen ("raise-to-wake") without any touch operation, user behavior can be divided into two kinds: checking the time and voice wake-up. Checking the time is the high-frequency scenario: after raise-to-wake, the user's wrist stays in a fixed posture for a period of time (for example, T_hold1) to check the time. Voice wake-up is the low-frequency scenario: after raise-to-wake, the user's wrist stays in a fixed posture for a period of time (for example, T_hold2, where T_hold2 > T_hold1), and then the user speaks the wake word to launch the voice assistant application. For example, if the electronic device 100 lights the screen at time T00 and detects a human voice at time T0, the duration corresponding to T00 to T0 may be the above T_hold2. The electronic device 100 may record multiple values of T_hold2 and determine the above time threshold from the multiple values of T_hold2.
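Deriving the threshold from recorded T_hold2 values could be sketched as follows. This is a hypothetical sketch: the description only says the threshold is determined from previously acquired information, so the choice of statistic here (the minimum of the recorded values, with a fallback default) is an assumption for illustration, as are the function name and the millisecond figures.

```python
def hold_time_threshold(recorded_t_hold2_ms, default_ms=500):
    """Derive the preset-posture hold-time threshold from past
    screen-on -> voice-detected intervals (T_hold2 in the text).

    Taking the minimum recorded T_hold2 is one possible choice: any hold
    shorter than every observed wake-up hold is then treated as the
    time-checking behavior (T_hold1 < T_hold2)."""
    if not recorded_t_hold2_ms:
        # No history yet: fall back to an assumed default.
        return default_ms
    return min(recorded_t_hold2_ms)

# Example: three past wake-ups where a voice was detected 620/700/580 ms
# after the screen was lit.
threshold = hold_time_threshold([620, 700, 580])
```

A deployed device might instead use a percentile or a margin below the minimum to trade missed early power-ons against spurious ones; the text leaves this open.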
Exemplarily, the preset user behavior may be set based on experience and indicates that the user may be about to use the electronic device 100. For example, the preset user behavior may include one or more of the following: a user's limb approaching the electronic device 100; the user gazing at the screen of the electronic device 100.
The first system 110 may detect whether a user's limb is approaching the electronic device 100 through a proximity light sensor, an ultrasonic device, or the like, and may detect whether the user is gazing at the screen of the electronic device 100 through a camera.
In some embodiments, the sensor hub processor 111 may include one or more processing units. For example, the sensor hub processor 111 may include a control unit, a digital signal processing unit (digital signal processor, DSP), a voice activity detection unit, and the like. The different processing units may be independent devices or may be integrated into one or more processors.
Exemplarily, the first system 110 may collect sound information through the digital microphone of the electronic device 100 and use the voice activity detection unit to detect whether the sound information contains a human voice. If the sound information contains a human voice, the user may be about to wake the electronic device 100 with a voice command, and the sensor hub processor 111 may perform first voice recognition on the sound information through the DSP unit. If the sound information does not contain a human voice, the voice activity detection unit continues to detect the sound information collected by the digital microphone.
It should be noted that when the voice activity detection unit detects whether the sound information contains a human voice, the sound information may not yet contain the complete wake word. At this point, the first system 110 can perform first voice recognition on the currently collected sound information, but the first voice recognition result usually cannot satisfy the preset condition; that is, the first voice recognition does not succeed. Only after the digital microphone collects sound information containing the complete wake word, and the wake word is consistent with the preset wake word, can the first voice recognition result of the sound information satisfy the preset condition, i.e., the first voice recognition succeed.
In some embodiments, the first system 110 may perform first voice recognition on the sound information when the sound information contains a human voice. The first system 110 may detect information such as the frequency, energy, phase, and amplitude of the sound information through a related algorithm or model to determine whether the sound information contains a human voice.
Exemplarily, when the sound information does not contain a human voice, the first system 110 may refrain from performing first voice recognition on the sound information and instead continue to acquire sound information, performing first voice recognition only once the sound information contains a human voice. This can reduce the energy consumption of the first system 110 and extend the battery life of the electronic device.
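The energy part of the human-voice check mentioned above can be illustrated with a toy sketch. This is only a minimal energy-threshold stand-in, not the algorithm used by the first system: real VAD also uses frequency, phase, and temporal smoothing, as the text notes, and the frame size and threshold here are assumptions.

```python
def frame_energy(samples):
    """Mean squared amplitude of one audio frame (samples in [-1, 1])."""
    return sum(s * s for s in samples) / len(samples)

def contains_voice(frames, energy_threshold=0.01):
    """Toy energy-based voice activity detection: flag the signal as
    containing a voice if any frame's energy exceeds the threshold."""
    return any(frame_energy(f) > energy_threshold for f in frames)

# Five quiet frames, then one loud frame resembling speech onset.
silence = [[0.001] * 160 for _ in range(5)]
speech = silence + [[0.5, -0.4, 0.3, -0.2] * 40]
```

Under this sketch, `contains_voice(silence)` is false and `contains_voice(speech)` is true, matching the behavior described: recognition is deferred until a voiced frame appears, which saves the first system's energy.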
The first system 110 is further configured to, when the preset power-on condition is detected, power on the second system 120 through the power management unit 140.
Exemplarily, when detecting the preset power-on condition, the first system 110 may send a power-on signal to the power management unit 140, and the power management unit 140 powers on the second system 120 in response to the power-on signal.
The second system 120 is configured to start a target application when power-on is complete and the first voice recognition result satisfies a first preset condition.
In some embodiments, the target application may be an application corresponding to the first voice recognition result.
Exemplarily, the first preset condition may be: the first voice recognition result indicates that the sound information contains the preset wake word. In this case, the second system 120 may start the target application corresponding to the first voice recognition result, i.e., the second system 120 starts the target application corresponding to the preset wake word. For example, if the first voice recognition result indicates that the sound information contains the preset wake word "Hello Xiaoyi", the second system 120 starts the voice assistant application corresponding to the preset wake word "Hello Xiaoyi".
In some embodiments, when the second system 120 needs to perform second voice recognition on the sound information, the target application may be an application corresponding to the second voice recognition result.
The second system 120 is specifically configured to: when power-on is complete and the first voice recognition result satisfies the first preset condition, perform second voice recognition on the sound information; and if the second voice recognition result satisfies a second preset condition, start the target application corresponding to the second voice recognition result.
Exemplarily, the second preset condition may be: the second voice recognition result indicates that the sound information contains the preset wake word. In this case, the second system 120 may start the target application corresponding to the second voice recognition result, i.e., the second system 120 starts the target application corresponding to the preset wake word. For example, if the second voice recognition result indicates that the sound information contains the preset wake word "Hello Xiaoyi", the second system 120 starts the voice assistant application corresponding to the preset wake word "Hello Xiaoyi".
It should be noted that the first preset condition and the second preset condition may be the same or different; the embodiments of this application impose no limitation on this. Exemplarily, both the first preset condition and the second preset condition may require that the sound information contain the preset wake word, while differing in how the similarity between the wake word extracted from the sound information and the preset wake word is judged.
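The two conditions differing only in the required similarity can be sketched as a loose first-stage gate and a strict second-stage gate. This is an illustrative sketch, not the patented recognizers: the use of `difflib.SequenceMatcher` as the similarity measure, the pinyin transliteration of the wake word, and the 0.6/0.95 thresholds are all assumptions made for the example.

```python
import difflib

PRESET_WAKE_WORD = "ni hao xiao yi"  # assumed transliteration of 你好小艺

def similarity(candidate):
    # Stand-in similarity measure; the description does not specify one.
    return difflib.SequenceMatcher(None, candidate, PRESET_WAKE_WORD).ratio()

def first_stage_pass(candidate, threshold=0.6):
    """Lightweight first recognition: a loose threshold, so a pseudo wake
    word close to the preset one may still pass."""
    return similarity(candidate) >= threshold

def second_stage_pass(candidate, threshold=0.95):
    """Heavier second recognition: a strict threshold that rejects pseudo
    wake words, lowering the false wake-up rate."""
    return similarity(candidate) >= threshold
```

With these assumed thresholds, the exact wake word passes both stages, while a near miss such as "ni hao xiao mi" passes the first stage but is rejected by the second, which mirrors the pseudo-wake-word scenario described above.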
Referring to FIG. 5, the second system 120 may include an application processor 121. The application processor 121 may include one or more processing units. For example, the application processor 121 may include a central processing unit (Central Processing Unit, CPU), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a memory, a video codec, a DSP, a baseband processor, and/or a neural-network processing unit (neural-network processing unit, NPU), and the like. The different processing units may be independent devices or may be integrated into one or more processors.
Exemplarily, when power-on of the second system 120 is complete and the first voice recognition result satisfies the first preset condition, the application processor 121 performs second voice recognition on the sound information through the DSP unit. If the second voice recognition result satisfies the second preset condition, the application processor 121 starts the target application corresponding to the second voice recognition result.
Optionally, the second system 120 is further configured to: if the second voice recognition result does not satisfy the second preset condition, generate prompt information for prompting the user to perform voice wake-up again.
For ease of description, hereinafter, the case where the first voice recognition result satisfies the first preset condition is referred to as the first voice recognition succeeding, the case where the first voice recognition result does not satisfy the first preset condition is referred to as the first voice recognition failing, the case where the second voice recognition result satisfies the second preset condition is referred to as the second voice recognition succeeding, and the case where the second voice recognition result does not satisfy the second preset condition is referred to as the second voice recognition failing.
The prompt information can indicate to the user that the voice wake-up failed and remind the user to issue sound information containing the preset wake word again. The first system 110 then performs first voice recognition on the sound information issued again by the user. If the first voice recognition succeeds, the second system 120 performs second voice recognition on the sound information issued again by the user. If the second voice recognition succeeds, the second system 120 starts the target application corresponding to the second voice recognition result.
FIG. 7 is a schematic latency analysis diagram of the voice wake-up process of the embodiment of FIG. 5.
Referring to FIG. 7, the user issues the sound information containing the complete preset wake word during T0 to T2. The first system 110 performs VAD processing during T0 to T1 to detect whether the sound information contains a human voice. If the sound information contains a human voice, the first system 110 starts performing first voice recognition on the sound information at T1. Because at this point the sound information does not yet contain the complete wake word "Hello Xiaoyi", the first voice recognition fails. Only after the sound information containing the complete wake word "Hello Xiaoyi" is acquired at T2 does the first system 110 complete the first voice recognition of the sound information during T2 to T3. After the first system 110 detects the preset power-on condition, the power management unit 140 powers on the second system 120 during T3' to T4'. After power-on is complete, the second system 120 completes the second voice recognition of the sound information during T4' to T5'. After the second voice recognition succeeds, the second system 120 starts the target application (for example, the voice assistant application) during T5' to T6'.
T3' may be any time between T1 and T3. The placement of T3' between T2 and T3 in FIG. 7 is merely illustrative; in other embodiments, T3' may also lie between T1 and T2.
Therefore, compared with the voice wake-up process shown in FIG. 2, the voice wake-up process shown in FIG. 7 can reduce the delay caused by powering on the second system 120, i.e., the delay corresponding to the duration T3 to T4 shown in FIG. 2. If T4' is at or before T3, the voice wake-up process shown in FIG. 7 can completely eliminate the delay caused by powering on the second system 120, i.e., the wake-up latency is reduced by the duration corresponding to T3 to T4 shown in FIG. 2. If T4' is after T3, the voice wake-up process shown in FIG. 7 can remove part of the delay caused by powering on the second system 120, i.e., the wake-up latency is reduced by the duration corresponding to T3' to T3 shown in FIG. 7.
For the case where the electronic device 100 performs only the first voice recognition on the sound information, simply remove the T4' to T5' portion of FIG. 7; for the resulting latency reduction, refer to the embodiment of FIG. 7, which is not repeated here.
When receiving sound information, the first system 110 in the above electronic device 100 can not only perform first voice recognition on the sound information but also detect the preset power-on condition, and can power on the second system 120 as soon as the preset power-on condition is detected. In this case, the first voice recognition process and the power-on process of the second system do not interfere with each other and are independent of each other, so they can proceed simultaneously. Therefore, compared with the related art, the voice wake-up process of the electronic device 100 can shorten or eliminate the delay caused by powering on the second system 120, i.e., shorten or eliminate the duration corresponding to T3 to T4 shown in FIG. 2, thereby greatly reducing the voice wake-up latency. After the user issues the voice command containing the wake word, the electronic device 100 can start the target application relatively quickly, reducing the user's waiting time and improving user experience.
Referring to FIG. 5, the second system 120 may further include an internal memory 122 and an external memory interface 123. The application processor 121, the internal memory 122, and the external memory interface 123 may be integrated on one chip or device. For example, the application processor 121, the internal memory 122, and the external memory interface 123 may be integrated on an SOC chip.
In some embodiments, the application processor 121 may include one or more interfaces. For example, the interfaces may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a secure digital input and output (Secure Digital Input and Output, SDIO) interface, a serial peripheral interface (Serial Peripheral Interface, SPI), a mobile industry processor interface (Mobile Industry Processor Interface, MIPI), a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, and the like. The application processor 121 may be coupled to the power management unit 140 through the SDIO interface and to the display 130 through the MIPI interface and the I2C interface.
In some embodiments, the sensor hub processor 111 may also include one or more interfaces. For example, the interfaces may include an I2C interface, an I2S interface, an SPI interface, an SDIO interface, and/or a MIPI interface, and the like. The sensor hub processor 111 may be coupled to the application processor 121 through the I2S interface and the SDIO interface, to the display 130 through the MIPI interface and the I2C interface, and to the accelerometer-gyroscope unit 112 through the SPI interface.
The power management unit 140 is connected to the battery of the electronic device 100, as well as to the sensor hub processor 111 and the application processor 121. The power management unit 140 receives input from the battery and supplies power to the sensor hub processor 111, the application processor 121, the internal memory 122, the display 130, and the like.
It can be understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. For example, the electronic device 100 may further include a universal serial bus (universal serial bus, USB) interface, a battery, a mobile communication unit, an audio unit, a speaker, a receiver, a microphone, buttons, a camera, a subscriber identification module (subscriber identification module, SIM) card interface, a pressure sensor, a barometric pressure sensor, a magnetic sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
The voice wake-up method provided in the embodiments of this application is described in detail below.
Referring to FIG. 8, FIG. 8 is a schematic flowchart of a voice wake-up method according to an embodiment of this application. The voice wake-up method is applied to an electronic device that includes a first system, a second system, and a power management unit. When the electronic device is in a screen-off state, the first system is in a powered-on state and the second system is in a powered-off state. The voice wake-up method includes steps 101 to 104.
Step 101: The first system lights up the screen of the electronic device.
The first system can monitor the posture of the electronic device and can perform voice activity detection, voice recognition, and the like on received sound information. For example, the first system can obtain the posture of the electronic device through the accelerometer-gyroscope unit, can perform VAD detection on received sound information, and can perform first voice recognition on the sound information through the DSP unit.
In one application scenario, the first system may light up the screen of the electronic device when it detects that the posture of the electronic device is a preset posture.
In another application scenario, the first system may light up the screen of the electronic device when it detects that the user is gazing at the screen of the electronic device.
In another application scenario, the first system may light up the screen of the electronic device when it detects that a user's limb is approaching the electronic device.
For the specific implementation of lighting up the screen of the electronic device, refer to the relevant content of the embodiment shown in FIG. 5; details are not repeated here.
Step 102: If the first system receives sound information, it detects a preset power-on condition and performs first voice recognition on the sound information.
The preset power-on condition is a condition for determining that the user is very likely to perform voice wake-up. The preset power-on condition may include one or more of the following: the sound information contains a human voice; a preset user behavior exists; the posture of the electronic device satisfies a preset posture condition.
Exemplarily, the preset posture condition may be set based on experience and indicates that the user may need to use the electronic device. For example, the preset posture condition may be: the hold time for which the posture of the electronic device remains the preset posture is greater than a time threshold.
Exemplarily, the preset user behavior may be set based on experience and indicates that the user may need to use the electronic device. For example, the preset user behavior may include one or more of the following: a user's limb approaching the electronic device; the user gazing at the screen of the electronic device.
For a detailed description of each item of the preset power-on condition, refer to the relevant content of the embodiment shown in FIG. 5; details are not repeated here.
In some embodiments, step 102 may be: if the first system receives sound information, it detects the preset power-on condition and performs first voice recognition on the sound information when the sound information contains a human voice. The first system may detect information such as the frequency, energy, phase, and amplitude of the sound information through a related algorithm or model to determine whether the sound information contains a human voice.
Exemplarily, when the sound information does not contain a human voice, the first system may refrain from performing first voice recognition on the sound information and instead continue to acquire sound information, performing first voice recognition only once the sound information contains a human voice. This can reduce the energy consumption of the first system and extend the battery life of the electronic device.
Step 103: If the first system detects the preset power-on condition, it powers on the second system through the power management unit.
Step 104: If power-on of the second system is complete and the first voice recognition result satisfies a first preset condition, the second system starts a target application.
In some embodiments, the target application may be an application corresponding to the first voice recognition result.
Exemplarily, the first preset condition may be: the first voice recognition result indicates that the sound information contains the preset wake word. In this case, the second system may start the target application corresponding to the first voice recognition result, i.e., the second system starts the target application corresponding to the preset wake word. For example, if the first voice recognition result indicates that the sound information contains the preset wake word "Hello Xiaoyi", the second system starts the voice assistant application corresponding to the preset wake word "Hello Xiaoyi".
In some embodiments, when the second system needs to perform second voice recognition on the sound information, the target application may be an application corresponding to the second voice recognition result.
If power-on of the second system is complete and the first voice recognition result satisfies the first preset condition, the second system performs second voice recognition on the sound information; if the second voice recognition result satisfies a second preset condition, the second system starts the target application corresponding to the second voice recognition result.
Exemplarily, the second preset condition may be: the second voice recognition result indicates that the sound information contains the preset wake word. In this case, the second system may start the target application corresponding to the second voice recognition result, i.e., the second system starts the target application corresponding to the preset wake word. For example, if the second voice recognition result indicates that the sound information contains the preset wake word "Hello Xiaoyi", the second system starts the voice assistant application corresponding to the preset wake word "Hello Xiaoyi".
It should be noted that the first preset condition and the second preset condition may be the same or different; the embodiments of this application impose no limitation on this. Exemplarily, both the first preset condition and the second preset condition may require that the sound information contain the preset wake word, while differing in how the similarity between the wake word extracted from the sound information and the preset wake word is judged.
Optionally, if the second voice recognition result does not satisfy the second preset condition, prompt information for prompting the user to perform voice wake-up again is generated.
For ease of description, hereinafter, the case where the first voice recognition result satisfies the first preset condition is referred to as the first voice recognition succeeding, the case where the first voice recognition result does not satisfy the first preset condition is referred to as the first voice recognition failing, the case where the second voice recognition result satisfies the second preset condition is referred to as the second voice recognition succeeding, and the case where the second voice recognition result does not satisfy the second preset condition is referred to as the second voice recognition failing.
The prompt information can indicate to the user that the voice wake-up failed, so that the user issues sound information containing the preset wake word again. The first system then performs first voice recognition on the sound information. If the first voice recognition succeeds, the second system performs second voice recognition on the sound information. If the second voice recognition succeeds, the second system starts the target application.
For the latency analysis of the voice wake-up method in this embodiment, refer to FIG. 7 and the related content; details are not repeated here.
With the above voice wake-up method, upon receiving sound information, the first system can not only perform first voice recognition on the sound information but also detect the preset power-on condition, and can power on the second system as soon as the preset power-on condition is detected. Therefore, compared with the related art, the above voice wake-up method can shorten or eliminate the delay caused by powering on the second system, i.e., shorten or eliminate the duration corresponding to T3 to T4 shown in FIG. 2, thereby greatly reducing the voice wake-up latency. After the user issues the voice command containing the wake word, the electronic device can start the target application relatively quickly, reducing the user's waiting time and improving user experience.
Several possible implementations of the above preset power-on condition are described in detail below, taking the voice assistant application as the target application.
Embodiment 1
The preset power-on condition is: the sound information contains a human voice, and the hold time for which the posture of the electronic device remains the preset posture is greater than the time threshold. The embodiment corresponding to this preset power-on condition is described below with reference to FIG. 9.
FIG. 9 is a schematic flowchart of a voice wake-up method according to an embodiment of this application. Referring to FIG. 9, the method may include:
Step 201: Refer to step 101; details are not repeated here.
Step 202: If the first system receives sound information, it performs voice activity detection on the sound information.
The first system may perform voice activity detection on the sound information through the voice activity detection unit. For example, the electronic device collects sound information through the digital microphone and sends it to the voice activity detection unit, and the voice activity detection unit performs voice activity detection on the sound.
Step 203: If the sound information contains a human voice, the first system starts the voice processing unit and detects the hold time for which the posture of the electronic device remains the preset posture after the screen is lit.
Exemplarily, if the voice activity detection unit detects that the sound information contains a human voice, the user may be about to wake the electronic device with a voice command. At this point, the voice activity detection unit may send a start signal to the voice processing unit of the first system; the start signal is used to start the voice processing unit in preparation for performing first voice recognition on the sound information.
It should be noted that the human voice may be a voice command for launching the voice assistant application, or may be voice communication between users.
If the sound information contains a human voice, the user may be about to wake the electronic device with a voice command. Further, if the hold time for which the posture of the electronic device remains the preset posture after the screen is lit is greater than the time threshold, the user is quite likely to need to use some application of the electronic device. Therefore, predicting the power-on time of the second system by jointly considering whether the sound information contains a human voice and whether the hold time of the preset posture is greater than the time threshold can improve the accuracy of the prediction.
Step 204: The first system performs first voice recognition on the sound information through the voice processing unit.
Exemplarily, the voice processing unit may perform first voice recognition on the sound information through a related voice recognition model or voice recognition algorithm; the embodiments of this application impose no limitation on the specific manner and process of the first voice recognition.
If the first voice recognition succeeds, step 205 may be performed after step 204; if the first voice recognition does not succeed, step 204 is performed again.
Step 205: If the hold time is greater than the time threshold, the first system powers on the second system through the power management unit.
Exemplarily, referring to FIG. 7, the CPU core of the second system or the SOC chip of the second system may be powered on during T1 to T3. For example, it is detected during T0 to T1 that the sound information contains a human voice, and at T1 detection starts of whether the hold time of the above preset posture is greater than the time threshold. If it is detected that the hold time of the preset posture is greater than the time threshold, the first system can power on the CPU core of the second system or the SOC chip of the second system through the power management unit.
When the hold time of the above preset posture is less than or equal to the time threshold, since it has not been determined with high probability that the user needs to perform voice wake-up, to save energy the second system may not be powered on yet; instead, the first system powers on the CPU core of the second system or the SOC chip of the second system through the power management unit only after the above first voice recognition succeeds.
In some embodiments, after it is detected that the sound information contains a human voice, the hold time of the above preset posture may be detected through the accelerometer-gyroscope unit. Specifically, the hold time may be the interval from the time the screen is lit to the time it is detected that the sound information contains a human voice. If the hold time is greater than the time threshold, the accelerometer-gyroscope unit may send a power-on signal to the power management unit, and the power management unit responds to the power-on signal by powering on the CPU core of the second system or the SOC chip of the second system. If the hold time is less than or equal to the time threshold, the accelerometer-gyroscope unit does not send the power-on signal to the power management unit until the above first voice recognition succeeds.
For the determination of the time threshold, refer to the foregoing relevant content; details are not repeated here.
Step 206: The second system performs second voice recognition on the sound information.
Exemplarily, the second system may perform second voice recognition on the sound information through a related voice recognition model or voice recognition algorithm; the embodiments of this application impose no limitation on the specific manner and process of the second voice recognition.
Generally, compared with the first voice recognition, the voice recognition model or algorithm used for the second voice recognition is larger and more accurate. For example, when the sound information contains a pseudo wake word similar to the preset wake word, the first voice recognition is likely to conclude that the sound information contains the preset wake word, whereas the second voice recognition can recognize that the sound information does not contain the preset wake word. Therefore, the second voice recognition can more accurately recognize whether the sound information contains the wake word, reducing the false wake-up rate.
If the second voice recognition succeeds, step 207 is performed. If the second voice recognition does not succeed, prompt information for prompting the user to perform voice wake-up again is generated. After the user issues sound information containing the preset wake word again, the first system performs first voice recognition on the sound information issued again by the user. If the first voice recognition succeeds, the second system 120 performs second voice recognition on the sound information issued again by the user. If the second voice recognition succeeds, the second system 120 launches the voice assistant application.
Step 207: After the above second voice recognition succeeds, the second system launches the voice assistant application.
The voice wake-up latency of this embodiment is compared below with the voice wake-up latency of the embodiment of FIG. 2. The duration of the user speaking the voice command containing the wake word is about 800 ms, the duration corresponding to T0 to T1 is about 7 ms, the duration corresponding to T1 to T2 is about 793 ms, and the duration corresponding to T2 to T3 is about 100 ms; powering on the CPU core takes about 200 ms, and powering on the SOC chip takes about 1000 ms. The interval from starting, at T1, to detect whether the hold time of the preset posture exceeds the time threshold to detecting that the hold time exceeds the time threshold is negligible.
For the case of powering on the CPU core, the voice wake-up latency in the embodiment of FIG. 2 is the duration corresponding to T2 to T6, while the wake-up latency in this embodiment is the duration corresponding to T2 to T3 plus the duration corresponding to T4 to T6. Therefore, compared with the voice wake-up latency of the embodiment of FIG. 2, the voice wake-up latency of this embodiment is reduced by the duration corresponding to T3 to T4 (i.e., 200 ms).
For the case of powering on the SOC chip, the voice wake-up latency in the embodiment of FIG. 2 is the duration corresponding to T2 to T6, while the wake-up latency in this embodiment is the sum of the duration corresponding to T2 to T3, the duration corresponding to T4 to T6, and 107 ms (the 1000 ms required to power on the SOC chip minus the duration corresponding to T1 to T3). Therefore, compared with the voice wake-up latency of the embodiment of FIG. 2, the voice wake-up latency of this embodiment is reduced by about 900 ms (1000 ms − 107 ms = 893 ms).
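The savings computed above follow from simple arithmetic over the stated approximate durations, which can be checked in a few lines (all figures are the approximate values given in this embodiment; the function names are chosen for the example):

```python
T1_TO_T2 = 793   # ms, waiting for the complete wake word after a voice is detected
T2_TO_T3 = 100   # ms, first voice recognition
# Power-on runs in parallel from about T1, so it overlaps this window.
OVERLAP_WINDOW = T1_TO_T2 + T2_TO_T3

def residual_power_on_delay(power_on_ms):
    """Power-on time still visible after T3 when power-on starts at about T1."""
    return max(0, power_on_ms - OVERLAP_WINDOW)

def latency_saving(power_on_ms):
    """Wake-up latency saved versus powering on only after T3 (as in FIG. 2)."""
    return power_on_ms - residual_power_on_delay(power_on_ms)

cpu_saving = latency_saving(200)    # CPU-core resume (STR): fully hidden
soc_saving = latency_saving(1000)   # SOC resume (FSR): 107 ms remain visible
```

For the 200 ms CPU-core case, the power-on fits entirely inside the 893 ms overlap window, so the full 200 ms is saved; for the 1000 ms SOC case, 1000 − 893 = 107 ms remains visible, so 893 ms is saved, matching the figures above.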
Embodiment 2
The preset power-on condition is: the sound information contains a human voice, and a user's limb is approaching the electronic device. The embodiment corresponding to this preset power-on condition is described below with reference to FIG. 10.
FIG. 10 is a schematic flowchart of another voice wake-up method according to an embodiment of this application. Referring to FIG. 10, the method may include:
Steps 301 to 302: Refer to steps 201 to 202; details are not repeated here.
Step 303: If the sound information contains a human voice, the first system starts the voice signal processing unit of the first system and detects whether a user's limb is approaching the electronic device.
Based on the relevant content of step 203, if the sound information contains a human voice, the user may be about to wake the electronic device with a voice command. Further, a user's limb approaching the electronic device indicates that the user is quite likely to need to use some application of the electronic device. Therefore, predicting the power-on time of the second system by jointly considering whether the sound information contains a human voice and whether a user's limb is approaching the electronic device can improve the accuracy of the prediction.
In some embodiments, whether a user's limb is approaching the electronic device may be detected through a proximity light sensor or an ultrasonic device. For example, a smart watch may use a proximity light sensor or an ultrasonic device to detect whether the arm on which the smart watch is not worn is approaching the electronic device.
Step 304: Refer to step 204; details are not repeated here.
Step 305: If it is detected that a user's limb is approaching the electronic device, the first system powers on the second system through the power management unit.
Exemplarily, referring to FIG. 7, the CPU core of the second system or the SOC chip of the second system may be powered on within the time range T1 to T3 above. For example, it is detected during T0 to T1 that the sound information contains a human voice, and at T1 detection starts of whether a user's limb is approaching the electronic device. If it is detected that a user's limb is approaching the electronic device, the first system can power on the CPU core of the second system or the SOC chip of the second system through the power management unit.
When no user's limb is detected approaching the electronic device, since it has not been determined with high probability that the user needs to perform voice wake-up, to save energy the second system may not be powered on yet; instead, the first system powers on the CPU core of the second system or the SOC chip of the second system through the power management unit only after the above first voice recognition succeeds.
In one scenario, if the proximity light sensor or the ultrasonic device detects that a user's limb is approaching the electronic device, it may send a power-on signal to the power management unit, and the power management unit responds to the power-on signal by powering on the CPU core of the second system or the SOC chip of the second system.
Steps 306 to 307: Refer to steps 206 to 207; details are not repeated here.
Compared with the voice wake-up process of Embodiment 1, this embodiment detects whether a user's limb is approaching the electronic device instead of detecting whether the hold time of the above preset posture exceeds the time threshold. Since the two detections take almost the same time, the voice wake-up latency of this embodiment is basically the same as that of Embodiment 1 and is not repeated here.
Embodiment 3
The preset power-on condition is: the sound information contains a human voice, and the user is gazing at the screen of the electronic device. The embodiment corresponding to this preset power-on condition is described below with reference to FIG. 11.
FIG. 11 is a schematic flowchart of another voice wake-up method according to an embodiment of this application. Referring to FIG. 11, the method includes:
Steps 401 to 402: Refer to steps 201 to 202; details are not repeated here.
Step 403: If the sound information contains a human voice, the first system starts the voice signal processing unit of the first system and detects whether the user is gazing at the screen of the electronic device.
Based on the relevant content of step 203, if the sound information contains a human voice, the user may be about to wake the electronic device with a voice command. Further, the user gazing at the screen after it is lit indicates that the user is quite likely to need to use some application of the electronic device. Therefore, predicting the power-on time of the second system by jointly considering whether the sound information contains a human voice and whether the user is gazing at the screen of the electronic device can improve the accuracy of the prediction.
In some embodiments, whether the user is gazing at the screen of the electronic device may be detected through a camera. For example, a smart watch may use a camera to detect whether the user is gazing at the screen of the smart watch.
Step 404: Refer to step 204; details are not repeated here.
Step 405: If it is detected that the user is gazing at the screen of the electronic device, the first system powers on the second system through the power management unit.
Exemplarily, referring to FIG. 7, the CPU core of the second system or the SOC chip of the second system may be powered on within the time range T1 to T3 above. For example, it is detected during T0 to T1 that the sound information contains a human voice, and at T1 detection starts of whether the user is gazing at the screen of the electronic device. If it is detected that the user is gazing at the screen of the electronic device, the first system can power on the CPU core of the second system or the SOC chip of the second system through the power management unit.
When it is not detected that the user is gazing at the screen of the electronic device, since it has not been determined with high probability that the user needs to perform voice wake-up, to save energy the second system may not be powered on yet; instead, the first system powers on the CPU core of the second system or the SOC chip of the second system through the power management unit only after the above first voice recognition succeeds.
In one scenario, when the camera detects that the user is gazing at the screen of the electronic device, it may send a power-on signal to the power management unit, and the power management unit responds to the power-on signal by powering on the CPU core of the second system or the SOC chip of the second system.
Steps 406 to 407: Refer to steps 206 to 207; details are not repeated here.
Compared with the voice wake-up process of Embodiment 1, this embodiment detects whether the user is gazing at the screen of the electronic device instead of detecting whether the hold time of the above preset posture exceeds the time threshold. Since the two detections take almost the same time, the voice wake-up latency of this embodiment is basically the same as that of Embodiment 1 and is not repeated here.
Embodiment 4
The preset power-on condition is: the sound information contains a human voice. The embodiment corresponding to this preset power-on condition is described below with reference to FIG. 12.
FIG. 12 is a schematic flowchart of another voice wake-up method according to an embodiment of this application. Referring to FIG. 12, the method includes:
Steps 501 to 502: Refer to steps 201 to 202; details are not repeated here.
Step 503: If the sound information contains a human voice, the first system starts the voice processing unit of the first system and powers on the second system through the power management unit.
Exemplarily, if the voice activity detection unit detects a human voice in the sound information, the user may be about to wake the electronic device with a voice command. At this point, the voice activity detection unit may send a start signal to the voice processing unit of the first system, and the voice processing unit responds to the start signal by preparing to perform first voice recognition on the sound information. Moreover, once the sound information contains a human voice, the first system can power on the second system through the power management unit.
Step 504: Refer to step 204; details are not repeated here.
Steps 505 to 506: Refer to steps 206 to 207; details are not repeated here.
Compared with the voice wake-up process of Embodiment 1, this embodiment does not detect whether the hold time of the preset posture exceeds the time threshold. Since the time taken to detect whether the hold time of the preset posture exceeds the time threshold is almost negligible, the voice wake-up latency of this embodiment is basically the same as that of Embodiment 1 and is not repeated here.
Embodiment 5
The preset power-on condition is: the hold time for which the posture of the electronic device remains the preset posture is greater than the time threshold. The embodiment corresponding to this preset power-on condition is described below with reference to FIG. 13.
FIG. 13 is a schematic flowchart of another voice wake-up method according to an embodiment of this application. Referring to FIG. 13, the method includes:
Step 601: Refer to step 201; details are not repeated here.
Step 602: The first system starts the voice processing unit and detects the hold time for which the posture of the electronic device remains the preset posture.
Compared with Embodiment 2, in this embodiment, once the screen of the electronic device is lit, the voice processing unit of the first system can be started, and the hold time for which the posture of the electronic device remains the preset posture after the screen is lit can be detected.
Step 603: Refer to step 204; details are not repeated here.
It should be noted that the voice processing unit performs first voice recognition on the sound information only when the first system has acquired sound information; when no sound information has been acquired, the voice processing unit may be in a waiting state.
Step 604: If the hold time is greater than the time threshold, the first system powers on the second system through the power management unit.
If the hold time is greater than the time threshold, the user may be about to launch the voice assistant of the electronic device with a voice command. In this case, when the above hold time is greater than the time threshold, the first system can power on the CPU core of the second system or power on the SOC chip of the second system in advance through the power management unit. If the hold time is less than or equal to the time threshold, the first system powers on the CPU core of the second system or the SOC chip of the second system through the power management unit after the first voice recognition succeeds.
Steps 605 to 606: Refer to steps 206 to 207; details are not repeated here.
Compared with the voice wake-up process of Embodiment 1, in this embodiment the first system may not detect whether the sound information contains a human voice. The time taken by the first system to detect whether the sound information contains a human voice is about 7 ms, which is negligible. Therefore, the voice wake-up latency of this embodiment is basically the same as that of Embodiment 1 and is not repeated here.
Embodiment 6
The preset power-on condition is: a user's limb is approaching the electronic device. The embodiment corresponding to this preset power-on condition is described below with reference to FIG. 14.
FIG. 14 is a schematic flowchart of another voice wake-up method according to an embodiment of this application. Referring to FIG. 14, the method includes:
Step 701: Refer to step 201; details are not repeated here.
Step 702: The first system starts the voice processing unit and detects whether a user's limb is approaching the electronic device.
In some embodiments, whether a user's limb is approaching the electronic device may be detected through a proximity light sensor or an ultrasonic device. For example, a smart watch may use a proximity light sensor or an ultrasonic device to detect whether the arm on which the smart watch is not worn is approaching the electronic device.
Step 703: Refer to step 204; details are not repeated here.
Step 704: If it is detected that a user's limb is approaching the electronic device, the first system powers on the second system through the power management unit.
A user's limb approaching the electronic device indicates that the user may be about to launch the voice assistant of the electronic device with a voice command. In this case, upon detecting that a user's limb is approaching the electronic device, the first system can power on the CPU core of the second system or power on the SOC chip of the second system in advance through the power management unit.
Steps 705 to 706: Refer to steps 206 to 207; details are not repeated here.
Compared with the voice wake-up process of Embodiment 1, in this embodiment the first system detects whether a user's limb is approaching the electronic device, rather than whether the hold time exceeds the time threshold; the two detections take almost the same time, which is negligible. In addition, in this embodiment the first system may not detect whether the sound information contains a human voice, a detection that takes about 7 ms. Therefore, the voice wake-up latency of this embodiment is basically the same as that of Embodiment 1 and is not repeated here.
Embodiment 7
The preset power-on condition is: the user is gazing at the screen of the electronic device. The embodiment corresponding to this preset power-on condition is described below with reference to FIG. 15.
FIG. 15 is a schematic flowchart of another voice wake-up method according to an embodiment of this application. Referring to FIG. 15, the method includes:
Step 801: Refer to step 201; details are not repeated here.
Step 802: The first system starts the voice processing unit and detects whether the user is gazing at the screen of the electronic device.
In some embodiments, whether the user is gazing at the screen of the electronic device may be detected through a camera. For example, a smart watch may use a camera to detect whether the user is gazing at the screen of the smart watch.
Step 803: Refer to step 204; details are not repeated here.
Step 804: If it is detected that the user is gazing at the screen of the electronic device, the first system powers on the second system through the power management unit.
The user gazing at the screen of the electronic device indicates that the user may be about to launch the voice assistant of the electronic device with a voice command. In this case, upon detecting that the user is gazing at the screen of the electronic device, the first system can power on the CPU core of the second system or power on the SOC chip of the second system in advance through the power management unit.
Steps 805 to 806: Refer to steps 206 to 207; details are not repeated here.
Compared with the voice wake-up process of Embodiment 1, in this embodiment the first system detects whether the user is gazing at the screen of the electronic device, rather than whether the hold time exceeds the time threshold; the two detections take almost the same time, which is negligible. In addition, in this embodiment the first system may not detect whether the sound information contains a human voice, a detection that takes about 7 ms. Therefore, the voice wake-up latency of this embodiment is basically the same as that of Embodiment 1 and is not repeated here.
In Embodiments 1 to 7 above, after the second system is powered back on, it must perform second voice recognition on the sound information; however, the embodiments of this application are not limited thereto. For example, after the second system is powered back on, the voice assistant application corresponding to the first voice recognition result may be launched directly. This is described in detail through several embodiments below.
Embodiment 8
The preset power-on condition of this embodiment is the same as that of Embodiment 1.
The flow of this embodiment differs from that of Embodiment 1 in that: after the second system is powered back on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information and directly launches the voice assistant application corresponding to the first voice recognition result.
Embodiment 9
The preset power-on condition of this embodiment is the same as that of Embodiment 2.
The flow of this embodiment differs from that of Embodiment 2 in that: after the second system is powered back on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information and directly launches the voice assistant application corresponding to the first voice recognition result.
Embodiment 10
The preset power-on condition of this embodiment is the same as that of Embodiment 3.
The flow of this embodiment differs from that of Embodiment 3 in that: after the second system is powered back on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information and directly launches the voice assistant application corresponding to the first voice recognition result.
Embodiment 11
The preset power-on condition of this embodiment is the same as that of Embodiment 4.
The flow of this embodiment differs from that of Embodiment 4 in that: after the second system is powered back on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information and directly launches the voice assistant application corresponding to the first voice recognition result.
Embodiment 12
The preset power-on condition of this embodiment is the same as that of Embodiment 5.
The flow of this embodiment differs from that of Embodiment 5 in that: after the second system is powered back on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information and directly launches the voice assistant application corresponding to the first voice recognition result.
Embodiment 13
The preset power-on condition of this embodiment is the same as that of Embodiment 6.
The flow of this embodiment differs from that of Embodiment 6 in that: after the second system is powered back on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information and directly launches the voice assistant application corresponding to the first voice recognition result.
Embodiment 14
The preset power-on condition of this embodiment is the same as that of Embodiment 7.
The flow of this embodiment differs from that of Embodiment 7 in that: after the second system is powered back on and the first voice recognition succeeds, the second system does not need to perform second voice recognition on the sound information and directly launches the voice assistant application corresponding to the first voice recognition result.
Corresponding to the voice wake-up methods of the above embodiments, FIG. 16 shows a structural block diagram of a voice wake-up apparatus 900 provided by an embodiment of this application; for ease of description, only the parts related to the embodiments of this application are shown. The voice wake-up apparatus 900 is applied to the electronic device described in FIG. 5, which includes a first system, a second system, and a power management unit. When the electronic device is in the screen-off state, the first system is in a powered-on state and the second system is in a powered-off state.
Referring to FIG. 16, the voice wake-up apparatus 900 in this embodiment of this application may include a screen lighting unit 901, a detection and recognition unit 902, a power-on unit 903, and an application launching unit 904.
The screen lighting unit 901 is configured to light up the screen of the electronic device. The detection and recognition unit 902 is configured to, when the first system receives sound information, detect the preset power-on condition and perform first voice recognition on the sound information. The power-on unit 903 is configured to, when the first system detects the preset power-on condition, power on the second system through the power management unit. The application launching unit 904 is configured to, when powering on the second system is complete and the first voice recognition result satisfies the first preset condition, launch the target application.
With the above voice wake-up apparatus, the first system can detect the preset power-on condition and perform the first voice recognition on the sound information at the same time, and can power on the second system as soon as the preset power-on condition is detected. Therefore, compared with the related art, the apparatus can shorten or eliminate the time corresponding to T3-T4 above, greatly reducing the voice wake-up latency. After the user utters a voice command containing the wake-up word, the apparatus can launch the target application quickly, reducing the user's waiting time and improving the user experience.
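As a rough illustration of how the four units cooperate, apparatus 900 could be modeled as below. The injected callables are hypothetical stand-ins for the hardware-backed units 901-904, not an implementation from this application.

```python
class VoiceWakeupApparatus:
    """Rough sketch of apparatus 900 (unit callables are hypothetical)."""

    def __init__(self, light_screen, detect_condition, first_recognition,
                 power_on, launch_app):
        self.light_screen = light_screen            # screen lighting unit 901
        self.detect_condition = detect_condition    # detection/recognition unit 902
        self.first_recognition = first_recognition  # (first voice recognition)
        self.power_on = power_on                    # power-on unit 903
        self.launch_app = launch_app                # application launching unit 904

    def on_sound(self, sound_info):
        self.light_screen()
        condition_met = self.detect_condition()      # preset power-on condition
        result = self.first_recognition(sound_info)  # first voice recognition
        if condition_met:
            self.power_on()                          # power on the second system
            if result is not None:
                return self.launch_app(result)       # launch the target application
        return None
```

The key property mirrored from the description is that powering on depends only on the preset condition, while launching additionally requires a successful first recognition result.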
An embodiment of this application further provides an electronic device. Referring to FIG. 17, the electronic device 1000 may include at least one processor 1010, a memory 1020, and a computer program stored in the memory 1020 and executable on the at least one processor 1010. When executing the computer program, the processor 1010 implements the steps of any of the above method embodiments, such as steps 101 to 104 in the embodiment shown in FIG. 8. Alternatively, when executing the computer program, the processor 1010 implements the functions of the units in the above apparatus embodiments, such as the functions of units 901 to 904 shown in FIG. 16.
Exemplarily, the computer program may be divided into one or more modules/units, which are stored in the memory 1020 and executed by the processor 1010 to complete this application. The one or more modules/units may be a series of computer program segments capable of accomplishing specific functions, and the segments are used to describe the execution process of the computer program in the electronic device 1000.
Those skilled in the art can understand that FIG. 17 is merely an example of the electronic device and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components, such as input/output devices, network access devices, and buses.
The processor 1010 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 1020 may be an internal storage unit of the electronic device, or an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card. The memory 1020 is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is about to be output.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the buses in the drawings of this application are not limited to a single bus or a single type of bus.
An embodiment of this application further provides a computer-readable storage medium storing instructions that, when run on a computer or processor, cause the computer or processor to perform one or more steps of any of the above methods.
An embodiment of this application further provides a computer program product containing instructions that, when run on a computer or processor, cause the computer or processor to perform one or more steps of any of the above methods.
An embodiment of this application further provides a chip system, which may include a memory and a processor; the processor executes a computer program stored in the memory to implement one or more steps of any of the above methods. The chip system may be a single chip or a chip module composed of multiple chips.
An embodiment of this application further provides a chip system, which may include a processor coupled to a memory; the processor executes a computer program stored in the memory to implement one or more steps of any of the above methods. The chip system may be a single chip or a chip module composed of multiple chips.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to the computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Those of ordinary skill in the art can understand that all or part of the procedures of the methods in the above embodiments may be completed by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and when executed, may include the procedures of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (14)

  1. An electronic device, wherein the electronic device comprises a first system, a second system, and a power management unit, and when the electronic device is in a screen-off state, the first system is in a powered-on state and the second system is in a powered-off state;
    the first system is configured to light up a screen of the electronic device;
    the first system is further configured to: upon receiving sound information, detect a preset power-on condition and perform first voice recognition on the sound information;
    the first system is further configured to: upon detecting the preset power-on condition, power on the second system through the power management unit; and
    the second system is configured to: when the powering on is complete and a result of the first voice recognition satisfies a first preset condition, launch a target application.
  2. The electronic device according to claim 1, wherein the preset power-on condition comprises one or more of the following:
    the sound information contains a human voice;
    a preset user behavior exists; and
    a posture of the electronic device satisfies a preset posture condition.
  3. The electronic device according to claim 2, wherein the preset posture condition is: a hold time during which the posture of the electronic device is a preset posture is greater than a time threshold.
  4. The electronic device according to claim 2, wherein the preset user behavior comprises one or more of the following:
    a user's limb approaches the electronic device; and
    a user gazes at the screen of the electronic device.
  5. The electronic device according to any one of claims 1 to 4, wherein the second system is specifically configured to:
    when the powering on is complete and the result of the first voice recognition satisfies the first preset condition, perform second voice recognition on the sound information; and
    when a result of the second voice recognition satisfies a second preset condition, launch a target application corresponding to the result of the second voice recognition.
  6. The electronic device according to claim 5, wherein the second system is further configured to:
    when the result of the second voice recognition does not satisfy the second preset condition, generate prompt information for prompting a user to perform voice wake-up again.
  7. A voice wake-up method, applied to an electronic device, wherein the electronic device comprises a first system, a second system, and a power management unit, and when the electronic device is in a screen-off state, the first system is in a powered-on state and the second system is in a powered-off state; the method comprises:
    lighting up, by the first system, a screen of the electronic device;
    if the first system receives sound information, detecting a preset power-on condition and performing first voice recognition on the sound information;
    if the first system detects the preset power-on condition, powering on the second system through the power management unit; and
    if powering on the second system is complete and a result of the first voice recognition satisfies a first preset condition, launching, by the second system, a target application.
  8. The method according to claim 7, wherein the preset power-on condition comprises one or more of the following:
    the sound information contains a human voice;
    a preset user behavior exists; and
    a posture of the electronic device satisfies a preset posture condition.
  9. The method according to claim 8, wherein the preset posture condition is: a hold time during which the posture of the electronic device is a preset posture is greater than a time threshold.
  10. The method according to claim 8, wherein the preset user behavior comprises one or more of the following:
    a user's limb approaches the electronic device; and
    a user gazes at the screen of the electronic device.
  11. The method according to any one of claims 7 to 10, wherein the launching, by the second system, a target application if powering on the second system is complete and the result of the first voice recognition satisfies the first preset condition comprises:
    if powering on the second system is complete and the result of the first voice recognition satisfies the first preset condition, performing, by the second system, second voice recognition on the sound information; and
    if a result of the second voice recognition satisfies a second preset condition, launching, by the second system, a target application corresponding to the result of the second voice recognition.
  12. The method according to claim 11, wherein the method further comprises:
    if the result of the second voice recognition does not satisfy the second preset condition, generating, by the second system, prompt information for prompting a user to perform voice wake-up again.
  13. An electronic device, comprising one or more processors, a memory, and a display screen;
    wherein the memory and the display screen are coupled to the one or more processors, the memory is configured to store computer program code, and the computer program code comprises computer instructions; and
    when the one or more processors execute the computer instructions, the electronic device is caused to perform the method according to any one of claims 7 to 12.
  14. A chip system, wherein the chip system comprises a processor coupled to a memory, and the processor executes a computer program stored in the memory to implement the method according to any one of claims 7 to 12.
PCT/CN2021/117227 2020-09-29 2021-09-08 Voice wake-up method, electronic device, and chip system WO2022068544A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011056925.X 2020-09-29
CN202011056925.XA CN114333854A (zh) 2020-09-29 2020-09-29 Voice wake-up method, electronic device, and chip system

Publications (1)

Publication Number Publication Date
WO2022068544A1 (zh)

Family

ID=80951191

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/117227 WO2022068544A1 (zh) 2020-09-29 2021-09-08 Voice wake-up method, electronic device, and chip system

Country Status (2)

Country Link
CN (1) CN114333854A (zh)
WO (1) WO2022068544A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253488A (zh) * 2022-06-10 2023-12-19 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Voice recognition method, apparatus, device, and storage medium
CN118034790A (zh) * 2022-11-14 2024-05-14 Huawei Technologies Co., Ltd. Application launching method and smart device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160066113A1 (en) * 2014-08-28 2016-03-03 Qualcomm Incorporated Selective enabling of a component by a microphone circuit
CN105632491A (zh) * 2014-11-26 2016-06-01 Samsung Electronics Co., Ltd. Method and electronic device for voice recognition
CN107103906A (zh) * 2017-05-02 2017-08-29 NetEase (Hangzhou) Network Co., Ltd. Method for waking up a smart device for voice recognition, smart device, and medium
CN110223691A (zh) * 2019-06-11 2019-09-10 Suzhou AISpeech Information Technology Co., Ltd. Switching control method and apparatus for voice wake-up recognition
CN111105792A (zh) * 2018-10-29 2020-05-05 Huawei Technologies Co., Ltd. Voice interaction processing method and apparatus
CN111402871A (zh) * 2019-01-03 2020-07-10 Samsung Electronics Co., Ltd. Electronic apparatus and control method thereof
CN111429901A (zh) * 2020-03-16 2020-07-17 Unisound Intelligent Technology Co., Ltd. Multi-level intelligent voice wake-up method and system for IoT chips
US10720158B2 (en) * 2015-02-27 2020-07-21 Imagination Technologies Limited Low power detection of a voice control activation phrase


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116456441A (zh) * 2023-06-16 2023-07-18 Honor Device Co., Ltd. Sound processing apparatus and method, and electronic device
CN116456441B (zh) * 2023-06-16 2023-10-31 Honor Device Co., Ltd. Sound processing apparatus and method, and electronic device

Also Published As

Publication number Publication date
CN114333854A (zh) 2022-04-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21874205; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21874205; Country of ref document: EP; Kind code of ref document: A1)