CN112447180A - Voice wake-up method and device - Google Patents

Voice wake-up method and device

Info

Publication number
CN112447180A
Authority
CN
China
Prior art keywords
event
voice
terminal device
user
information
Prior art date
Legal status
Pending
Application number
CN201910815809.2A
Other languages
Chinese (zh)
Inventor
程飞飞
孙文涌
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201910815809.2A
Publication of CN112447180A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/22 Interactive procedures; Man-machine interfaces
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W 52/02 Power saving arrangements
    • H04W 52/0209 Power saving arrangements in terminal devices
    • H04W 52/0225 Power saving arrangements in terminal devices using monitoring of external events, e.g. the presence of a signal
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The application provides a voice wake-up method and apparatus. The method includes: monitoring a currently occurring event; recognizing scene information from the event and waking up a speech engine; and generating first voice information from the scene information and playing the first voice information through the speech engine. The method and apparatus can actively wake up the speech engine based on the current event, so that interaction between the terminal device and the user becomes more intelligent, and they also prevent the speech engine from remaining in a running state at all times, thereby reducing the power consumption of the terminal device.

Description

Voice wake-up method and device
Technical Field
The embodiment of the application relates to the technical field of terminal equipment, in particular to a voice awakening method and device.
Background
With the rapid development of artificial intelligence, users increasingly expect a humanized experience from terminal devices through voice interaction. A terminal device wakes up a speech engine (or voice assistant) running in the background through various means, so that the user can interact with the terminal device by voice.
Prior-art means for waking up the speech engine (or voice assistant) include the following: (1) waking up with a wake-up keyword; (2) waking up with a button; (3) for some specific products, running the speech engine continuously after system startup and waiting in the background for a voice call from the user; (4) starting the speech engine directly in a specific mode (e.g., a driving mode) or in a specific application to wait for a voice call from the user.
However, the first two wake-up modes require a wake-up word or a wake-up button, so the interaction between the terminal device and the user is not intelligent enough, while in the other two modes the speech engine is always in the running state, so the power consumption of the terminal device is large.
Disclosure of Invention
The embodiments of the present application provide a voice wake-up method and apparatus, which not only make interaction between the terminal device and the user more intelligent, but also prevent the speech engine from remaining in a running state at all times, thereby reducing the power consumption of the terminal device.
In a first aspect, an embodiment of the present application provides a voice wake-up method, including:
monitoring a currently occurring event;
recognizing scene information according to the event and awakening a voice engine;
and generating first voice information according to the scene information, and playing the first voice information through the voice engine.
In this scheme, the terminal device monitors a currently occurring event, recognizes scene information from the monitored event, and generates first voice information from that scene information. The terminal device also wakes up the speech engine based on the monitored event and plays the first voice information through the woken-up speech engine. The terminal device can therefore interact with the user by voice after actively waking up the speech engine, which makes the interaction more intelligent while preventing the speech engine from running continuously, thereby reducing the power consumption of the terminal device. In addition, after the speech engine enters the wake-up mode, the terminal device can actively produce speech to start the voice interaction with the user, further improving the intelligence of the user experience.
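The three-step flow of this scheme can be sketched as follows. This is a minimal Python illustration, not the patented implementation; all class and function names and the event-to-scene table are hypothetical.

```python
class SpeechEngine:
    """Hypothetical speech engine that stays off until an event wakes it."""
    def __init__(self):
        self.running = False
        self.played = []

    def wake(self):
        self.running = True

    def play(self, text):
        # Play only while awake; here "playing" is recorded for illustration.
        if self.running:
            self.played.append(text)

def recognize_scene(event):
    # Map a monitored event to scene information (illustrative table).
    scenes = {"incoming_call": "call_scene", "sms": "message_chat_scene"}
    return scenes.get(event, "unknown_scene")

def on_event(engine, event):
    scene = recognize_scene(event)        # step 2: recognize scene information
    engine.wake()                         # step 2: wake the engine on demand
    first_voice = f"prompt_for_{scene}"   # step 3: generate first voice info
    engine.play(first_voice)              # step 3: play it through the engine
    return first_voice

engine = SpeechEngine()
on_event(engine, "incoming_call")  # step 1 (event monitoring) is simulated here
```

Because the engine is woken only when an event fires, it need not run continuously, which is the power-saving point of the claim.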
In one possible implementation, the events include system events and/or application events.
In one possible implementation, the event includes one or more of a system service event, an application state event, a terminal device state event, a terminal network connection state event, a terminal device location state event, and a terminal device connection state event.
In this scheme, system service events may include screen locking/unlocking events, screenshot and screen-recording events, and input-method invocation events. Application state events may include, for example, phone events, short message events, instant message events, and alarm events, as well as intelligent events output after AI deep analysis of application behavior, such as receiving an express delivery, commuting to or from work, and arriving at an airport. Terminal device state events may include, for example, power on/off events, screen wake-up events, and low-battery events. Terminal device location state events may include, for example, starting positioning or navigation. Terminal device connection state events may include, for example, connecting to a car head unit or a Bluetooth headset.
Actively waking up the speech engine on these various events enriches the engine's wake-up scenarios and improves the user experience.
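The event categories above can be collected into a simple lookup, as sketched below. The grouping and all event names are hypothetical illustrations drawn from the examples in this scheme.

```python
# Illustrative taxonomy of the event categories enumerated in this scheme.
EVENT_CATEGORIES = {
    "system_service": ["screen_lock", "screen_unlock", "screenshot",
                       "input_method_call"],
    "application_state": ["phone", "sms", "instant_message", "alarm"],
    "ai_derived": ["express_delivery", "commute", "arrive_at_airport"],
    "device_state": ["power_on", "power_off", "screen_wake", "battery_low"],
    "location_state": ["start_positioning", "start_navigation"],
    "connection_state": ["connect_car_machine", "connect_bluetooth_headset"],
}

def category_of(event):
    """Return which category a monitored event belongs to, or None."""
    for category, events in EVENT_CATEGORIES.items():
        if event in events:
            return category
    return None
```

Any event found in the table would be eligible to wake the engine; unknown events fall through and leave the engine asleep.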
In a possible implementation manner, the generating first voice information according to the scene information and playing the first voice information through the voice engine includes:
determining the user intention of the user according to the scene information;
generating semantic text information corresponding to the first voice information according to the user intention;
and sending the semantic text information to the voice engine to play the first voice information.
In this scheme, the terminal device determines the user's intent from the scene information, generates the semantic text corresponding to the first voice information from that intent, and sends the semantic text to the speech engine to play the first voice information. The content the user most wants to hear can thus be played, improving the user experience.
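The scene-to-intent-to-text pipeline described here can be sketched as follows. The intent table, the phrasings, and the list standing in for the speech engine are all hypothetical.

```python
# Hypothetical mapping from scene information to a user intent.
INTENTS = {
    "incoming_call_scene": "decide_whether_to_answer",
    "morning_alarm_scene": "hear_weather_and_schedule",
}

# Hypothetical semantic-text templates for each intent.
TEMPLATES = {
    "decide_whether_to_answer": "You have an incoming call. Answer it?",
    "hear_weather_and_schedule": "Good morning. Here is today's weather and schedule.",
}

def generate_first_voice_text(scene):
    intent = INTENTS.get(scene)                      # determine the user intent
    return TEMPLATES.get(intent, "How can I help?")  # semantic text fallback

def send_to_engine(engine_queue, scene):
    # "Sending" the semantic text to the engine is modeled as a queue append.
    text = generate_first_voice_text(scene)
    engine_queue.append(text)
    return text
```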
In one possible implementation, the method further includes:
and when the terminal equipment is determined to meet the preset conditions, closing the voice engine.
In one possible implementation, the preset condition includes at least one of the following conditions:
the screen has been off for longer than a first preset time;
the event has exited; and
the time without user interaction exceeds a second preset time, where the no-interaction time is the duration during which no operation on the terminal device is detected.
In this scheme, the speech engine is closed when the terminal device meets a preset condition, which saves power on the terminal device.
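The preset shutdown conditions can be sketched as a single predicate. The threshold values for the first and second preset times are hypothetical; the specification leaves them unspecified.

```python
SCREEN_OFF_LIMIT_S = 60       # hypothetical "first preset time", in seconds
NO_INTERACTION_LIMIT_S = 120  # hypothetical "second preset time", in seconds

def should_close_engine(screen_off_s, event_exited, idle_s):
    """Close the speech engine when at least one preset condition holds:
    screen off too long, the triggering event has exited, or the user has
    not interacted for too long."""
    return (
        screen_off_s > SCREEN_OFF_LIMIT_S
        or event_exited
        or idle_s > NO_INTERACTION_LIMIT_S
    )
```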
In a possible implementation manner, after the playing of the first voice message by the voice engine, the method further includes:
and receiving second voice information input by the user.
In this scheme, after the first voice information is played through the speech engine, second voice information input by the user can be received, completing the voice interaction between the user and the terminal device and improving the user experience.
In one possible implementation, the monitoring the currently occurring event includes:
monitoring whether the state of the event changes;
the wake-up speech engine comprises:
and when the state of the event is monitored to be changed, waking up the voice engine.
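Waking the engine only on a state change of a monitored event can be sketched as follows. The observer interface and event names are hypothetical.

```python
class EventMonitor:
    """Hypothetical monitor that wakes the engine only on state changes."""
    def __init__(self):
        self.last_state = {}
        self.wake_count = 0

    def wake_engine(self):
        self.wake_count += 1

    def observe(self, event, state):
        # Wake the engine only when the event's state differs from the
        # previously observed state (e.g. phone goes idle -> ringing).
        changed = self.last_state.get(event) != state
        self.last_state[event] = state
        if changed:
            self.wake_engine()
        return changed
```

Repeated observations of an unchanged state leave the engine untouched, so the engine is not woken gratuitously.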
In a second aspect, an embodiment of the present application provides a voice wake-up apparatus, including:
the processing unit is used for monitoring a current event;
the processing unit is also used for identifying scene information according to the event and awakening a voice engine;
the processing unit is further configured to generate first voice information according to the scene information, and play the first voice information through the voice engine.
In one possible implementation, the events include system events and/or application events.
In one possible implementation, the event includes one or more of a system service event, an application state event, a terminal device state event, a terminal network connection state event, a terminal device location state event, and a terminal device connection state event.
In a possible implementation manner, the processing unit is specifically configured to:
determining the user intention of the user according to the scene information;
generating semantic text information corresponding to the first voice information according to the user intention;
and sending the semantic text information to the voice engine to play the first voice information.
In a possible implementation manner, the processing unit is further configured to:
and when the terminal equipment is determined to meet the preset conditions, closing the voice engine.
In one possible implementation, the preset condition includes at least one of the following conditions:
the screen has been off for longer than a first preset time;
the event has exited; and
the time without user interaction exceeds a second preset time, where the no-interaction time is the duration during which no operation on the terminal device is detected.
In one possible implementation, the apparatus further includes:
and the receiving unit is used for receiving the second voice information input by the user.
In a possible implementation manner, the processing unit is specifically configured to:
monitoring whether the state of the event changes;
and when the state of the event is monitored to be changed, waking up the voice engine.
The apparatus provided in the second aspect of the present application may be a terminal device or a chip in the terminal device, where the terminal device or the chip has the function of implementing the voice wake-up method in the above aspects or any possible design thereof. The function may be implemented in hardware, or in hardware executing corresponding software. The hardware or software includes one or more units corresponding to the above function.
The terminal device includes a processing unit, which may be a processor, a transceiver unit, which may be a transceiver including radio-frequency circuitry, and optionally a storage unit, which may be, for example, a memory. When the terminal device includes a storage unit, the storage unit stores computer-executable instructions, the processing unit is connected to the storage unit, and the processing unit executes the computer-executable instructions stored in the storage unit, so that the terminal device performs the voice wake-up method in the above aspects or any possible design thereof.
The chip includes: a processing unit, which may be a processor, and a transceiver unit, which may be an input/output interface, pins or circuits, etc. on the chip. The processing unit may execute computer-executable instructions stored by the storage unit to cause the chip to perform the voice wake-up method in the above aspects or any possible design thereof. Alternatively, the storage unit may be a storage unit (e.g., a register, a cache, etc.) inside the chip, and the storage unit may also be a storage unit (e.g., a read-only memory (ROM)) located outside the chip inside the terminal device, or another type of static storage device (e.g., a Random Access Memory (RAM)) that may store static information and instructions.
The processor mentioned above may be a Central Processing Unit (CPU), a microprocessor or an Application Specific Integrated Circuit (ASIC), or may be one or more integrated circuits for controlling the execution of programs of the voice wake-up method of the above aspects or any possible design thereof.
A third aspect of the embodiments of the present application provides a computer-readable storage medium for storing computer instructions, which, when run on a computer, cause the computer to perform the voice wake-up method provided by the first aspect of the embodiments of the present application.
A fourth aspect of embodiments of the present application provides a computer program product containing instructions, which when run on a computer, causes the computer to perform the voice wake-up method provided by the first aspect of embodiments of the present application.
A fifth aspect of the embodiments of the present application provides a voice wake-up apparatus, including: a memory, a processor, and a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of the first aspect.
According to the voice wake-up method and apparatus provided by the embodiments of the present application, the terminal device monitors a currently occurring event, recognizes scene information from the monitored event, and generates first voice information from the scene information. The terminal device also wakes up the speech engine based on the monitored event and plays the first voice information through the woken-up speech engine. In addition, after the speech engine enters the wake-up mode, the terminal device can actively produce speech to start the voice interaction with the user, further improving the intelligence of the user experience.
Drawings
Fig. 1 is a schematic view of an application scenario of a voice wakeup method according to an embodiment of the present application;
fig. 2 is a system architecture diagram of the terminal device 20 of fig. 1;
fig. 3 is a schematic flowchart of a voice wake-up method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a voice wake-up apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
Hereinafter, some terms in the present application are explained to facilitate understanding by those skilled in the art.
1) A terminal device, which may also be referred to as user equipment (UE), an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent, or a user apparatus. The terminal device may be a station (ST) in a wireless local area network (WLAN), a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA) device, a handheld device with a wireless communication function, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, or a terminal device in a next-generation communication system, for example a terminal device in a fifth-generation (5G) network, a terminal device in a future evolved public land mobile network (PLMN), a terminal device in a new radio (NR) communication system, and the like.
By way of example and not limitation, in the embodiments of the present application, the terminal device may also be a wearable device. A wearable device, also called a wearable smart device, is a general term for devices that apply wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not only a piece of hardware; it also provides powerful functions through software support, data exchange, and cloud interaction. Broadly, wearable smart devices include full-featured, large-sized devices that can implement complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, and devices that focus on a single type of application function and must be used together with other devices such as smartphones, for example various smart bracelets for physical-sign monitoring and smart jewelry.
In addition, the terminal device may further include a drone, such as an onboard communication device on the drone.
2) The voice assistant is an application program that supports queries and operations through voice interaction, and it can greatly improve the convenience of operating the phone in different scenarios. The speech engine is the core of the voice assistant application; voice interaction between the user and the terminal device is realized through the speech engine.
3) A unit in this application refers to a functional or logical unit. It may be implemented in software, with its function carried out by a processor executing program code, or it may be implemented in hardware.
4) "plurality" means two or more, and other terms are analogous. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The ranges described as "above" or "below" and the like include boundary points.
Those skilled in the art can understand that the voice wake-up method provided in the embodiments of the present application can be applied in a scenario in which a voice interaction between a user and a terminal device is realized by waking up a voice engine (or a voice assistant). Fig. 1 is a schematic view of an application scenario of the voice wakeup method according to the embodiment of the present application, as shown in fig. 1, a terminal device 20 may be, for example, a UE, a voice engine is configured in the terminal device 20, and after the voice engine is woken up, a user 10 may perform voice interaction with the terminal device 20, so as to implement an operation on the terminal device 20.
Currently, the wake-up modes for the speech engine in the terminal device 20 include the following:
1) waking up with wake-up keywords, such as "hello" or "Hi";
2) waking up with a button, such as long-pressing the home key, double-clicking the power key, or pressing a vehicle-mounted voice key or a remote-control key, to wake up the voice engine (assistant);
3) for some specific products, running the speech engine continuously after system startup and waiting in the background for a voice call from the user;
4) starting the speech engine directly in a specific mode (e.g., the driving mode) and always waiting for voice interaction;
5) starting the speech engine directly in a specific application (e.g., Baidu Maps) and always waiting for voice interaction.
However, among the above wake-up modes, the first two rely on a wake-up word or a wake-up button, which makes waking up too rigid to meet users' expectations of intelligent voice interaction, so the interaction between the terminal device and the user is not intelligent enough. In addition, after the speech engine enters the wake-up mode, the terminal device still has to wait for the user to initiate the interaction, which also makes the user experience insufficiently intelligent.
In view of these situations, an embodiment of the present application provides a voice wake-up method. As shown in fig. 1, the terminal device 20 monitors a currently occurring event, for example one or more of a device event, a system event, and an application event, comprehensively recognizes scene information from the monitored event, and generates first voice information from the scene information. The terminal device 20 also wakes up the speech engine based on the monitored event and plays the first voice information through the woken-up speech engine. After hearing the first voice information, the user 10 may input second voice information to the terminal device 20, so that the terminal device can interact with the user 10 by voice after actively waking up the speech engine. This makes the interaction between the terminal device 20 and the user 10 more intelligent and prevents the speech engine from running continuously, thereby reducing the power consumption of the terminal device 20. In addition, after the speech engine enters the wake-up mode, the terminal device can actively produce speech to start the voice interaction with the user, further improving the intelligence of the user experience.
In addition, fig. 2 is a system architecture diagram of the terminal device 20 in fig. 1, and as shown in fig. 2, the terminal device 20 includes device hardware including: a Liquid Crystal Display (LCD) for displaying images or videos, a Wireless Local Area Network (WLAN), Bluetooth (BT), buttons, a Touch screen (TP), a microphone for voice input and a speaker for voice output, a sensor, a Global Positioning System (GPS), a Universal Serial Bus (USB) interface, a General Packet Radio Service (GPRS), a camera, a fingerprint module, a Central Processing Unit (CPU), and a memory. The keys include a power-on key, a volume key and the like. The keys can be mechanical keys or touch keys. The terminal device 20 may receive a key input, and generate a key signal input related to user setting and function control of the terminal device. The sensors may include pressure sensors, gyroscope sensors, air pressure sensors, magnetic sensors, acceleration sensors, distance sensors, proximity light sensors, fingerprint sensors, temperature sensors, touch sensors, ambient light sensors, bone conduction sensors, and the like.
The terminal device 20 includes software modules comprising: an operating system (OS) platform layer, an OS system service layer, a software development kit (SDK) application programming interface (API), and an application layer. The OS platform layer comprises a resource virtualization and hardware abstraction layer (HAL) interface and an OS driver layer. The HAL interface abstracts and isolates the upper-layer services from the bottom-layer drivers, so that when the lower-layer hardware drivers change, the OS system services do not need to change; the OS driver layer comprises a Linux kernel driver system and a microkernel driver system. The OS system service layer comprises a system service module, an event service module, an artificial intelligence (AI) module, a key service module, and a voice service module. The system service module provides hardware services and platform services; the event service module belongs to the platform services and monitors and acquires currently occurring events in the terminal device 20; the AI module recognizes scene information and wakes up the speech engine based on the events monitored and acquired by the event service module; the key service module collects signals input by the user through keys and the touch screen; and the voice service module receives voice information input by the user through the microphone and outputs corresponding voice information to the user through the speaker according to the speech engine.
The device hardware has respective hardware service modules corresponding to the OS system service layer. For example, the LCD service triggers a screen on/off event, the WLAN and GPRS services report a network state change event, the bluetooth service sends a voice and media connection event, the USB service sends a USB plug event and a charging event, the GPS location service sends a positioning reporting event, the sensor service reports a sensor event, the camera service reports a photographing and recording event, the fingerprint service reports a fingerprint identification event, and the like. In addition, other software services are also included, for example, an account service can send out account login and exit events, and a file management service can report file copying and deleting events and the like.
The SDK API is a calling interface that the operating system exposes to application programs; an application program calls the operating system's API to have the operating system execute its commands. The application layer comprises system application programs (APPs), other APPs, and the system voice assistant. The system APPs handle phone calls, received messages (short messages and instant messages), and alarm events; the system voice assistant includes the speech engine to process voice information input by the user and to output voice information to the user; and the APPs can trigger application start, foreground/background switching, and service scene events, such as playing, pausing, or starting navigation.
Based on fig. 2 above, the terminal device monitors events occurring in the application layer and the OS system service layer through the event service module, and wakes up the speech engine through the AI module to actively perform voice interaction with the user, thereby improving the user experience.
The technical solution of the present application will be described in detail below with reference to specific examples. It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 3 is a flowchart illustrating a voice wake-up method according to an embodiment of the present application. On the basis of the application scenario shown in fig. 1 and the system architecture of the terminal device shown in fig. 2, as shown in fig. 3, in this embodiment, the voice wakeup method may include the following steps:
step 301: the events that currently occur are monitored.
In this step, the terminal device monitors a currently occurring event, which may be an event that is currently occurring and can interact with a user. The event comprises one or more of a system event, an application event and a device event, and further comprises one or more of a system service event, an application state event, a terminal device state event, a terminal network connection state event, a terminal device position state event and a terminal device connection state event.
For example, system service events may include screen locking/unlocking events, screenshot and screen-recording events, and input-method invocation events; application state events may include, for example, phone events, short message events, instant message events, and alarm events, as well as intelligent events output after AI deep analysis of application behavior, such as receiving an express delivery, commuting to or from work, and arriving at an airport; terminal device state events may include, for example, power on/off events, screen wake-up events, and low-battery events; terminal device location state events may include, for example, starting positioning or navigation; and terminal device connection state events may include, for example, connecting to a car head unit, connecting to a Bluetooth headset, and the like.
In one possible implementation, the currently occurring event is monitored, including monitoring whether a change in the state of the event has occurred. For example: if the event is a telephone event, monitoring whether a new incoming call exists currently, if the event is a short message event, monitoring whether a new short message exists currently, and if the event is a terminal equipment position state event, monitoring whether the current position of the terminal equipment changes or not.
Step 302: and identifying scene information according to the event, and awakening a voice engine.
In this step, after monitoring the current event, the terminal device may identify scene information according to the monitored event.
In a possible implementation manner, if there is one event monitored by the terminal device, the scene information may be identified according to the event. For example, if the event detected by the terminal device is a telephone event, that is, a new incoming call exists, the terminal device may identify that the scene information is an incoming call scene according to the event; or, if the event monitored by the terminal device is a startup event, the terminal device may identify that the scene information is a startup scene according to the event; or, if the event monitored by the terminal device is a short message event or an instant message event, that is, a new short message or a new instant message exists, the terminal device may identify that the scene information is a message chat scene according to the event; or, if the event monitored by the terminal device is an alarm event and the current time is determined to be in the morning, the terminal device may identify that the scene information is in the morning getting-up scene according to the event; or, if the terminal device monitors that the user presses the earphone play key event, the terminal device may identify the scene information as a music listening scene according to the event; or, if the terminal device monitors a low battery event, the terminal device may identify the scene information as a scene needing charging according to the event; alternatively, if the terminal device detects an event of connecting to the wireless network from the mobile network, the terminal device may recognize that the scene information is a home scene or the like according to the event.
In another possible implementation, if the terminal device monitors at least two events, the scene information may be identified by combining those events. For example, if the monitored events include a telephone event and a positioning-and-navigation start event, the terminal device may identify the scene as an incoming call during navigation; or, if the monitored events include a power-on event and multiple instant message events, the terminal device may identify the scene as reviewing messages and chats for the first time after power-on.
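The two cases above, a single event and a combination of events, can be sketched as two lookup tables: one keyed by an individual event, one keyed by an unordered set of events. The event and scene names below are illustrative assumptions, not identifiers from the patent.

```python
# Single monitored event -> scene (illustrative subset of the examples above).
SINGLE_EVENT_SCENES = {
    "incoming_call": "incoming-call scene",
    "power_on": "power-on scene",
    "new_sms": "message-chat scene",
    "alarm_morning": "morning wake-up scene",
    "low_battery": "needs-charging scene",
}

# Combination of events -> composite scene. frozenset keys make the
# lookup independent of the order in which events were monitored.
COMBINED_EVENT_SCENES = {
    frozenset({"incoming_call", "navigation_active"}): "incoming call during navigation",
    frozenset({"power_on", "new_instant_messages"}): "review messages after power-on",
}


def identify_scene(events):
    """Map one or more monitored events to scene information."""
    if len(events) == 1:
        return SINGLE_EVENT_SCENES.get(events[0], "unknown scene")
    return COMBINED_EVENT_SCENES.get(frozenset(events), "unknown scene")


print(identify_scene(["incoming_call"]))
print(identify_scene(["navigation_active", "incoming_call"]))
```

A table-driven design keeps the scene rules data rather than code, so new events or combinations can be added without changing the recognition logic.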
For example, after monitoring the current event, the terminal device may also wake up the speech engine according to the monitored event. In other words, the terminal device can use its own event triggers to wake the speech engine intelligently and enter a speech recognition and interaction scene, so that it can actively initiate voice interaction with the user. This avoids the prior-art limitation that the speech engine can only be woken by keywords or key presses, and makes the interaction between the user and the terminal device more intelligent.
For example, the terminal device wakes up its voice engine when it monitors one or more of the following events: a power-on/off event, a screen wake-up event, a screen unlock event, a low-battery event, a telephone event, a short message event, an instant message event, an alarm clock event, a positioning-and-navigation start event, a vehicle head-unit connection event, or an intelligent event such as an express-delivery arrival event, a going-to-work or getting-off-work event, or an airport arrival event.
It should be noted that the above events are merely examples, and in practical applications, the terminal device may also monitor other events, so as to recognize scene information according to the other events and wake up the speech engine.
Step 303: generate first voice information according to the scene information, and play the first voice information through the voice engine.
In this step, after recognizing the scene information, the terminal device may generate the first voice information according to the recognized scene information. Different recognized scene information leads to different first voice information. In a specific implementation, the corresponding first voice information may be generated according to a preset correspondence between scene information and voice information.
For example, if the terminal device recognizes a power-on scene, it may generate the first voice message "I'm powered on. Good morning, master, what can I do for you?"; if it recognizes an incoming-call scene, "Call from XX. Would you like to answer?"; if it recognizes a message-chat scene, "Message from XX, asking: what time is the meeting?"; if it recognizes a morning wake-up scene, "Master, it's time to get up"; if it recognizes a music-listening scene, "Open XX Music to listen to a song?"; if it recognizes a low-battery scene while listening to music, "Master, I'm running low, remember to charge me?"; if it recognizes an arriving-home scene, "Master, welcome home, what can I do for you?"; and so on.
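The preset correspondence between scene information and voice information described above can be sketched as a simple template table keyed by scene, with placeholders for scene-specific details such as the caller's name. The scene keys and prompt wording here are assumptions for illustration.

```python
# Preset scene -> first-voice-message template (illustrative).
SCENE_PROMPTS = {
    "power_on": "Good morning, master. I'm powered on, what can I do for you?",
    "incoming_call": "Call from {caller}. Would you like to answer?",
    "morning_wakeup": "Master, it's time to get up.",
    "low_battery": "Master, I'm running low, remember to charge me?",
}


def first_voice_message(scene, **details):
    """Look up the prompt for a scene and fill in any details."""
    template = SCENE_PROMPTS.get(scene)
    return template.format(**details) if template else None


print(first_voice_message("incoming_call", caller="XX"))
print(first_voice_message("morning_wakeup"))
```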
After generating the first voice information, the terminal device can play it through the awakened voice engine, thereby actively initiating voice interaction with the user.
In a possible implementation, generating the first voice information according to the scene information and playing it through the voice engine includes: determining the user's intention according to the scene information, generating semantic text information corresponding to the first voice information according to that intention, and then sending the semantic text information to the voice engine to play the first voice information.
Specifically, after recognizing the scene information, the terminal device may determine the most reasonable user intention, that is, what the user is most likely to want to do next, according to the scene information, and then generate the semantic text information corresponding to the first voice information from that intention. In one possible implementation, the user's intention may be determined from the user's historical behavior information. For example, if the scene information is the morning wake-up scene and, according to the user's history, the number of times the user snoozed the alarm for five minutes after it rang within a past preset period exceeds a preset threshold, it can be determined that the most reasonable user intention is to snooze the alarm, so the terminal device may generate the semantic text information "Snooze the alarm for five minutes?".
In another possible implementation, the user's intention may also be determined from big data when determining the user intent, that is, from the behavior habits of most users. For example, if the scene information is a power-on scene and statistics show that more than a preset number of users query the weather when powering on, the terminal device determines that the most reasonable user intention is to query the weather, and may generate the semantic text information "Today's weather is clear, breeze at level 2, highest temperature 35 degrees Celsius; remember sun protection".
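The two intent-determination strategies just described, the user's own history and aggregate behavior of most users, can be sketched as two small decision functions. The thresholds, counts, and intent names below are assumptions chosen to mirror the alarm and weather examples in the text, not values from the patent.

```python
def intent_from_history(snooze_count, threshold=3):
    """User-history strategy: if the user snoozed the alarm more than
    `threshold` times within the recent window, assume they want to
    snooze again; otherwise assume they want to dismiss it."""
    return "snooze_alarm" if snooze_count > threshold else "dismiss_alarm"


def intent_from_big_data(users_querying_weather, total_users, min_fraction=0.5):
    """Big-data strategy: if at least `min_fraction` of users query the
    weather right after power-on, assume that intent for a power-on scene."""
    if users_querying_weather / total_users >= min_fraction:
        return "query_weather"
    return "no_default_intent"


print(intent_from_history(5))           # frequent snoozer
print(intent_from_big_data(700, 1000))  # 70% of users check the weather
```

The chosen intent then drives which semantic text is generated, for example the snooze question or the weather report mentioned above.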
The terminal device generates semantic text information from the scene information and sends it to the voice engine; the voice engine processes the semantic text information and plays the corresponding first voice information, thereby achieving the goal of actively engaging in voice interaction with the user.
In this embodiment, the terminal device determines the user's intention according to the scene information, generates the semantic text information corresponding to the first voice information from that intention, and sends it to the voice engine for playback, so that the content the user most wants to hear can be played, improving the user experience.
Illustratively, after the terminal device plays the first voice message through the voice engine, the terminal device will receive the second voice message input by the user, thereby completing the voice interaction with the user.
For example, if the terminal device plays the first voice message "I'm powered on. Good morning, master, what can I do for you?", the user may input the second voice message "Help me launch the XX application". Or, if the terminal device plays "Call from XX. Would you like to answer?", the user may input "Answer it, and open the notepad for me as well", and the terminal device may further output "Done", completing the voice interaction with the user. Or, if the terminal device plays "95... incoming call", the user may input "Don't answer", and the terminal device may further output "Hung up". Or, if the terminal device plays "Message from XX, asking what time the meeting is", the user may input "4 p.m.", and the terminal device may further output "Shall I set a meeting reminder?". Or, if the terminal device plays "Master, it's time to get up", the user may input "Getting up; call me for work in half an hour", the terminal device may output "OK", the user may further ask "By the way, how is the weather today?", and the terminal device may output "Clear, breeze at level 2, highest temperature 35 degrees; remember sun protection", completing the voice interaction with the user. Or, if the terminal device plays "Open XX Music to listen to a song?", the user may input "No, use YY Music instead", and the terminal device may further output "What song would you like?", completing the voice interaction with the user, and so on.
Further, in order to reduce the power consumption of the terminal device, the speech engine is turned off when the terminal device is determined to satisfy a preset condition.
The preset condition includes at least one of the following: the screen has been in the screen-off state for longer than a first preset time; the event has exited; or no interactive operation with the user has occurred for longer than a second preset time, where the no-interaction time is the duration for which no operation on the terminal device is detected.
Specifically, if the screen of the terminal device has been in the screen-off state for longer than the first preset time, the terminal device is likely not being operated by the user, and the speech engine may be turned off to save the power consumption of the terminal device.
If the terminal device detects that the previous event has exited, for example the alarm has stopped ringing, or more than a third preset time has passed since a phone call was hung up, the awakened speech engine may also be turned off.
If the terminal device detects no interactive operation with the user for longer than the second preset time, for example no screen touch, voice, or gesture operation has been detected for that duration, the awakened speech engine may be turned off, reducing the power consumption of the terminal device.
It should be noted that the terminal device may also turn off the speech engine when all of the above conditions are met simultaneously, or when at least two of them are met. For example: the speech engine is turned off when the screen has been off for longer than the first preset time and the event has exited; or when the screen has been off for longer than the first preset time and no interactive operation with the user has occurred for longer than the second preset time; or when the event has exited and no interactive operation with the user has occurred for longer than the second preset time; or when the screen has been off for longer than the first preset time, the event has exited, and no interactive operation with the user has occurred for longer than the second preset time.
The first preset time, the second preset time, and the third preset time may be set according to actual conditions or experience; for example, the first preset time may be set to 20 s, the second preset time to 30 s, and the third preset time to 60 s.
Note that the preset conditions above are only examples. In practical applications, the terminal device may also turn off the speech engine when a new event is monitored, for example a shutdown event; when it detects that the sensors have been inactive for a first preset period; or when it detects that the user has performed no operation on the home interface within a second preset period, and so on.
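The shutdown logic above can be sketched as a predicate over the three preset conditions, using the example timeouts from the text (20 s for the screen-off limit, 30 s for the no-interaction limit); combining them with `or` gives the "at least one condition" variant, and swapping in `and` would give the stricter simultaneous variant. The function and parameter names are hypothetical.

```python
SCREEN_OFF_LIMIT = 20      # first preset time, seconds (example value)
NO_INTERACTION_LIMIT = 30  # second preset time, seconds (example value)


def should_close_engine(screen_off_secs, event_exited, idle_secs):
    """Return True when at least one preset shutdown condition holds:
    screen off too long, the triggering event has exited, or the user
    has not interacted with the device beyond the timeout."""
    return (screen_off_secs > SCREEN_OFF_LIMIT
            or event_exited
            or idle_secs > NO_INTERACTION_LIMIT)


print(should_close_engine(25, False, 0))   # screen off too long
print(should_close_engine(0, False, 10))   # still active, keep engine awake
```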
In this embodiment, when it is determined that the terminal device meets the preset condition, the speech engine is turned off, so that power consumption of the terminal device can be saved.
The embodiment of the application provides a voice wake-up method: a currently occurring event is monitored, scene information is identified from the monitored event, and first voice information is generated from the scene information. In addition, the terminal device can wake up the voice engine according to the monitored event and play the first voice information through the awakened engine. In this way, the terminal device can actively wake the voice engine and then carry out voice interaction with the user, which makes the interaction between the terminal device and the user intelligent while avoiding keeping the voice engine running at all times, thereby reducing the power consumption of the terminal device. Moreover, after the voice engine enters the wake-up mode, the terminal device can actively produce speech to interact with the user, making the user experience more intelligent.
The following describes the technical solution of the embodiment of the present application in detail, taking the arrival of a user at an airport as a specific example.
The terminal device determines that the user has arrived at the airport by detecting the user's position information, and recognizes that the user currently needs to travel, that is, the scene information is a travel scene; in addition, the terminal device may wake up the voice engine. Further, the terminal device determines from the travel scene that the most reasonable user intention is to complete check-in procedures, generates the semantic text information "Please go to counter XX to check in" according to that intention, and sends it to the awakened voice engine to play the first voice information corresponding to the semantic text. Thus, after recognizing the scene information from the monitored event, the terminal device can actively wake the voice engine and play the first voice information generated from the scene information, actively initiating voice interaction and making the interaction between the user and the terminal device more intelligent.
After the terminal device plays the first voice message, the user can input second voice information according to the actual situation, for example "At which gate?", thereby completing the voice interaction between the user and the terminal device and improving the user experience.
Further, when the terminal device detects that no screen operation, voice input, or gesture from the user has occurred for more than 30 s, it actively turns off the voice engine, achieving the goal of saving the power consumption of the terminal device.
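The airport walkthrough above chains every step of the method: a location event, the travel scene, the check-in intent, and the semantic text handed to the speech engine. A hypothetical end-to-end sketch of that pipeline (all names invented for illustration):

```python
def handle_event(event):
    """Map a monitored event through scene -> intent -> semantic text.
    Only the airport case from the example is filled in."""
    if event == "arrived_at_airport":
        scene = "travel scene"
        intent = "check_in"  # most reasonable intent for the travel scene
        text = "Please go to counter XX to check in"
        return scene, intent, text
    return None, None, None  # no scene recognized; engine stays asleep


scene, intent, text = handle_event("arrived_at_airport")
print(scene, intent, text, sep=" | ")
```

In a real device, the returned text would be sent to the awakened speech engine for playback, and the user's reply ("At which gate?") would feed the next turn of the dialogue.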
Fig. 4 is a schematic structural diagram of a voice wake-up apparatus according to an embodiment of the present application, where the voice wake-up apparatus 40 may be a terminal device in the foregoing embodiment. Referring to fig. 4, the apparatus includes: a processing unit 11 and a receiving unit 12, wherein:
a processing unit 11 for monitoring a currently occurring event;
the processing unit 11 is further configured to identify scene information according to the event, and wake up a speech engine;
the processing unit 11 is further configured to generate first voice information according to the scene information, and play the first voice information through the voice engine.
In the voice wake-up apparatus provided by this embodiment of the application, the processing unit 11 monitors a currently occurring event, identifies scene information from the monitored event, and generates first voice information from the scene information. The processing unit 11 can also wake up the voice engine according to the monitored event and play the first voice information through the awakened engine. Thus, the terminal device can actively wake the voice engine and then carry out voice interaction with the user, which makes the interaction between the terminal device and the user intelligent while avoiding keeping the voice engine running at all times, and further reduces the power consumption of the terminal device. Moreover, after the voice engine enters the wake-up mode, the terminal device can actively produce speech to interact with the user, making the user experience more intelligent.
Illustratively, the events include system events and/or application events.
Illustratively, the event includes one or more of a system service event, an application state event, a terminal device state event, a terminal network connection state event, a terminal device location state event, and a terminal device connection state event.
Illustratively, the processing unit 11 is specifically configured to:
determining the user intention of the user according to the scene information;
generating semantic text information corresponding to the first voice information according to the user intention;
and sending the semantic text information to the voice engine to play the first voice information.
Illustratively, the processing unit 11 is further configured to:
and when the terminal equipment is determined to meet the preset conditions, closing the voice engine.
Illustratively, the preset condition includes at least one of the following conditions:
the time that the screen is in the screen-off state exceeds first preset time;
the event exits; and
the time of no interactive operation with the user exceeds a second preset time, wherein the time of no interactive operation is the duration for which no operation on the terminal device is detected.
Illustratively, the apparatus further comprises a receiving unit 12, configured to receive second voice information input by a user.
Illustratively, the processing unit 11 is specifically configured to:
monitoring whether the state of the event changes;
and when the state of the event is monitored to be changed, waking up the voice engine.
The voice wake-up apparatus provided in the embodiment of the present application may perform the corresponding method embodiment described above, for example, the embodiment shown in fig. 3, and the implementation principle and the technical effect are similar, which are not described herein again.
It should be noted that the division of the above apparatus into units is only a logical division; in actual implementation, the units may be wholly or partially integrated into one physical entity, or physically separated. These units may all be implemented in software invoked by a processing element, all in hardware, or partly in software invoked by a processing element and partly in hardware. For example, the sending unit may be a separately arranged processing element, may be integrated into a chip of the apparatus, or may be stored in the memory of the apparatus in the form of a program whose function is invoked and executed by a processing element of the apparatus. The other units are implemented similarly. In addition, all or some of the units may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, the steps of the above method or the above units may be completed by hardware integrated logic circuits in a processor element or by instructions in the form of software. Furthermore, the above receiving unit is a unit that controls reception and may receive information through a receiving device of the apparatus, such as an antenna and radio-frequency apparatus.
The above units may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs), among others. For another example, when one of the above units is implemented by a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling programs. As another example, these units may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 5, the terminal device includes: processor 110, memory 120, transceiver 130. The transceiver 130 may be connected to an antenna. In the downlink direction, the transceiver 130 receives information transmitted by the base station through the antenna and transmits the information to the processor 110 for processing. In the uplink direction, the processor 110 processes the data of the terminal device and transmits the processed data to the base station through the transceiver 130.
The memory 120 is used for storing a program for implementing the above method embodiment, or each unit in the embodiment shown in fig. 4, and the processor 110 calls the program to execute the operation of the above method embodiment to implement each unit shown in fig. 4.
Alternatively, some or all of the above units may be embedded in a chip of the terminal device in the form of an integrated circuit. They may be implemented separately or integrated together. That is, the above units may be configured as one or more integrated circuits implementing the above methods, for example: one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs), among others.
The present application also provides a storage medium comprising: a readable storage medium and a computer program for implementing the voice wake-up method provided by any of the foregoing embodiments.
The present application also provides a program product comprising a computer program (i.e. executing instructions), the computer program being stored in a readable storage medium. The computer program can be read from a readable storage medium by at least one processor of the terminal device, and the computer program can be executed by the at least one processor to enable the terminal device to implement the voice wake-up method provided by the foregoing various embodiments.
The embodiment of the present application further provides a voice wake-up apparatus, which includes at least one storage element and at least one processing element, where the at least one storage element is used to store a program, and when the program is executed, the voice wake-up apparatus is enabled to perform the operation of the terminal device in any of the above embodiments.
All or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a readable memory; when executed, the program performs the steps of the above method embodiments. The aforementioned memory (storage medium) includes: read-only memory (ROM), RAM, flash memory, hard disk, solid-state disk, magnetic tape, floppy disk, optical disc, and any combination thereof.

Claims (18)

1. A voice wake-up method, applied to a terminal device, the method comprising:
monitoring a currently occurring event;
recognizing scene information according to the event and awakening a voice engine;
and generating first voice information according to the scene information, and playing the first voice information through the voice engine.
2. The method of claim 1, wherein the event comprises a system event and/or an application event.
3. The method of claim 1 or 2, wherein the event comprises one or more of a system service event, an application state event, a terminal device state event, a terminal network connection state event, a terminal device location state event, and a terminal device connection state event.
4. The method according to any one of claims 1-3, wherein the generating the first voice message according to the scene information and playing the first voice message through the voice engine comprises:
determining the user intention of the user according to the scene information;
generating semantic text information corresponding to the first voice information according to the user intention;
and sending the semantic text information to the voice engine to play the first voice information.
5. The method according to any one of claims 1-4, further comprising:
and when the terminal equipment is determined to meet the preset conditions, closing the voice engine.
6. The method according to claim 5, wherein the preset condition comprises at least one of the following conditions:
the time that the screen is in the screen-off state exceeds first preset time;
the event exits; and
the time of no interactive operation with the user exceeds a second preset time, wherein the time of no interactive operation is the duration for which no operation on the terminal device is detected.
7. The method of any of claims 1-6, wherein after the playing of the first speech information by the speech engine, the method further comprises:
and receiving second voice information input by the user.
8. The method according to any one of claims 1-7, wherein said monitoring of a currently occurring event comprises:
monitoring whether the state of the event changes;
the wake-up speech engine comprises:
and when the state of the event is monitored to be changed, waking up the voice engine.
9. A voice wake-up apparatus, comprising:
the processing unit is used for monitoring a current event;
the processing unit is also used for identifying scene information according to the event and awakening a voice engine;
the processing unit is further configured to generate first voice information according to the scene information, and play the first voice information through the voice engine.
10. The apparatus of claim 9, wherein the event comprises a system event and/or an application event.
11. The apparatus of claim 9 or 10, wherein the event comprises one or more of a system service event, an application state event, a terminal device state event, a terminal network connection state event, a terminal device location state event, and a terminal device connection state event.
12. The apparatus according to any one of claims 9 to 11, wherein the processing unit is specifically configured to:
determining the user intention of the user according to the scene information;
generating semantic text information corresponding to the first voice information according to the user intention;
and sending the semantic text information to the voice engine to play the first voice information.
13. The apparatus according to any of claims 9-12, wherein the processing unit is further configured to:
and when the terminal equipment is determined to meet the preset conditions, closing the voice engine.
14. The apparatus of claim 13, wherein the preset condition comprises at least one of the following conditions:
the time that the screen is in the screen-off state exceeds first preset time;
the event exits; and
the time of no interactive operation with the user exceeds a second preset time, wherein the time of no interactive operation is the duration for which no operation on the terminal device is detected.
15. The apparatus according to any one of claims 9-14, further comprising:
and the receiving unit is used for receiving the second voice information input by the user.
16. The apparatus according to any one of claims 9 to 14, wherein the processing unit is specifically configured to:
monitoring whether the state of the event changes;
and when the state of the event is monitored to be changed, waking up the voice engine.
17. A terminal device, comprising:
a processor;
a memory; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1-8.
18. A computer-readable storage medium, characterized in that it stores a computer program that causes a terminal device to execute the method of any one of claims 1-8.
CN201910815809.2A 2019-08-30 2019-08-30 Voice wake-up method and device Pending CN112447180A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910815809.2A CN112447180A (en) 2019-08-30 2019-08-30 Voice wake-up method and device


Publications (1)

Publication Number Publication Date
CN112447180A true CN112447180A (en) 2021-03-05

Family

ID=74733955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910815809.2A Pending CN112447180A (en) 2019-08-30 2019-08-30 Voice wake-up method and device

Country Status (1)

Country Link
CN (1) CN112447180A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049571A (en) * 2013-01-04 2013-04-17 深圳市中兴移动通信有限公司 Method and device for indexing menus on basis of speech recognition, and terminal comprising device
CN106940999A (en) * 2017-02-16 2017-07-11 中用科技有限公司 A kind of interactive voice wearable device and method based on event
CN107303909A (en) * 2016-04-20 2017-10-31 斑马网络技术有限公司 Voice awaking method, device and equipment
CN108337362A (en) * 2017-12-26 2018-07-27 百度在线网络技术(北京)有限公司 Voice interactive method, device, equipment and storage medium
CN108459838A (en) * 2018-03-30 2018-08-28 联想(北京)有限公司 Information processing method and electronic equipment
CN108766423A (en) * 2018-05-25 2018-11-06 三星电子(中国)研发中心 A kind of active awakening method and device based on scene
CN109859762A (en) * 2019-01-02 2019-06-07 百度在线网络技术(北京)有限公司 Voice interactive method, device and storage medium

Similar Documents

Publication Publication Date Title
WO2021052263A1 (en) Voice assistant display method and device
CN111724775B (en) Voice interaction method and electronic equipment
CN110910872B (en) Voice interaction method and device
EP3113549B1 (en) Method and device for waking up mcu chip
CN111819533B (en) Method for triggering electronic equipment to execute function and electronic equipment
US10088889B2 (en) Method and device for waking up a controller
WO2020077540A1 (en) Information processing method and electronic device
WO2022262434A1 (en) Power optimization method and electronic device
CN110572866B (en) Management method of wake-up lock and electronic equipment
WO2021052139A1 (en) Gesture input method and electronic device
CN106412312A (en) Method and system for automatically awakening camera shooting function of intelligent terminal, and intelligent terminal
WO2018214744A1 (en) Information processing method and related product
WO2023103499A1 (en) Method for running application, and electronic device
CN114647452A (en) System switching method, device, equipment and storage medium
CN112447180A (en) Voice wake-up method and device
EP4270060A1 (en) Communication method and apparatus, communication device, and storage medium
CN107395900A (en) The multiple based reminding method of missed call
CN116032942A (en) Method, device, equipment and storage medium for synchronizing cross-equipment navigation tasks
CN115206308A (en) Man-machine interaction method and electronic equipment
CN114765026A (en) Voice control method, device and system
CN116679900B (en) Audio service processing method, firmware loading method and related devices
WO2023236516A1 (en) Speech recognition method and apparatus, device, and storage medium
CN117271170B (en) Activity event processing method and related equipment
WO2023246604A1 (en) Handwriting input method and terminal
WO2023185152A1 (en) Incoming call processing method and apparatus, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination