WO2022042274A1 - A voice interaction method and electronic device - Google Patents
A voice interaction method and electronic device
- Publication number
- WO2022042274A1 (PCT/CN2021/111407)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- electronic device
- screen
- user
- voice
- sensor
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
- H04M1/72454—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/02—Power saving arrangements
- H04W52/0209—Power saving arrangements in terminal devices
- H04W52/0251—Power saving arrangements in terminal devices using monitoring of local events, e.g. events related to user activity
- H04W52/0254—Power saving arrangements in terminal devices using monitoring of local events, e.g. events related to user activity detecting a user operation or a tactile contact or a motion of the device
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72469—User interfaces specially adapted for cordless or mobile telephones for operating the device by selecting functions from two or more displayed items, e.g. menus or icons
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/74—Details of telephonic subscriber devices with voice recognition means
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/02—Power saving arrangements
- H04W52/0209—Power saving arrangements in terminal devices
- H04W52/0261—Power saving arrangements in terminal devices managing power supply demand, e.g. depending on battery level
- H04W52/0267—Power saving arrangements in terminal devices managing power supply demand, e.g. depending on battery level by controlling user interface components
- H04W52/027—Power saving arrangements in terminal devices managing power supply demand, e.g. depending on battery level by controlling user interface components by controlling a display operation or backlight unit
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/02—Power saving arrangements
- H04W52/0209—Power saving arrangements in terminal devices
- H04W52/0261—Power saving arrangements in terminal devices managing power supply demand, e.g. depending on battery level
- H04W52/0274—Power saving arrangements in terminal devices managing power supply demand, e.g. depending on battery level by switching on or off the equipment or parts thereof
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Definitions
- the present application relates to the field of artificial intelligence, and in particular, to a voice interaction method and electronic device.
- Electronic devices can provide applications that perform voice interaction with users, such as voice assistants. Through a voice assistant, users can interact with an electronic device to accomplish functions that previously required multiple manual operations, for example, making a phone call or playing music.
- Currently, when a user interacts with an electronic device that has a screen through a voice assistant, the electronic device will directly light up the screen to display the user interface of the voice assistant and the related content obtained by executing the received voice command.
- the present application provides a voice interaction method and electronic device.
- With this method, the electronic device in the off-screen state can detect whether the user needs to watch the screen, and thereby intelligently decide whether to turn on the screen when the voice assistant is activated.
- If the user does not need to watch the screen, the electronic device can keep the screen off and interact with the user by voice. In this way, the electronic device can reduce power consumption and avoid accidental touches.
- the present application provides a voice interaction method.
- the method includes: the electronic device can detect the first operation of the user when the screen is in the off-screen state, and the first operation can be used to start the voice assistant.
- the electronic device may start the voice assistant while keeping the screen off, and make the voice assistant interact with the user in a first manner, which is to interact with the user only through voice.
- the first situation can include any of the following:
- In combination with the first aspect, the electronic device can light up the screen, activate the voice assistant, and make the voice assistant interact with the user in a second manner, which includes interacting with the user through a graphical interface.
- the second situation includes any of the following:
- a human face is detected by the second sensor; or,
- the above-mentioned first sensor may include one or more of the following: a proximity light sensor, an infrared light sensor, and a radar sensor.
- the aforementioned second sensor may include a camera.
- the above-mentioned third sensor may include a motion sensor.
- the motion sensor includes one or more of the following: acceleration sensor, gyroscope sensor.
- the above-mentioned first situation may be a situation in which the user does not watch the screen of the electronic device.
- the first sensor of the electronic device may detect that there is an object blocking within a preset distance of the screen.
- the electronic device can determine that the user is not viewing the screen of the electronic device.
- Alternatively, the first sensor of the electronic device can detect that there is no object blocking within a preset distance of the screen, and no human face is detected by the second sensor.
- the electronic device can determine that the user is not viewing the screen of the electronic device.
- the above-mentioned second situation may be a situation in which the user watches the screen of the electronic device.
- The first sensor of the electronic device can detect that there is no object blocking within a preset distance of the screen, and a human face is detected by the second sensor.
- the electronic device can determine that the user is viewing the screen of the electronic device.
- After posture adjustment, the screen of the electronic device can face the user, and the electronic device can detect, through the third sensor, that its posture has switched from the first posture to the second posture.
- the electronic device can determine that the user is viewing the screen of the electronic device.
- the above-mentioned first posture may be, for example, a posture in which the screen of the electronic device is placed horizontally upward.
- the above-mentioned second posture may be, for example, a posture in which the screen is tilted upward.
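The posture switch described above (screen lying flat and facing up, then raised toward the face) can be inferred from a 3-axis accelerometer reading of the gravity vector. The following is an illustrative sketch only, not taken from the patent; the angle thresholds are hypothetical values chosen for the example.

```python
import math

def screen_tilt_deg(ax: float, ay: float, az: float) -> float:
    """Angle in degrees between the screen normal (device z-axis) and the
    vertical, derived from the accelerometer's gravity vector (m/s^2)."""
    g = math.sqrt(ax * ax + ay * ay + az * az)
    # Clamp to guard against rounding errors outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, az / g))))

def posture(ax, ay, az, flat_max=15.0, tilt_min=30.0, tilt_max=80.0):
    """Classify the device posture; thresholds are hypothetical."""
    t = screen_tilt_deg(ax, ay, az)
    if t <= flat_max:
        return "flat_up"      # first posture: screen horizontal, facing up
    if tilt_min <= t <= tilt_max:
        return "tilted_up"    # second posture: screen raised toward the face
    return "other"

assert posture(0.0, 0.0, 9.81) == "flat_up"     # lying flat on a table
assert posture(0.0, 6.9, 6.9) == "tilted_up"    # ~45 degree tilt
```

A real implementation would additionally smooth the sensor stream and require the transition flat_up → tilted_up within a short window before concluding the user lifted the device to look at it.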
- the electronic device can keep the screen in an off-screen state and perform voice interaction with the user when detecting that the user does not need to watch the screen, thereby saving power consumption of the electronic device and avoiding false touches.
- When the electronic device in the off-screen state activates the voice assistant, it can first use a first sensor (e.g., a proximity light sensor) to detect whether there is an object blocking within a preset distance of the screen. If there is, the electronic device can directly determine that the user is not watching the screen. In this way, the electronic device does not need to activate the second sensor (such as a camera) to detect a human face, which saves power. Since the electronic device cannot directly determine whether the user is viewing the screen when the screen is not blocked, it can further perform face detection to decide.
- the electronic device can then activate the second sensor to detect whether there is a human face. If a human face is detected, the electronic device can determine that the user is viewing the screen, and then light up the screen, and interact with the user through a graphical interface and a voice. If no face is detected, the electronic device can determine that the user is not watching the screen, and then keeps the screen off, and only interacts with the user through voice.
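The staged check above can be sketched as a short decision function. This is a minimal illustration under stated assumptions: the proximity reading and the face detector are hypothetical placeholders for the real sensor APIs, and the camera callable stands in for "activate the camera and run face detection".

```python
def user_is_watching(screen_blocked: bool, face_detector) -> bool:
    """Decide whether to light the screen when the voice assistant starts.

    screen_blocked: True if the proximity sensor reports an object within
                    the preset distance of the screen (e.g., phone in pocket).
    face_detector:  zero-argument callable that powers on the camera and
                    returns True if a human face is found.
    """
    if screen_blocked:
        # Object blocking the screen: the user cannot be watching it, so
        # the camera is never powered on (this is where energy is saved).
        return False
    # Screen unobstructed: ambiguous, so fall back to face detection.
    return face_detector()

camera_calls = []
def fake_camera():
    camera_calls.append(1)
    return True

assert user_is_watching(True, fake_camera) is False
assert camera_calls == []                      # camera stayed off
assert user_is_watching(False, fake_camera) is True
```

The ordering matters for power: the cheap proximity check runs first, and the expensive camera path is reached only when the cheap check is inconclusive.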
- the above-mentioned first sensor, the above-mentioned second sensor, and the above-mentioned third sensor can all continue to work during the process of the electronic device interacting with the user through the voice assistant.
- the electronic device may light up the screen and cause the voice assistant to interact with the user in the second manner. That is to say, if it is determined when the voice assistant is activated that the user is not watching the screen, the electronic device can keep the screen off and interact with the user only through voice. Further, in the above-mentioned process of interaction only through voice, if it is determined that the user is watching the screen, the electronic device can light the screen to interact with the user through a graphical interface and voice.
- the electronic device may receive the first voice input by the user, and recognize the first voice.
- the electronic device can light up the screen and cause the voice assistant to interact with the user in a second manner.
- That the first voice satisfies the first condition may include: the first voice contains one or both of a first type of keyword and a second type of keyword, wherein the first type of keyword includes application names of one or more of the following categories: video, shopping, navigation; and the second type of keyword includes one or more of the following verbs: view, display.
- The electronic device can also judge whether the user needs to watch the screen by analyzing the received voice command input by the user. For video, shopping, and navigation applications, users usually need to watch the screen, so when the voice command mentions an application of one of these categories, the electronic device can determine that the user needs to watch the screen. Likewise, if the voice command contains a verb indicating that the user needs to watch the screen, such as "view" or "display", the electronic device can also consider that the user needs to watch the screen.
- Since the electronic device determines, according to the first sensor and/or the second sensor, that the user is not watching the screen when the voice assistant is activated, the electronic device can first keep the screen off and interact with the user through voice only. During this voice-only interaction, if it detects that a received voice command contains the first type of keyword and/or the second type of keyword, the electronic device can light up the screen and interact with the user through a graphical interface and voice.
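The keyword test described above can be sketched as follows. The two keyword sets mirror the patent's examples (video/shopping/navigation categories and the verbs "view"/"display"); a production system would use full speech recognition plus natural-language understanding rather than this simplistic word matching.

```python
# First type: application-category keywords that usually require a screen.
FIRST_TYPE = {"video", "shopping", "navigation"}
# Second type: verbs implying the user wants to see something.
SECOND_TYPE = {"view", "display"}

def needs_screen(command: str) -> bool:
    """Return True if the recognized voice command suggests the user
    needs to watch the screen."""
    words = command.lower().split()
    return any(w in FIRST_TYPE or w in SECOND_TYPE for w in words)

assert needs_screen("open the navigation app") is True
assert needs_screen("display my photos") is True
assert needs_screen("play some music") is False
```

When `needs_screen` returns True during voice-only interaction, the device would light the screen and switch to the combined graphical-plus-voice mode.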
- If the electronic device detects the first situation while the voice assistant is interacting with the user in the second manner, the electronic device can turn off the screen and make the voice assistant interact with the user in the first manner. Conversely, if the electronic device detects, by one or more of the first sensor, the second sensor, and the third sensor, that the user is viewing the screen, the electronic device can light up the screen and interact with the user through a graphical interface and voice.
- The electronic device can turn off the screen and interact with the user only through voice. In this way, the power consumption of the electronic device can be saved and accidental touches can be avoided.
- In the process of interacting with the user through the graphical interface and voice, the electronic device can, in response to the user's operation of stopping voice playback, stop interacting with the user by voice and interact with the user only through the graphical interface.
- the above-mentioned first sensor, the above-mentioned second sensor, and the above-mentioned third sensor continue to work during the process of the electronic device interacting with the user through the voice assistant.
- The electronic device can detect in real time whether the user needs to watch the screen during the interaction between the voice assistant and the user. If it detects that the user needs to watch the screen, the electronic device can light up the screen; if it detects that the user does not, it can turn the screen off. In this way, the electronic device can intelligently decide whether to light up the screen during the interaction, which not only saves power and avoids accidental touches, but also improves the user's experience of using the voice assistant.
- the time when the electronic device detects the first situation may include any one of the following situations:
- Upon detecting the first operation, the electronic device detects the first situation; or,
- at a first time after detecting the first operation, the electronic device detects the first situation, wherein the interval between the first time and the time when the electronic device detects the first operation is less than a first duration; or,
- at a second time before detecting the first operation, the electronic device detects the first situation, wherein the interval between the second time and the time when the electronic device detects the first operation is less than a second duration.
- the time when the electronic device detects the second situation may include any one of the following situations:
- Upon detecting the first operation, the electronic device detects the second situation; or,
- at a first time after detecting the first operation, the electronic device detects the second situation, wherein the interval between the first time and the time when the electronic device detects the first operation is less than a first duration; or,
- at a second time before detecting the first operation, the electronic device detects the second situation, wherein the interval between the second time and the time when the electronic device detects the first operation is less than a second duration.
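The timing rule above reduces to a window check around the moment the first operation is detected. A minimal sketch, assuming illustrative duration values (the patent does not specify concrete numbers):

```python
FIRST_DURATION = 2.0    # max seconds after the operation (hypothetical)
SECOND_DURATION = 2.0   # max seconds before the operation (hypothetical)

def detection_valid(t_operation: float, t_detection: float) -> bool:
    """True if a situation detected at t_detection counts for the first
    operation detected at t_operation (both in seconds)."""
    if t_detection >= t_operation:
        # Detected upon or shortly after the operation.
        return t_detection - t_operation < FIRST_DURATION
    # Detected shortly before the operation.
    return t_operation - t_detection < SECOND_DURATION

assert detection_valid(10.0, 10.0) is True    # detected upon the operation
assert detection_valid(10.0, 11.5) is True    # shortly after
assert detection_valid(10.0, 8.5) is True     # shortly before
assert detection_valid(10.0, 13.0) is False   # too late to count
```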
- the electronic device enables the voice assistant to interact with the user in the first manner, which may be specifically: the electronic device enables the voice assistant to only run the first program. Alternatively, the electronic device may cause the voice assistant to run the second program and the first program.
- the first program may be a program for performing voice interaction with the user
- the second program may be a program for obtaining a graphical interface for interacting with the user.
- the electronic device detects the first operation when the screen is in the off-screen state, which may be specifically: the electronic device receives the second voice input by the user when the screen is in the off-screen state.
- the second speech may include a wake word for activating the voice assistant.
- the electronic device detects a long-press operation acting on the first key when the screen is in an off-screen state.
- the first key may include one or more of the following: a power key, a volume up key, and a volume down key.
- the implementation of the present application provides an electronic device.
- the electronic device may include: a screen, an input device, a detection device, and at least one processor.
- the above detection device includes one or more of the following: a first sensor and a second sensor.
- the input device can be used to detect the user's first operation when the screen is in an off-screen state; the first operation is used to activate the voice assistant.
- the detection device may be configured to detect whether the first situation exists when the above-mentioned input device detects the first operation of the user.
- the first situation includes any one of the following:
- the processor may be configured to start the voice assistant while keeping the screen in an off-screen state when the detection device detects the first situation, and make the voice assistant interact with the user in the first manner.
- the first way may be to interact with the user only by voice.
- the detection device further includes a third sensor.
- the detection device can also be used to detect whether there is a second situation; wherein, the second situation includes any one of the following:
- a human face is detected by the second sensor; or,
- the processor can also be used to light up the screen, activate the voice assistant, and make the voice assistant interact with the user in a second manner when the detection device detects the second situation.
- the second approach may include interacting with the user through a graphical interface.
- the second way may be to interact with the user through a graphical interface and voice.
- the first sensor may include one or more of the following: a proximity light sensor, an infrared light sensor, and a radar sensor.
- the second sensor may include a camera.
- the third sensor may include a motion sensor; wherein the motion sensor includes one or more of the following: an acceleration sensor, a gyroscope sensor.
- the above-mentioned first sensor, the above-mentioned second sensor, and the above-mentioned third sensor can all continue to work during the process of the electronic device interacting with the user through the voice assistant.
- the detection device may be further configured to detect whether the second situation exists during the process of the voice assistant interacting with the user in the first manner.
- the processor may also be configured to light up the screen and cause the voice assistant to interact with the user in a second manner when the detection device detects the second situation, where the second manner includes interacting with the user through a graphical interface.
- the input device may be further configured to receive the first voice input by the user during the process of the voice assistant interacting with the user in the first manner.
- the processor may also be configured to recognize the first voice, and in the case of recognizing that the first voice meets the first condition, light up the screen, so that the voice assistant interacts with the user in a second manner.
- the second way includes interacting with the user through a graphical interface.
- That the first voice satisfies the first condition may include: the first voice contains one or both of a first type of keyword and a second type of keyword, wherein the first type of keyword includes application names of one or more of the following categories: video, shopping, navigation; and the second type of keyword includes one or more of the following verbs: view, display.
- the detection device may be further configured to detect whether the first situation exists during the process of the voice assistant interacting with the user in the second manner.
- the processor may also be configured to turn off the screen when the detection device detects the first situation, and make the voice assistant interact with the user in the first manner.
- the electronic device can keep the screen in an off-screen state and perform voice interaction with the user when detecting that the user does not need to watch the screen, thereby saving power consumption of the electronic device and avoiding false touches.
- the implementation of the present application further provides a voice interaction method.
- the method includes: the electronic device can detect the first operation of the user when the screen is in the off-screen state, and the first operation is used to start the voice assistant.
- the electronic device may start the voice assistant while keeping the screen off, and make the voice assistant interact with the user in a first manner, where the first manner is to interact with the user only through voice.
- the third situation may include: detecting a human face and detecting the first gesture through the camera.
- the electronic device may, in the fourth case, light up the screen, activate the voice assistant, and make the voice assistant interact with the user in a second manner, where the second manner includes interacting with the user through a graphical interface.
- the fourth situation may include: a face is detected by the camera and the first gesture is not detected.
- the electronic device can determine whether a face is detected and whether a first gesture is detected according to an image captured by a camera, and then detect whether the user needs to watch the screen.
- The electronic device can then decide, according to whether the user needs to watch the screen as detected by the camera, whether to turn on the screen when the voice assistant is activated in the off-screen state.
- the electronic device can keep the screen in an off-screen state and perform voice interaction with the user.
- the user does not need to perform a corresponding operation to turn off the screen after the electronic device lights the screen, thereby simplifying the user's operation of using the electronic device as a speaker.
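The third/fourth-situation rule above maps camera observations to a screen decision. A sketch under stated assumptions: the patent does not define what the "first gesture" is, so here it is treated as an opaque boolean (e.g., a deliberate hand sign meaning "keep the screen off"), and the behavior when no face is seen is left open, as in the text.

```python
def decide_screen(face_detected: bool, first_gesture: bool) -> str:
    """Map the camera's observations to the screen decision."""
    if face_detected and first_gesture:
        # Third situation: face plus first gesture -> voice-only, screen off.
        return "screen_off_voice_only"
    if face_detected and not first_gesture:
        # Fourth situation: face without the gesture -> light the screen.
        return "screen_on_gui_and_voice"
    # The patent text leaves other cases open.
    return "undetermined"

assert decide_screen(True, True) == "screen_off_voice_only"
assert decide_screen(True, False) == "screen_on_gui_and_voice"
```

This lets a user who wants to use the device as a speaker keep the screen dark with a gesture, instead of waiting for it to light up and then turning it off manually.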
- an implementation of the present application provides an electronic device.
- the electronic device may include: a screen, an input device, a camera, and at least one processor.
- the input device can be used to detect the first operation of the user when the screen is off; the first operation is used to start the voice assistant.
- the camera can be used to detect whether there is a third situation when the input device detects the first operation of the user; wherein the third situation includes: detecting a human face and detecting the first gesture through the camera.
- the processor may be configured to start the voice assistant while keeping the screen off when the camera detects the third situation, and make the voice assistant interact with the user in a first manner; the first manner is to interact with the user only by voice.
- the camera can also be used to detect whether there is a fourth situation; the fourth situation includes: a face is detected by the camera and the first gesture is not detected.
- the processor may also be configured to light up the screen, activate the voice assistant, and make the voice assistant interact with the user in a second manner, including interacting with the user through a graphical interface, when the camera detects the fourth situation.
- An embodiment of the present application provides a chip applied to the electronic device provided in the second aspect or the fourth aspect. The chip includes one or more processors, and the one or more processors are configured to invoke computer instructions to cause the electronic device provided in the second aspect to perform any possible implementation manner of the first aspect, or to cause the electronic device provided in the fourth aspect to perform any possible implementation manner of the third aspect.
- An embodiment of the present application provides a computer program product including instructions. When the computer program product runs on a device, it causes the electronic device provided in the second aspect to perform any possible implementation manner of the first aspect, or causes the electronic device provided in the fourth aspect to perform any possible implementation manner of the third aspect.
- An embodiment of the present application provides a computer-readable storage medium including instructions. When the instructions are executed on a device, they cause the electronic device provided in the second aspect to perform any possible implementation manner of the first aspect, or cause the electronic device provided in the fourth aspect to perform any possible implementation manner of the third aspect.
- the chip provided in the fifth aspect, the computer program product provided in the sixth aspect, and the computer-readable storage medium provided in the seventh aspect are all used to execute the method provided by the embodiments of the present application. Therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects in the corresponding method, which will not be repeated here.
- FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- FIG. 2 and FIG. 3 are schematic diagrams of scenarios in which the electronic device keeps the screen in an off-screen state and performs voice interaction with a user according to an embodiment of the present application;
- Figure 4A, Figure 4B, Figure 5A, Figure 5B, Figure 6A and Figure 6B are schematic diagrams of scenarios in which the electronic device provided by the embodiment of the application lights up the screen and interacts with the user by means of a graphical interface and voice;
- FIGS. 7A to 7E are schematic diagrams of a group of voice interaction scenarios provided by an embodiment of the present application.
- FIGS. 8A to 8D are schematic diagrams of another group of voice interaction scenarios provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of a voice interaction scenario provided by an embodiment of the present application.
- FIG. 10 is a schematic diagram of another voice interaction scenario provided by an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of another electronic device provided by an embodiment of the application.
- FIG. 12 is a flowchart of a voice interaction method provided by an embodiment of the present application;
- FIG. 13 is a flowchart of another voice interaction method provided by an embodiment of the present application.
- when a user interacts with an electronic device that has a screen through a voice assistant, the electronic device can output voice in response to the user's voice command. In addition, the electronic device will directly light up the screen to display the user interface of the voice assistant and the related content obtained by executing the received voice command. However, in some scenarios, such as when the electronic device is placed in a pocket, the user does not need to watch the screen while interacting with the electronic device through the voice assistant. In such cases, lighting up the screen to display the user interface of the voice assistant wastes power and easily leads to false touches.
- Electronic devices may be configured with detection means, e.g., cameras, proximity light sensors, and motion sensors.
- when receiving the first operation for activating the voice assistant, for example, a voice input containing a preset wake-up word, the electronic device can activate the above detection device to detect whether the user needs to watch the screen.
- if the user does not need to watch the screen, the electronic device can perform voice interaction with the user without lighting the screen. In this way, in scenarios where the user does not need to watch the screen, the electronic device interacts with the user through voice alone, thereby saving power consumption of the electronic device and reducing false touches.
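The decision just described can be summarized in a minimal Python sketch. All names and the two boolean inputs are hypothetical; a real implementation would derive them from the proximity light sensor and the camera described below:

```python
def should_light_screen(screen_blocked: bool, face_detected: bool) -> bool:
    """Return True when the user likely needs to watch the screen.

    screen_blocked: result from the proximity light sensor.
    face_detected: result from camera-based face detection.
    """
    if screen_blocked:
        # e.g. the device is in a pocket or lying face down on a table
        return False
    # the screen is unobstructed: light it only when a face is found
    return face_detected
```

When the screen stays off, the assistant runs in the background and answers by voice only.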
- the electronic device may use a camera to capture an image, and detect whether the user needs to watch the screen according to whether the image captured by the camera includes a human face.
- the electronic device can run the voice assistant in the background, and interact with the user by means of voice, without displaying the user interface of the voice assistant.
- the electronic device can use the proximity light sensor to determine whether the screen of the electronic device is blocked.
- when the screen of the electronic device is blocked, for example, when the electronic device is placed in a pocket or placed on a table with the screen facing down, the user generally does not need to watch the screen. That is, when the proximity light sensor detects that the screen of the electronic device is blocked, the electronic device can run the voice assistant in the background and interact with the user through voice without displaying the user interface of the voice assistant.
- the electronic device can use the motion sensor to detect the change of the posture of the electronic device, and detect whether the user needs to watch the screen according to the change of the posture of the electronic device. For example, when the user picks up the electronic device and performs a hand-raising action or a flipping action, the posture of the electronic device changes.
- upon detecting the hand-raising action, the electronic device can light up the screen, realizing the "raise hand to brighten the screen" function.
- otherwise, the electronic device can interact with the user through voice without displaying the user interface of the voice assistant.
- when the electronic device detects the hand-raising action, it can light up the screen, display the user interface of the voice assistant, and interact with the user through a combination of the graphical interface and voice.
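A raise-to-wake gesture of this kind is often detected from the change in device pitch reported by a motion sensor. The following is a hedged Python sketch; the axis convention, the 20° and 50° thresholds, and the function names are illustrative assumptions, not values from this application:

```python
import math

def pitch_deg(ax: float, ay: float, az: float) -> float:
    """Pitch angle (degrees) from one 3-axis accelerometer reading in g."""
    return math.degrees(math.atan2(ay, math.sqrt(ax * ax + az * az)))

def is_hand_raise(samples, start_max=20.0, end_min=50.0) -> bool:
    """Detect a raise-to-wake gesture: the device starts roughly flat
    (low pitch) and ends tilted toward the user's face (high pitch)."""
    if len(samples) < 2:
        return False
    start = pitch_deg(*samples[0])
    end = pitch_deg(*samples[-1])
    return start < start_max and end > end_min
```

For example, a sequence going from a flat reading `(0, 0, 1)` to a tilted reading `(0, 0.9, 0.44)` would be classified as a hand-raise.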
- the electronic device may also detect whether the screen is blocked by other types of sensors, such as an infrared light sensor, a radar sensor, and the like.
- the electronic device can also detect whether the screen is opposite to the face through other types of sensors.
- FIG. 1 exemplarily shows a schematic structural diagram of an electronic device 100 .
- the electronic device 100 shown in FIG. 1 is only an example; the electronic device 100 may have more or fewer components than those shown in FIG. 1, may combine two or more components, or may have a different configuration of components.
- the various components shown in FIG. 1 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
- the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2.
- the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an acceleration sensor 180E, a proximity light sensor 180G, a fingerprint sensor 180H, a touch sensor 180K, and the like.
- the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
- the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently.
- the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
- the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. Different processing units may be independent devices or may be integrated in one or more processors.
- the processor 110 may include a voice wake-up module and a voice command recognition module.
- the voice wake-up module and the voice command recognition module can be integrated in different processor chips and executed by different chips.
- the voice wake-up module can be integrated in a coprocessor or DSP chip with lower power consumption
- the voice command recognition module can be integrated in an AP or NPU or other chips.
- the voice wake-up module and the voice command recognition module can be integrated in the same processor chip, and the same chip performs related functions.
- both the voice wake-up module and the voice command recognition module can be integrated in the AP chip or the NPU or other chips.
- the processor 110 may further include a voice instruction execution module, that is, after recognizing the voice instruction, execute the operation corresponding to the voice instruction.
- the voice assistant may be an application that includes the voice command recognition function. After recognizing a voice command, the voice assistant can directly perform the operation corresponding to the voice command. Alternatively, if the operation corresponding to the voice command involves a third-party application, the voice assistant may call the third-party application to perform the corresponding operation.
- a memory may also be provided in the processor 110 for storing instructions and data.
- the memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
- the charging management module 140 is used to receive charging input from the charger.
- the charger may be a wireless charger or a wired charger. While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
- the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
- the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
- the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
- Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
- Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
- the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
- at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
- the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
- the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
- the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
- Display screen 194 is used to display images, videos, and the like.
- Display screen 194 includes a display panel.
- the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
- the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
- the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
- the ISP is used to process the data fed back by the camera 193 .
- when the shutter is opened, light is transmitted through the lens to the camera photosensitive element, where the light signal is converted into an electrical signal; the photosensitive element then transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
- the ISP may be provided in the camera 193 .
- Camera 193 is used to capture still images or video.
- the object is projected through the lens to generate an optical image onto the photosensitive element.
- the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
- the ISP outputs the digital image signal to the DSP for processing.
- DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
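As a reference for the RGB/YUV conversion mentioned above, a standard full-range BT.601 conversion from one YUV sample to RGB can be sketched as follows. The coefficients are the well-known BT.601 constants; the function itself is an illustration, not part of this application:

```python
def yuv_to_rgb(y: int, u: int, v: int) -> tuple:
    """Convert one full-range BT.601 YUV sample (each 0-255) to RGB (0-255)."""
    def clamp(x):
        return max(0, min(255, round(x)))
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp(r), clamp(g), clamp(b)
```

A neutral grey sample (Y=128, U=V=128) maps to the grey RGB triple (128, 128, 128), since the chroma offsets vanish.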
- the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
- a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
- Video codecs are used to compress or decompress digital video.
- the electronic device 100 may support one or more video codecs.
- the electronic device 100 can play or record videos of various encoding formats, such as: Moving Picture Experts Group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
- the NPU is a neural-network (NN) computing processor.
- Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
- Internal memory 121 may be used to store computer executable program code, which includes instructions.
- the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
- the electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, the application processor, and the like.
- the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
- Speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
- the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
- the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
- the voice can be answered by placing the receiver 170B close to the human ear.
- the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
- the user can input a sound signal into the microphone 170C by speaking close to it.
- the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals.
- the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
- the pressure sensor 180A may be provided on the display screen 194 .
- when a touch operation acts on the pressure sensor 180A, the capacitance between the electrodes changes.
- the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
- the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
- the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
- the gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
- the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B.
- the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
- when the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor can also be used to recognize the posture of the electronic device, and can be applied to functions such as landscape/portrait switching, raising the hand to brighten the screen, and the pedometer.
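The posture recognition described above (landscape/portrait switching, screen facing up or down on a table) can be sketched from a single gravity reading. The ±0.8 g thresholds and the sign convention (+z pointing out of the screen) are illustrative assumptions:

```python
def posture(ax: float, ay: float, az: float) -> str:
    """Classify device posture from the gravity direction reported by the
    acceleration sensor (axes in g; +z assumed to point out of the screen)."""
    if az > 0.8:
        return "face up"      # lying flat, screen toward the ceiling
    if az < -0.8:
        return "face down"    # lying flat, screen toward the table
    # gravity mostly in the screen plane: the device is held upright
    return "landscape" if abs(ax) > abs(ay) else "portrait"
```

A "face down" result corresponds to the screen-blocked scenario discussed elsewhere in this application.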
- Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
- the light emitting diodes may be infrared light emitting diodes.
- the electronic device 100 emits infrared light to the outside through the light emitting diode.
- Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100 . When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100 .
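The near/far decision described above amounts to thresholding the reflected-light reading. A small sketch with hysteresis (two thresholds, both values hypothetical) avoids flickering when the reading hovers around a single cutoff:

```python
class ProximityDetector:
    """Threshold the reflected-light reading of a proximity light sensor.

    Hysteresis (separate near/far thresholds) keeps the state stable
    when the reading fluctuates near one cutoff value.
    """
    def __init__(self, near_threshold: int = 200, far_threshold: int = 120):
        self.near_threshold = near_threshold
        self.far_threshold = far_threshold
        self.near = False

    def update(self, reflected_light: int) -> bool:
        if reflected_light >= self.near_threshold:
            self.near = True      # enough reflected light: object nearby
        elif reflected_light <= self.far_threshold:
            self.near = False     # too little reflected light: no object
        return self.near
```

Readings between the two thresholds keep the previous state, so a value of 150 after a "near" reading still reports an object nearby.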
- the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power and prevent accidental touches.
- when the voice assistant is activated, the electronic device 100 can use the proximity light sensor 180G to detect whether the screen of the electronic device 100 is blocked, so as to interact with the user through voice broadcast without lighting the screen when the screen is blocked, thereby saving power and preventing accidental touches.
- the fingerprint sensor 180H is used to collect fingerprints.
- the electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, accessing application locks, taking photos with fingerprints, answering incoming calls with fingerprints, and the like.
- Touch sensor 180K is also called a "touch panel".
- the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
- the touch sensor 180K is used to detect a touch operation on or near it.
- the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
- the keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
- the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
- the SIM card interface 195 is used to connect a SIM card.
- the SIM card can be brought into contact with or separated from the electronic device 100 by being inserted into or pulled out of the SIM card interface 195 .
- the following introduces a method for the electronic device 100 to activate the voice assistant by recognizing the wake word.
- the electronic device 100 may receive voice input through a microphone. Wherein, when the user speaks the wake-up voice near the electronic device 100, the voice input may include the wake-up voice. After receiving the voice input, the electronic device 100 can separate the user's wake-up voice from the voice input. Next, the electronic device 100 can decode the phoneme sequence from the user's speech signal from the wake-up speech by using the acoustic model. After decoding the phoneme sequence from the wake-up speech, the electronic device 100 can determine whether the decoded phoneme sequence matches the stored wake-up word phoneme sequence, and if so, it indicates that there is a wake-up word in the wake-up speech. When it is determined that there is a wake-up word in the wake-up voice, the electronic device 100 can activate the voice assistant.
- the electronic device 100 may receive voice input through a microphone. Wherein, when the user speaks the wake-up voice near the electronic device 100, the voice input may include the wake-up voice. After receiving the voice input, the electronic device 100 can separate the user's wake-up voice from the voice input. Next, the electronic device 100 can decode the phoneme sequence from the user's speech signal from the wake-up speech by using the acoustic model. Then, the text information is further decoded from the decoded phoneme sequence through the speech model and the pronunciation dictionary of the speech model.
- the electronic device 100 can determine whether the text information decoded from the wake-up voice includes the stored wake-up word text, and if so, it indicates that the user's voice signal contains the wake-up word. When it is determined that there is a wake-up word in the wake-up voice, the electronic device 100 can activate the voice assistant.
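In the first embodiment above, the match is between the decoded phoneme sequence and the stored wake-word phoneme sequence. A naive contiguous-subsequence check can be sketched as follows (function name and the phoneme labels in the example are illustrative, not values from this application):

```python
def contains_wake_word(decoded_phonemes, wake_word_phonemes) -> bool:
    """Check whether the stored wake-word phoneme sequence occurs as a
    contiguous subsequence of the phonemes decoded from the wake-up speech."""
    n, m = len(decoded_phonemes), len(wake_word_phonemes)
    if m == 0 or m > n:
        return False
    for i in range(n - m + 1):
        if decoded_phonemes[i:i + m] == wake_word_phonemes:
            return True
    return False
```

The same structure applies to the second embodiment, with text tokens in place of phonemes. A production recognizer would score matches probabilistically rather than requiring exact equality.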
- starting the voice assistant may mean that the electronic device 100 starts the voice instruction recognition module and the voice instruction execution module in the application processor.
- the activated voice command recognition module can be used to recognize the voice command in the voice input collected by the microphone, and the activated voice command execution module can be used to execute the recognized voice command.
- Activating the voice assistant can also be called waking up the voice assistant.
- the voice wake-up module of the electronic device 100 may be in a working state at all times.
- when the voice wake-up module recognizes the wake-up word from the voice input collected by the microphone, the electronic device 100 can activate the voice assistant.
- the electronic device 100 may collect voice input through a microphone.
- the voice input may include wake words and/or voice commands.
- the voice input obtained by the electronic device 100 may include the wake-up word and the voice command.
- the voice input obtained by the electronic device 100 may also be only the wake-up word.
- after the voice assistant is activated, the user may say only a voice command during the voice interaction with the voice assistant, for example, "I want to send a text message to Zhang San"; in this case, the voice input obtained by the electronic device 100 is the voice command.
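The three cases above (wake word followed by a command, wake word alone, and a bare command while the assistant is already active) can be sketched as a simple split on the recognized text. The wake-word string and the function name are illustrative assumptions:

```python
def parse_voice_input(text: str, wake_word: str = "Xiaoyi Xiaoyi"):
    """Split a recognized utterance into (wake_word_present, command).

    Covers: wake word + command, wake word alone (empty command),
    and a bare command spoken while the assistant is active.
    """
    stripped = text.strip()
    if stripped.lower().startswith(wake_word.lower()):
        command = stripped[len(wake_word):].strip(" ,")
        return True, command      # command may be empty
    return False, stripped
```

For instance, "Xiaoyi Xiaoyi, check the weather" yields the wake word plus the command "check the weather".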
- the electronic device 100 may also activate the voice assistant through other detected user operations. For example, in response to a user operation of long pressing the power key, the electronic device 100 may activate the voice assistant.
- the above-mentioned time for long pressing the power button may be 1 second or 2 seconds, which is not limited in this embodiment of the present application.
- This embodiment of the present application does not limit the first operation for starting the voice assistant; the first operation may also be another user operation for starting the voice assistant.
- the electronic device 100 may extract the wake-up word and the user's voiceprint feature from the user's voice signal.
- the electronic device 100 can start the detection device (e.g., proximity light sensor, camera, motion sensor) to detect whether the user needs to watch the screen, and recognize the voice command subsequently input by the user. In this way, only a specific user can activate the voice assistant to recognize and execute voice commands, which improves the information security of the terminal.
- the electronic device 100 with the screen in the off-screen state may respond to the first operation by starting the voice assistant and the detection device.
- that the screen of the electronic device 100 is in the off-screen state may mean that the screen of the electronic device 100 is off.
- the light-emitting devices included in the screen of the electronic device 100, such as light-emitting diodes, do not emit light.
- the fact that the screen of the electronic device 100 is in the off-screen state may also mean that only a small number of the light-emitting devices included in the screen of the electronic device 100 emit light.
- for example, when the electronic device 100 enables the screen-off (always-on) display function, the electronic device 100 may keep the screen off while displaying the time on the screen.
- the above-mentioned off-screen state may also be referred to as a black-screen state or a screen-off state.
- the electronic device 100 may be in a bright screen state.
- the light-emitting devices included in the screen of the electronic device 100, such as light-emitting diodes, may all be in a light-emitting state.
- the application processor of the electronic device 100 may be in a working state.
- the electronic device 100 may use the proximity light sensor and the camera as detection means to detect whether the user needs to watch the screen.
- the process of detecting whether the user needs to watch the screen by the electronic device 100 may refer to the method flowchart shown in FIG. 12 .
- the screen of the electronic device 100 is in an off-screen state.
- the voice wake-up function of the electronic device 100 is enabled.
- the electronic device 100 can first activate the proximity light sensor to detect whether there is an object blocking within a preset distance of the screen.
- for example, the screen may be blocked when the electronic device is placed in a pocket, or when the electronic device is placed on a table with the screen facing down.
- if no object is detected within the preset distance, the screen is not occluded.
- the user may need to watch the screen of the electronic device, or may not need to watch the screen of the electronic device. For example, an electronic device is placed on a table with the screen facing up, but the user is not looking at the screen.
- the electronic device 100 can keep the screen in an off-screen state and perform voice interaction with the user. For example, when receiving the voice command of "check the weather", the electronic device 100 can keep the screen off and broadcast the weather by voice.
- the above-mentioned voice instructions can also be, for example, making a call, sending a short message, playing music, and controlling a smart home device.
- the electronic device 100 can activate the camera to detect whether there is a human face.
- the electronic device can collect multiple frames of images including a human face within a continuous period of time (e.g., 1 second or 2 seconds).
- when the user does not watch the screen, the images captured by the camera do not contain a human face. If the user's face merely flashes in front of the screen, some of the multiple frames of images collected by the camera within the continuous period of time will not contain the human face.
- the electronic device 100 can keep the screen in an off-screen state and perform voice interaction with the user.
- the electronic device 100 can light up the screen, display a graphical interface, and perform voice interaction with the user.
- the presence of a human face in an image frame collected by the camera may mean that the frame contains a complete human face or a frontal human face. If a frame contains only a side face or an incomplete human face, the electronic device 100 may determine that this frame does not contain a human face.
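Under this rule, the user is judged to need the screen only when every frame captured over the continuous period contains a (complete or frontal) face; a face merely flashing past yields at least one face-less frame. A minimal sketch over per-frame detection results (function name hypothetical):

```python
def user_watching(frames_have_face) -> bool:
    """Decide from per-frame face-detection results whether the user
    is watching the screen: every frame in the continuous period must
    contain a human face, and there must be at least one frame."""
    return bool(frames_have_face) and all(frames_have_face)
```

With this check, a sequence like [True, False, True] (a face flashing by) keeps the screen off, while an unbroken run of detections lights it up.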
- the above-mentioned graphical interface may be a user interface of a voice assistant.
- when the voice command involves displaying the user interface of a third-party application, for example, when the voice command is "view gallery" or "play video", the above-mentioned graphical interface may be the user interface of the third-party application.
- in some embodiments, even when the electronic device 100 only interacts with the user through voice, the electronic device 100 can run both the first program for obtaining the voice used to interact with the user and the second program for obtaining the graphical interface used to interact with the user. When it is determined that the user does not need to watch the screen, the electronic device 100 can keep the screen in the off-screen state; that is, the electronic device 100 will not display on the screen the graphical interface obtained by running the above-mentioned second program.
- when it is determined that the user needs to watch the screen, the electronic device 100 can light up the screen and display on the screen the graphical interface obtained by running the above-mentioned second program.
- in the process of the electronic device 100 using the voice assistant to interact with the user, if it is determined that the user needs to watch the screen, the electronic device 100 can quickly display the graphical interface obtained by running the second program on the screen, thereby reducing the delay in drawing the graphical interface.
- the electronic device 100 when the electronic device 100 only interacts with the user through voice, the electronic device 100 may only run the above-mentioned first program. Then, the electronic device 100 may output the voice obtained by running the above-mentioned first program through the speaker, so as to realize the interaction with the user.
- the electronic device 100 may execute the above-mentioned second program when it is determined that the user needs to watch the screen. Further, the electronic device 100 lights up the screen, and displays on the screen the graphical interface obtained by running the above-mentioned second program. In this way, the electronic device 100 can only run the above-mentioned first program when it is determined that the user does not need to watch the screen, thereby saving power consumption.
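The split between the always-running first program (voice output) and the on-demand second program (graphical interface) can be sketched as follows; all names are hypothetical and the "programs" are reduced to dictionary entries for illustration:

```python
class VoiceAssistantSession:
    """Run the voice pipeline on every response; run the GUI pipeline
    lazily, only once the user is determined to need the screen."""
    def __init__(self):
        self.gui_rendered = False

    def respond(self, reply_text: str, user_needs_screen: bool) -> dict:
        output = {"speech": reply_text}       # first program: always runs
        if user_needs_screen:
            self.gui_rendered = True          # second program: on demand
            output["graphical_interface"] = reply_text
        return output
```

While the user does not need the screen, only the speech output is produced, which is the power-saving behavior described above.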
- the electronic device 100 first determines whether the screen is blocked by means of the proximity light sensor; when the screen is blocked, that is, when the user does not need to watch the screen, the electronic device 100 can keep the screen in the off-screen state, start and run the voice assistant in the background, and interact with the user by voice.
- if the screen is not blocked, the electronic device 100 can activate the camera for further detection. In this way, the electronic device 100 can save power consumption when detecting whether the user needs to watch the screen.
- the embodiment of the present application does not limit the time sequence of the electronic device 100 starting the voice assistant and starting the detection device.
- the above-mentioned proximity light sensor and the above-mentioned camera may be in a working state in real time.
- the electronic device may acquire the data collected by the proximity light sensor and/or the camera after detecting the first operation, to determine whether the user needs to watch the screen.
- the electronic device 100 when the electronic device 100 starts up the voice assistant, it may output a start prompt to prompt the user to input a voice command.
- the opening prompt may be one or more of a voice prompt, a text prompt, and a mechanical vibration prompt.
- the voice prompt may be that the electronic device 100 broadcasts "Hi, I'm listening" by voice.
- the text prompt may be that the electronic device 100 displays the text "Hi, I'm listening" on the screen.
- the electronic device 100 may activate the voice assistant and the detection apparatus at the same time.
- the electronic device 100 may first output an opening prompt by voice when the screen is in the off-screen state; that is, the electronic device 100 can first broadcast "Hi, I'm listening" by voice.
- the electronic device 100 may determine whether to light up the screen. For example, when a human face is detected, the electronic device 100 can light up the screen and output an opening prompt in the form of text. That is, the electronic device 100 may display the text "Hi, I'm listening" on the screen. If the detection device detects that the user does not need to watch the screen, the electronic device 100 can keep the screen in an off-screen state.
- the electronic device 100 may first start the detection device to perform detection, and then start the voice assistant after determining whether the user needs to watch the screen. If the detection device detects that the user needs to watch the screen, the electronic device 100 can light up the screen, display the text "Hi, I'm listening", and voice broadcast "Hi, I'm listening". That is, the electronic device 100 can output the opening prompt in the form of text and voice. If the detection device detects that the user does not need to watch the screen, the electronic device 100 may keep the screen in an off-screen state and output the opening prompt by voice.
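The prompt strategies above amount to choosing output modalities from the detection result. A hedged sketch, with `needs_screen` standing in for the detection device's verdict (the dictionary shape is illustrative, not from the source):

```python
def opening_prompt(needs_screen: bool) -> dict:
    """Choose opening-prompt modalities from the detection result.

    Per the scheme above: voice-only while the screen stays off;
    text plus voice once the screen is lit.
    """
    if needs_screen:
        return {"light_screen": True,
                "text": "Hi, I'm listening",
                "voice": "Hi, I'm listening"}
    return {"light_screen": False,
            "text": None,
            "voice": "Hi, I'm listening"}

print(opening_prompt(False)["light_screen"])  # screen stays off -> False
```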
- FIG. 2 exemplarily shows a schematic diagram of a scenario in which an electronic device keeps a screen in an off-screen state and performs voice interaction with a user.
- the electronic device 100 is placed in a pocket.
- the screen of the electronic device 100 is in an off-screen state.
- the voice wake-up function of the electronic device 100 is enabled, the microphone can collect voice input near the electronic device 100 in real time, and the electronic device 100 can recognize whether the voice input contains a preset wake-up word. This way, the user can activate the voice assistant by saying a preset wake word.
- the electronic device 100 can recognize the wake-up word “Xiaoyi Xiaoyi”. Further, the electronic device 100 may activate the detection device to detect whether the user needs to watch the screen. When the proximity light sensor is activated for detection, the electronic device 100 may determine that the screen is blocked. Further, the electronic device 100 can keep the screen in an off-screen state (that is, the screen is in a black-screen state), run a voice assistant in the background, and perform voice interaction with the user.
- the electronic device 100 can execute the operation corresponding to the voice instruction. For example, the electronic device 100 can call the address book application to check whether there is a contact named "Zhang San". If it is determined that the contact exists, the electronic device 100 can voice prompt "OK, please state the content of the short message" through the speaker, and call the short message application to provide the user with the service of sending short messages.
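The short-message flow above (look up the contact, then prompt for the message body) might be sketched like this; `contacts` is a stand-in for the address book application, and the reply strings simply mirror the prompts in the scenario:

```python
def handle_send_sms(contact_name: str, contacts: set) -> str:
    """Illustrative 'send a text message to <contact>' flow.

    contacts: hypothetical stand-in for the address book application.
    Returns the voice prompt the device would announce next.
    """
    if contact_name in contacts:
        # Contact found: hand off to the messaging app and ask for content.
        return "OK, please state the content of the short message"
    return f"No contact named {contact_name} was found"

print(handle_send_sms("Zhang San", {"Zhang San", "Li Si"}))
```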
- the electronic device 100 can keep the screen in an off-screen state and perform voice interaction with the user, thereby saving power consumption of the electronic device and avoiding accidental touches.
- FIG. 3 exemplarily shows a schematic diagram of another scenario in which the electronic device keeps the screen in the off-screen state and performs voice interaction with the user.
- the electronic device 100 is placed on the table with the screen facing up.
- the screen of the electronic device 100 is in an off-screen state.
- the voice wake-up function of the electronic device 100 is enabled.
- the electronic device 100 can recognize the wake-up word “Xiaoyi Xiaoyi". Further, the electronic device 100 may activate the detection device to detect whether the user needs to watch the screen. When the proximity light sensor is activated for detection, the electronic device 100 may determine that the screen is not blocked. Then, the electronic device 100 can activate the camera. According to the image captured by the camera, the electronic device 100 may determine that no human face is detected. Further, the electronic device can keep the screen off, run the voice assistant in the background, and perform voice interaction with the user.
- the electronic device 100 may execute the operation corresponding to the voice command.
- the electronic device 100 may call an application for controlling the smart home device to turn on the air conditioner while keeping the screen in the off-screen state.
- the electronic device 100 may voice prompt "OK, the air conditioner is on” through the speaker, so as to reply to the voice command spoken by the user. In this way, the user can know that the electronic device 100 has activated the voice assistant to recognize and execute the voice command.
- when the screen is not blocked, the electronic device 100 can activate the camera for further detection. This allows a more accurate judgment on whether the user needs to watch the screen. In a scenario where the screen is not blocked and no human face is detected, the electronic device can keep the screen in an off-screen state and perform voice interaction with the user to save the power consumption of the electronic device.
- the electronic device can also keep the screen off when the detection results of the detection device indicate that the user does not need to watch the screen, and, in the off-screen state, provide the user with functions such as playing music, making and answering calls, checking the weather, and navigation by voice.
- the voice command is "view gallery", "play video”, etc.
- the user often watches the screen of the electronic device; that is, the electronic device can generally detect that the user needs to watch the screen through the proximity light sensor and the camera.
- the electronic device 100 can keep the screen in the off-screen state, and give a voice prompt through the speaker: "It has been found for you, come and check it out". Further, when the user looks at the screen according to the voice prompt of the electronic device, the electronic device can detect, through the proximity light sensor and the camera, that the user needs to view the screen. Further, the electronic device 100 can light up the screen to display the user interface of the related application.
- the electronic device can keep the screen in an off-screen state when it is detected that the user does not need to watch the screen, and perform voice interaction with the user, thereby saving power consumption of the electronic device and avoiding accidental touches.
- FIG. 4A and FIG. 4B exemplarily show schematic diagrams of scenarios in which an electronic device lights up a screen and interacts with a user by means of a graphical interface and voice.
- the user holds the electronic device 100 and keeps his face facing the screen of the electronic device 100 .
- the screen of the electronic device 100 is in an off-screen state.
- the voice wake-up function of the electronic device 100 is enabled.
- the electronic device 100 can recognize the wake-up word “Xiaoyi Xiaoyi". Further, the electronic device 100 may activate the detection device to detect whether the user needs to watch the screen. When the proximity light sensor is activated for detection, the electronic device 100 may detect that the screen is not blocked. The electronic device 100 can then activate the camera. According to the image captured by the camera, the electronic device 100 may determine that a human face is detected.
- the electronic device 100 may run a voice assistant, and display a user interface of the voice assistant, for example, displaying a speech-to-text box 202 as shown in FIG. 4A .
- the speech-to-text box 202 can be used to display the speech instruction “I want to send a text message to Zhang San” recognized by the electronic device 100 . In this way, the user can compare whether the voice command recognized by the electronic device 100 is consistent with the voice command spoken by the user.
- the electronic device 100 can perform the operation corresponding to the voice command.
- the electronic device 100 may first call the address book application to check whether there is a contact named "Zhang San”. If it is determined that the contact exists, the electronic device 100 may prompt the user to speak the content of the short message by means of text display and voice broadcast.
- the electronic device 100 may display the user interface as shown in FIG. 4B , and prompt the user to speak the contents of the short message through a speaker voice prompting “OK, please speak the contents of the short message”.
- the user interface shown in FIG. 4B may include a text prompt box 203 .
- the content of the text prompt box 203 may be the same as the content of the voice prompt of the electronic device 100, such as "OK, please say the content of the short message”.
- when the screen is not blocked and a human face is detected, the electronic device can light up the screen to display the user interface of the voice assistant. Alternatively, when it is recognized that the voice command involves displaying the interface of a third application, the electronic device can call the third application and display the user interface of the third application.
- the electronic device can also interact with the user in the form of voice broadcast. Electronic devices can intelligently decide whether to light up the screen. When it is detected that the user needs to watch the screen, the electronic device can interact with the user through a graphical interface and voice to give the user a good experience.
- the embodiment of the present application does not limit the content of the voice broadcast by the electronic device and the content of the text prompt.
- the electronic device 100 may utilize only the proximity light sensor to detect whether the user needs to view the screen.
- the screen of the electronic device 100 is in an off-screen state.
- the electronic device 100 may activate the proximity light sensor.
- the voice wake-up module in the electronic device 100 may acquire and process the voice input collected by the microphone.
- the electronic device 100 can activate the proximity light sensor.
- the electronic device 100 can keep the screen in an off-screen state, run the voice assistant in the background, and perform voice interaction with the user.
- the electronic device 100 can light up the screen and run the voice assistant.
- the electronic device 100 can display the user interface of the voice assistant and perform voice interaction with the user. In this way, the electronic device 100 can interact with the user through a graphical interface and voice.
- the proximity light sensor can be active in real time. If the electronic device 100 determines through the proximity light sensor that the screen is not blocked within a preset time before receiving the first operation, the electronic device 100 may start the voice assistant after receiving the first operation. In this case, the electronic device 100 can light up the screen and interact with the user through a combination of a graphical interface and voice.
- the foregoing preset time before the first operation is received may be 1 second or 2 seconds, which is not limited in this embodiment of the present application.
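The "unblocked within a preset time before the first operation" check implies keeping a short history of recent sensor readings. One illustrative way to model it (timestamps are passed in explicitly for clarity; a real device would be interrupt-driven, and the 2-second window is one of the values mentioned above):

```python
from collections import deque

class SensorHistory:
    """Buffer of recent proximity readings so that, when the wake-up word
    arrives, the device can check whether the screen was unblocked within
    the preceding preset window (1-2 s per the text). Purely illustrative."""

    def __init__(self, window_s: float = 2.0):
        self.window_s = window_s
        self.readings = deque()  # (timestamp, screen_blocked)

    def record(self, blocked: bool, now: float):
        self.readings.append((now, blocked))
        # Drop readings older than the window.
        while self.readings and now - self.readings[0][0] > self.window_s:
            self.readings.popleft()

    def unblocked_recently(self, now: float) -> bool:
        return any(not blocked
                   for t, blocked in self.readings
                   if now - t <= self.window_s)

h = SensorHistory(window_s=2.0)
h.record(True, 0.0)    # screen covered
h.record(False, 1.5)   # screen uncovered
print(h.unblocked_recently(2.0))  # True: unblocked 0.5 s ago
```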
- the electronic device 100 may use only the camera to detect whether the user needs to view the screen.
- the screen of the electronic device 100 is in an off-screen state.
- the electronic device 100 may activate the camera.
- the voice wake-up module in the electronic device 100 may acquire and process the voice input collected by the microphone.
- the electronic device 100 can activate the camera.
- the electronic device 100 can keep the screen off, run the voice assistant in the background, and perform voice interaction with the user.
- the electronic device 100 can light up the screen and run the voice assistant.
- the electronic device 100 can display the user interface of the voice assistant and perform voice interaction with the user. In this way, the electronic device 100 can interact with the user through a graphical interface and voice.
- the camera can be in a working state in real time. If the electronic device 100 detects a human face through the camera within a preset time before receiving the first operation, the electronic device 100 may start the voice assistant after receiving the first operation. In this case, the electronic device 100 can light up the screen and interact with the user through a combination of a graphical interface and voice.
- the foregoing preset time before the first operation is received may be 1 second or 2 seconds, which is not limited in this embodiment of the present application.
- the electronic device 100 may only utilize motion sensors to detect whether the user needs to watch the screen.
- the screen of the electronic device 100 is in an off-screen state.
- the electronic device 100 may activate the motion sensor.
- the voice wake-up module in the electronic device 100 may acquire and process the voice input collected by the microphone.
- the electronic device 100 may activate the motion sensor.
- the motion sensor may include an acceleration sensor and a gyroscope sensor. The motion sensor can be used to detect the posture change of the electronic device 100. The motion sensor is not limited to an acceleration sensor and a gyroscope sensor; it may also be another type of sensor that can be used to detect changes in the posture of the electronic device 100.
- when the electronic device 100 does not detect a hand-raising motion according to the motion sensor, the electronic device 100 can keep the screen off, run the voice assistant in the background, and perform voice interaction with the user.
- the posture change of the electronic device may be: the electronic device 100, with the screen facing upward, changes from a horizontal posture to an inclined or vertical posture.
- when the electronic device 100 detects a hand-raising motion according to the motion sensor, the electronic device 100 can light up the screen and run the voice assistant. The electronic device 100 can display the user interface of the voice assistant and perform voice interaction with the user. In this way, the electronic device 100 can interact with the user through a graphical interface and voice.
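Hand-raise detection from the motion sensor could, under simple assumptions, be reduced to a tilt-angle change: the text describes a transition from a horizontal, screen-up posture to an inclined or vertical one. The pitch representation and the 30-degree threshold below are assumed for illustration and are not specified by the source; a real device would derive the angle from accelerometer/gyroscope fusion.

```python
def hand_raise_detected(pitch_deg_before: float, pitch_deg_after: float,
                        threshold_deg: float = 30.0) -> bool:
    """Illustrative hand-raise check.

    pitch_deg_*: assumed angle between the screen plane and the horizontal,
        before and after the candidate motion.
    threshold_deg: assumed cutoff separating 'lying flat' from 'tilted'.
    """
    was_flat = abs(pitch_deg_before) < threshold_deg
    now_tilted = abs(pitch_deg_after) >= threshold_deg
    return was_flat and now_tilted

print(hand_raise_detected(5.0, 60.0))   # lifted toward the face -> True
print(hand_raise_detected(5.0, 10.0))   # still lying flat -> False
```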
- FIGS. 5A, 5B, 6A and 6B exemplarily show schematic diagrams of scenarios in which the electronic device 100 intelligently decides whether to turn on the screen according to the motion sensor when the voice assistant is running.
- the screen of the electronic device 100 is in an off-screen state.
- the electronic device 100 may activate the motion sensor.
- the electronic device 100 can keep the screen in an off-screen state and perform voice interaction with the user. For example, in response to the user's voice instruction "How is the weather today", the electronic device 100 may search for the weather and broadcast the current weather through the speaker: "New York issued a yellow lightning warning today, thunderstorms all day".
- the electronic device 100 can light up the screen, display the user interface 210 as shown in FIG. 5B , and continue to announce the weather by voice.
- a text prompt box 211 may be included in the user interface 210.
- the text prompt box 211 can be used to display data such as location, date, and weather by means of icons and texts.
- the electronic device 100 can light up the screen, start and run the voice assistant, and interact with the user through a graphical interface and voice.
- the motion sensor can be active in real time. If the electronic device 100 detects, through the motion sensor, a hand-raising motion within a preset time before receiving the first operation, the electronic device 100 may start the voice assistant after receiving the first operation. In this case, the electronic device 100 can light up the screen and interact with the user through a combination of a graphical interface and voice.
- the foregoing preset time before the first operation is received may be 1 second or 2 seconds, which is not limited in this embodiment of the present application.
- the screen of the electronic device 100 is in an off-screen state.
- the electronic device 100 can detect the hand raising action through the motion sensor.
- the screen of the electronic device 100 remains in the off-screen state.
- the electronic device 100 can light up the screen and execute the user's voice command "What's the weather like today".
- the electronic device 100 can display the user interface 210 as shown in FIG. 6B, and broadcast the day's weather through the speaker: "New York issued a yellow lightning warning today; there will be thunderstorms throughout the day."
- the user can first pick up the mobile phone and make a hand-raising motion. If the user speaks the wake-up word within a preset time period after raising the hand, for example, within 1 second or 2 seconds, the electronic device 100 can activate the voice assistant, light up the screen, and interact with the user through a combination of a graphical interface and voice.
- the motion sensor can be active in real time. If the electronic device 100 detects a hand-raising motion while receiving the first operation, the electronic device 100 may activate the voice assistant after receiving the first operation. In this case, the electronic device 100 can light up the screen and interact with the user through a combination of a graphical interface and voice.
- the electronic device 100 can activate the voice assistant, light up the screen, and interact with the user by combining the graphical interface and voice.
- the electronic device 100 may combine a proximity light sensor and a motion sensor to detect whether the user needs to view the screen.
- the screen of the electronic device 100 is in an off-screen state.
- the electronic device 100 may activate the proximity light sensor first.
- the voice wake-up module in the electronic device 100 may acquire and process the voice input collected by the microphone.
- the electronic device 100 can activate the proximity light sensor.
- the electronic device 100 may use the proximity light sensor to detect whether the screen is blocked. If it is determined that the screen is blocked, the electronic device 100 can keep the screen in an off-screen state, run the voice assistant in the background, and perform voice interaction with the user.
- the electronic device 100 may activate the motion sensor.
- the electronic device 100 may detect a posture change of the electronic device 100 according to the motion sensor. For example, when a hand raising motion is detected, the electronic device 100 can light up the screen, run a voice assistant, and interact with the user through a graphical interface and voice.
- the following describes the time when the electronic device 100 uses the detection device (eg, a proximity light sensor, a camera, and a motion sensor) to perform detection.
- the detection apparatus may start when the electronic device 100 receives the first operation, and continue to detect until the end of the voice interaction.
- the end of the above-mentioned voice interaction may indicate that the voice assistant stops running, and the user needs to perform the first operation mentioned in the foregoing embodiment again to start the voice assistant.
- for example, when the voice command is to send a short message, the electronic device 100 may stop running the voice assistant after the corresponding operation is completed.
- for another example, when the voice command is to query the weather, the electronic device 100 may stop running the voice assistant after the corresponding operation is completed.
- FIGS. 7A to 7E exemplarily show schematic diagrams of scenarios in which the detection apparatus continues to detect during the process from the electronic device 100 recognizing the wake-up word of the voice assistant to the end of the voice interaction.
- the electronic device 100 is placed on the table with the screen facing up.
- the screen of the electronic device 100 is in an off-screen state.
- the voice wake-up function of the electronic device 100 is enabled.
- the voice wake-up module in the electronic device 100 can recognize the wake-up word “Xiaoyi Xiaoyi". Further, the electronic device 100 may activate the detection device to detect whether the user needs to watch the screen.
- the electronic device 100 can detect that the screen is not blocked. Then, the electronic device 100 can activate the camera. According to the image captured by the camera, the electronic device 100 may determine that no human face is detected. Further, the electronic device can run a voice assistant in the background to perform voice interaction with the user. The electronic device 100 may recognize the voice instruction "tell me the story of Snow White" from the voice input collected by the microphone, and execute the operation corresponding to the voice instruction. For example, the electronic device 100 can call the browser application to search for the story of "Snow White" while keeping the screen off, and broadcast the story through the speaker: "A long time ago, a queen gave birth to a girl in winter...".
- the electronic device 100 can continuously use the detection device to detect whether the user needs to watch the screen, and intelligently decide whether to light the screen according to the judgment result.
- the proximity light sensor and camera are turned off.
- the electronic device 100 may first activate the proximity light sensor to detect whether the screen is blocked.
- the electronic device 100 may turn off the proximity light sensor and activate the camera at the first moment.
- the electronic device 100 may use the proximity light sensor for detection first, and then turn on the camera for detection after determining that the screen is not blocked. That is, the camera can be turned off when the screen is blocked.
- the electronic device 100 may turn off the camera when the voice interaction ends.
- the electronic device 100 can detect whether there is a face according to the camera. When no face is detected, the electronic device 100 can keep the screen in an off-screen state and perform voice interaction with the user. When a human face is detected, the electronic device 100 can light up the screen and display a corresponding user interface on the screen. In this way, the electronic device 100 can interact with the user through a graphical interface and voice.
- the working time of the proximity light sensor can extend from the recognition of the wake-up word to the first moment, when it is determined that the screen is not blocked.
- the working time of the camera may start from the above-mentioned first moment and end when the voice interaction ends. If the screen is always blocked, the electronic device 100 can only turn on the proximity light sensor for detection, thereby saving power consumption.
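The sensor working times just described (proximity sensor from wake-word recognition to the first moment; camera from the first moment to the end of the interaction) amount to a small lifecycle state machine. An illustrative model, with sensor names and method names chosen for this sketch:

```python
class DetectionLifecycle:
    """Models the sensor working times described above: the proximity
    light sensor runs from wake-word recognition until the first moment
    the screen is found unblocked; the camera runs from that moment
    until the voice interaction ends. If the screen stays blocked, the
    camera is never powered, which saves power."""

    def __init__(self):
        self.active = set()

    def on_wake_word(self):
        self.active.add("proximity")

    def on_screen_unblocked(self):
        # The "first moment": hand over from proximity sensor to camera.
        self.active.discard("proximity")
        self.active.add("camera")

    def on_interaction_end(self):
        self.active.clear()

lc = DetectionLifecycle()
lc.on_wake_word()
lc.on_screen_unblocked()
print(sorted(lc.active))  # ['camera']
```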
- the electronic device 100 is placed on the table with the screen facing up.
- the electronic device 100 may determine that the screen is not blocked according to the proximity light sensor. Then the electronic device 100 can turn on the camera for detection.
- the user walks toward the electronic device 100 and picks up the electronic device 100 .
- the user's face faces the screen of the electronic device 100 .
- the electronic device 100 can detect a human face according to an image captured by a camera.
- the electronic device 100 may light up the screen to display the user interface as shown in FIG. 7B .
- the user interface may include a text prompt box 204 .
- the text prompt box 204 can be used to display the search result of the electronic device 100 according to the recognized voice command. For example, if the voice command is "tell me the story of Snow White", the text prompt box 204 may display the story of "Snow White" searched by the electronic device 100, such as "lips red as blood, hair black as ebony". As shown in FIG. 7C, the user puts down and leaves the electronic device 100 .
- the electronic device 100 is placed on a table.
- the camera of the electronic device 100 is in a working state.
- the electronic device 100 can turn off the screen and interact with the user by voice. For example, the electronic device 100 can continue broadcasting the story of Snow White by voice with the screen off.
- the electronic device 100 may stop running the voice assistant and turn off the camera.
- the working time of the proximity light sensor and the camera is not limited in this embodiment of the present application.
- the electronic device 100 may turn on the proximity light sensor and the camera.
- the electronic device 100 can turn off the proximity light sensor and the camera.
- the proximity light sensor and the camera can work alternately from the recognition of the wake-up word to the end of the voice interaction.
- the electronic device 100 may stop or continue to interact with the user in a manner of voice.
- the electronic device 100 can display the user interface of the voice assistant and perform voice interaction with the user.
- the electronic device 100 may display a text prompt box 204 as shown in FIG. 7D.
- the text prompt box 204 may include the text content of the story of "Snow White”, and may also include a previous page control 204A, a next page control 204B and a stop voice broadcast control 204C.
- the previous page control 204A and the next page control 204B may be used to control the text content displayed in the text prompt box 204 .
- the electronic device 100 may display a user interface as shown in FIG. 7B.
- the content in the text prompt box 204 shown in FIG. 7D may be a continuation of the content in the text prompt box 204 shown in FIG. 7B .
- the stop voice announcement control 204C may be used for the electronic device 100 to stop voice interaction with the user. For example, in response to a touch operation acting on the stop voice announcement control 204C shown in FIG. 7D , the electronic device 100 may stop the speech announcement of the story of "Snow White".
- the electronic device 100 may switch the stop speech announcement control 204C to the continue speech announcement control 204D.
- the continue voice announcement control 204D can be used for the electronic device 100 to continue to perform voice interaction with the user. For example, in response to a touch operation acting on the continue voice announcement control 204D, the electronic device 100 may resume the speech announcement from the content broadcast when the speech announcement was stopped. Alternatively, the electronic device 100 may voice broadcast the content currently displayed in the text prompt box 204 .
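The stop/continue control pair (204C/204D) behaves as a simple toggle: stopping the broadcast swaps in the continue control, and vice versa. A purely illustrative model of that UI state (field and label names are this sketch's, not the patent's):

```python
class BroadcastControl:
    """State of the stop/continue voice-broadcast control: tapping 'stop'
    pauses the announcement and swaps in a 'continue' control; tapping
    'continue' resumes the announcement and swaps back."""

    def __init__(self):
        self.playing = True
        self.label = "stop"

    def tap(self):
        if self.playing:
            self.playing, self.label = False, "continue"
        else:
            self.playing, self.label = True, "stop"

c = BroadcastControl()
c.tap()          # user stops the broadcast
print(c.label)   # continue
c.tap()          # user resumes
print(c.label)   # stop
```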
- the foregoing text prompt box 204 may further include more or less controls, which is not limited in this embodiment of the present application.
- the electronic device 100 may use a proximity light sensor and a motion sensor to detect whether the user needs to watch the screen, and then determine whether to light the screen.
- the electronic device 100 may first turn on the proximity light sensor. When it is determined that the screen is blocked, the electronic device 100 can keep the screen in an off-screen state, run the voice assistant in the background, and perform voice interaction with the user. And, the proximity light sensor can work continuously to detect if the screen is blocked. If at the first moment, the electronic device 100 determines according to the proximity light sensor that the screen is not blocked, the electronic device 100 can turn off the proximity light sensor and turn on the motion sensor. When the hand raising action is not detected, the electronic device 100 can keep the screen in an off-screen state and perform voice interaction with the user.
- the electronic device 100 can light up the screen and interact with the user through a graphical interface and voice. That is, the working time of the proximity light sensor may start from the recognition of the wake-up word and end at the above-mentioned first moment. The working time of the motion sensor may start from the above-mentioned first moment and end when the voice interaction ends.
- the electronic device 100 intelligently decides whether to light up the screen, thereby saving the power consumption of the electronic device and avoiding accidental touches.
- the electronic device 100 can light up the screen when it is detected that the user needs to view the screen, without affecting the user's viewing of the related user interface.
- the electronic device 100 may only use the proximity light sensor as the detection device to detect whether the user needs to watch the screen, and then determine whether to light the screen. That is, the working time of the proximity light sensor can start from the recognition of the wake-up word and end when the voice interaction ends.
- the electronic device 100 can light up the screen, run the voice assistant, and interact with the user through a graphical interface and voice.
- the electronic device 100 may turn off the screen and perform voice interaction with the user.
- the screen of the electronic device 100 can be switched between the screen-off state and the screen-on state, which not only does not affect the user's viewing of the screen and related user interfaces when needed, but also saves the power consumption of the electronic device and avoids accidental touches.
- the electronic device 100 may only use the camera as the detection device to detect whether the user needs to watch the screen, and then determine whether to light the screen. That is, the working time of the camera can start from the recognition of the wake-up word and end when the voice interaction ends.
- the electronic device 100 can light up the screen, run a voice assistant, and interact with the user through a graphical interface and voice.
- the electronic device 100 can turn off the screen and perform voice interaction with the user.
- the screen of the electronic device 100 can be switched between the off-screen state and the on-screen state, which not only does not affect the user's viewing of the screen and related user interfaces when needed, but also saves the power consumption of the electronic device and avoids false touches.
- the electronic device 100 may only use the motion sensor as the detection device to detect whether the user needs to watch the screen, and then determine whether to light the screen. That is, the working time of the motion sensor can start from the recognition of the wake-up word and end when the voice interaction ends.
- the electronic device 100 may keep the screen in an off-screen state and perform voice interaction with the user when the hand-raising action is not detected.
- the electronic device 100 can light up the screen, run a voice assistant, and interact with the user through a graphical interface and voice.
- the electronic device 100 can intelligently decide whether to light up the screen, thereby saving the power consumption of the electronic device and avoiding false touches. Moreover, the electronic device 100 can light up the screen when it is detected that the user needs to view the screen, without affecting the user's viewing of the related user interface.
- the electronic device 100 can continuously use the detection device to detect whether the user needs to watch the screen. In this way, when it is detected that the user does not need to watch the screen, the electronic device can interact with the user by voice while the screen is in an off-screen state. When it is detected that the user needs to watch the screen, the screen is lit, and the electronic device interacts with the user through a graphical interface and voice. In this way, the electronic device can intelligently decide whether to light up the screen, and the screen can be switched between the off-screen state and the screen-on state.
- the screen of the electronic device is in an off-screen state, which can save power consumption of the electronic device and avoid accidental touches.
- the electronic device can display the corresponding user interface without affecting the user's experience.
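The continuous-detection behaviour described above can be sketched as a simple policy loop. The following is a minimal, illustrative Python sketch (function names are hypothetical, not taken from the application itself): each detection result during the voice interaction maps directly to the screen state the device should adopt.

```python
def screen_state(user_needs_screen: bool) -> str:
    """Map one detection result to the screen state the device adopts:
    the screen is lit only while the user needs to watch it."""
    return "on" if user_needs_screen else "off"

def run_session(detections):
    """Replay a sequence of detection results (one per detection cycle
    during the voice interaction) and record the resulting screen states,
    modelling the off-screen/on-screen switching described above."""
    return [screen_state(d) for d in detections]
```

For example, a user who glances at the device mid-interaction and then looks away would produce `run_session([False, True, False])`, lighting the screen only for the middle cycle.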
- the detection apparatus may start detection when the electronic device 100 recognizes the wake-up word of the voice assistant, and end detection after the screen is turned on.
- FIGS. 7A to 7C are still used for description.
- the electronic device 100 is placed on the table with the screen facing up.
- the screen of the electronic device 100 is in an off-screen state.
- the voice wake-up function of the electronic device 100 is enabled.
- the electronic device 100 may first turn on the proximity light sensor for detection, and when it is detected that the screen is not blocked, turn off the proximity light sensor and turn on the camera for detection.
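The staged use of the proximity light sensor and the camera in this example can be sketched as follows. This is an illustrative Python sketch under the assumption that the proximity check runs first and the camera is consulted only when the screen is not blocked; the function and parameter names are hypothetical.

```python
def detect_needs_screen(screen_blocked: bool, face_detected: bool) -> bool:
    """Staged detection: if the proximity light sensor reports the screen
    blocked (e.g. the device is in a pocket or face down), the camera is
    never consulted and the user is taken not to need the screen.
    Otherwise the camera's face-detection result decides."""
    if screen_blocked:
        return False          # proximity stage: blocked, keep camera off
    return bool(face_detected)  # camera stage: face means "watching"
```

This ordering lets the cheaper proximity sensor short-circuit the camera, consistent with turning the proximity light sensor off and the camera on only after the screen is found unblocked.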
- the electronic device 100 can keep the screen in an off-screen state, run the voice assistant in the background, and interact with the user through voice.
- the user's face is opposite to the screen of the electronic device 100 .
- the electronic device 100 can detect the human face according to the image captured by the camera.
- the electronic device 100 can light up the screen, display the user interface shown in FIG. 7B , and interact with the user through a graphical interface and voice.
- the electronic device 100 may turn off the camera.
- the electronic device 100 can turn off the detection device and no longer detect whether the user needs to watch the screen in the subsequent stage. Then, after the electronic device 100 lights up the screen and displays the user interface shown in FIG. 7B, if the user's face is no longer opposite to the screen of the electronic device 100 before the voice interaction ends, for example, as shown in FIG. 7C, when the user puts down the electronic device 100 and walks away, the screen of the electronic device 100 can still remain in a bright-screen state.
- the detection apparatus may start detection when the electronic device 100 recognizes the wake-up word of the voice assistant, and end detection after completing one round of voice interaction with the user.
- the above-mentioned completion of one round of voice interaction may be that the user utters a voice command and the electronic device 100, running the voice assistant, replies to that voice command.
- the user speaks the voice instruction "I want to send a text message to Zhang San" and the electronic device 100 can reply with the voice prompt "Okay, please state the content of the text message", which is one round of voice interaction.
- the user speaks the voice command “turn on the air conditioner” and the electronic device 100 replies “OK, the air conditioner is being turned on”, which is a round of voice interaction.
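The detection window that runs from wake-word recognition until one (or, as in the later variant, N) completed command/reply rounds can be modelled as below. This is an illustrative Python sketch; the class and attribute names are hypothetical.

```python
class DetectionWindow:
    """Models the detection period: detection becomes active when the
    wake-up word is recognized and stays active until `rounds_to_detect`
    rounds (one round = user command + assistant reply) have completed,
    after which the detection device can be turned off."""

    def __init__(self, rounds_to_detect: int = 1):
        self.rounds_to_detect = rounds_to_detect
        self.completed_rounds = 0
        self.active = True  # starts when the wake-up word is recognized

    def complete_round(self):
        """Record one finished command/reply round; end detection once
        the configured number of rounds is reached."""
        self.completed_rounds += 1
        if self.completed_rounds >= self.rounds_to_detect:
            self.active = False
```

With `rounds_to_detect=1` this matches the one-round variant; a larger N matches the N-round variant described later.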
- The embodiment shown in FIG. 2 will be described.
- the electronic device 100 is placed in a pocket.
- the screen of the electronic device 100 is in an off-screen state.
- the voice wake-up function of the electronic device 100 is enabled.
- the electronic device 100 can use detection devices, such as the proximity light sensor and the camera, to detect whether the user needs to watch the screen.
- the electronic device 100 can keep the screen in an off-screen state, and interact with the user by means of voice.
- the electronic device 100 can light up the screen and interact with the user through a graphical interface and voice. That is, the screen can be switched from the off-screen state to the bright-screen state.
- the electronic device 100 can turn off the detection device. Whether the screen of the electronic device 100 is in the off-screen state or in the bright-screen state may be determined by the state of the screen when the electronic device 100 turns off the detection device.
- if the screen is in the off-screen state when the electronic device 100 turns off the detection device, the electronic device 100 can keep the screen off and interact with the user by voice in the subsequent stages of the voice interaction process.
- if the screen is in the bright-screen state when the electronic device 100 turns off the detection device, the electronic device 100 can keep the screen lit and interact with the user through a graphical interface and voice in the subsequent stages of the voice interaction process.
- the detection apparatus may start detection when the electronic device 100 recognizes the wake-up word of the voice assistant, and end detection after completing N rounds of voice interaction with the user.
- N can be an integer greater than 1.
- This embodiment of the present application does not limit the detection time for the electronic device to use the detection device to detect whether the user needs to watch the screen.
- the electronic device 100 can detect whether the user needs to watch the screen by combining the detection result of the detection device and the analysis result of analyzing whether the received voice instruction contains a specific keyword.
- the above-mentioned specific keywords may include first-type keywords and second-type keywords.
- the first type of keywords may be keywords related to specific categories of applications, and these specific categories of applications generally interact with users through a user interface. For example: video applications (Huawei Video, iQiyi, etc.), shopping applications (Taobao, Jingdong, etc.), and navigation applications (Baidu Maps, Google Maps, etc.).
- the second type of keywords may be keywords related to specific actions, which may be actions indicating that the user needs to watch the screen. For example: view, display, etc.
- the electronic device 100 can keep the screen off and perform voice interaction with the user.
- if the electronic device 100 cannot describe the user interface that needs to be displayed in the form of a voice broadcast, for example, when the voice command is "view gallery" or "play video", the electronic device 100 can light up the screen to display the user interface involved in the voice command.
- the electronic device 100 may further identify whether the voice command contains the above-mentioned first type of keywords and/or the above-mentioned second type of keywords, to determine whether to light up the screen.
- the electronic device 100 may first identify whether the voice instruction contains the first type of keywords. If it is determined that the voice command contains the first type of keywords, the electronic device 100 may light up the screen to display the user interface involved in the voice command. In this way, the electronic device 100 can interact with the user through a graphical interface and voice. If it is determined that the voice command does not contain the first type of keywords, the electronic device 100 may then identify whether the voice command contains the second type of keywords. If it is determined that the second type of keywords is included in the voice command, the electronic device 100 may light up the screen to display the user interface involved in the voice command. In this way, the electronic device 100 can interact with the user through a graphical interface and voice. If it is determined that the voice command contains neither the above-mentioned first type of keywords nor the above-mentioned second type of keywords, the electronic device 100 may keep the screen in an off-screen state and perform voice interaction with the user.
- the electronic device 100 may also first identify whether the voice command contains the above-mentioned second type of keywords. If it is determined that the voice command does not contain the second type of keywords, the electronic device 100 can then identify whether the voice command contains the above-mentioned first type of keywords to detect whether the user needs to watch the screen, and then intelligently decide whether to light the screen.
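The two-stage keyword check described above can be sketched as follows. The keyword sets here are illustrative stand-ins (the application does not give exhaustive lists), and simple word matching is a simplification of real voice-command parsing.

```python
# Illustrative keyword sets, not the real lists shipped with a device.
FIRST_TYPE = {"video", "gallery", "map", "taobao"}    # UI-centric apps
SECOND_TYPE = {"view", "display", "show", "watch"}    # screen-watching verbs

def should_light_screen(command: str) -> bool:
    """Two-stage check matching the order in the passage: first look for
    application keywords (first type); only if none are found, look for
    action keywords (second type). Either hit lights the screen."""
    words = command.lower().split()
    if any(w in FIRST_TYPE for w in words):
        return True
    return any(w in SECOND_TYPE for w in words)
```

The alternative order mentioned afterwards (second type first, then first type) would simply swap the two checks; the decision outcome is the same, since either hit lights the screen.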
- FIGS. 8A-8D exemplarily show an embodiment in which the electronic device 100 detects whether the user needs to watch the screen by combining the detection result of the detection device and the analysis result of analyzing whether the received voice instruction contains a specific keyword.
- the electronic device 100 is placed on the table with the screen facing up.
- the screen of the electronic device 100 is in an off-screen state, and the voice wake-up function is enabled.
- the user speaks the wake-up word "Xiaoyi Xiaoyi" near the electronic device 100 .
- the microphone in the electronic device 100 can collect voice input near the electronic device 100 .
- the voice wake-up module in the electronic device 100 may acquire the voice input collected by the microphone, and recognize that the voice input contains a wake-up word.
- the electronic device 100 may activate the detection means to detect whether the user needs to watch the screen.
- the electronic device 100 may use one or more of a proximity light sensor, a camera, and a motion sensor to detect whether the user needs to watch the screen. For the specific detection method, reference may be made to the foregoing embodiments, which will not be repeated here.
- the electronic device 100 can keep the screen in an off-screen state, run the voice assistant in the background, and perform voice interaction with the user. For example, the electronic device 100 may keep the screen in an off-screen state and prompt the user, through the speaker, to speak a voice instruction with the voice prompt "Hi, I'm listening".
- the microphone in the electronic device 100 can collect the voice input near the electronic device 100 .
- the electronic device 100 can recognize the voice command from the voice input.
- the user speaks the voice command "view gallery" near the electronic device 100 .
- the electronic device 100 can recognize that the voice instruction contains the first type of keyword "gallery".
- the electronic device 100 may execute the voice instruction. Specifically, the electronic device 100 may invoke the gallery application to display the user interface of the gallery application as shown in FIG. 8B .
- the electronic device 100 can also give a voice prompt through the speaker "The gallery has been opened, come and check it out”.
- the microphone in the electronic device 100 can collect the voice input near the electronic device 100 .
- the electronic device 100 can recognize the voice command from the voice input.
- the user speaks the voice instruction "I want to watch video A" near the electronic device 100.
- the electronic device 100 may first identify whether the voice instruction contains the first type of keywords.
- the voice command does not contain the first type of keywords.
- the electronic device 100 may then identify whether the voice instruction contains the second type of keywords.
- the voice instruction contains the second type of keyword "watch".
- the electronic device 100 may execute the voice instruction.
- the electronic device 100 can call the Huawei video application, and display the user interface of the Huawei video application as shown in FIG. 8C .
- the video A indicated in the voice command may be included in the user interface.
- the electronic device 100 can also give a voice prompt "I think it is opened for you, come and check it" through the speaker.
- the electronic device 100 can first keep the screen in the off-screen state and perform voice interaction with the user.
- the electronic device 100 may further detect whether the user needs to watch the screen according to whether the voice command contains the first type of keywords and/or the second type of keywords.
- in some scenarios, the user wants to watch the screen but is not yet watching it. For example, the electronic device 100 is placed on a table, and the user walks toward it to watch the screen while saying the wake-up word and the voice command "view gallery". In this case, the electronic device 100 may light up the screen based on the presence of the first type of keywords and/or the second type of keywords in the voice command, and display the user interface involved in the voice command. In this way, the electronic device 100 can more accurately detect whether the user needs to watch the screen.
- the screen of the electronic device 100 is in an off-screen state and in a lock-screen state. Before the electronic device 100 displays the user interface involved in the voice command, the user is prompted to unlock the electronic device 100 .
- the user speaks the wake-up word “Xiaoyi Xiaoyi” and the voice command “view gallery” near the electronic device 100 .
- the electronic device 100 may detect that the user does not need to watch the screen according to the detecting means.
- the microphone in the electronic device 100 can collect voice input near the electronic device 100 .
- the electronic device 100 can recognize the voice command from the voice input.
- the voice instruction contains the first type of keywords.
- the electronic device 100 may display the unlocking interface as shown in FIG. 8D, and prompt the user, through the speaker, to unlock the electronic device 100 with the voice prompt "please unlock it for me first". For example, the user may input an unlock password on the unlock interface shown in FIG. 8D.
- the electronic device 100 may receive the unlocking password and match the unlocking password with the stored unlocking password. If the received unlocking password matches the stored unlocking password, the electronic device 100 can invoke the gallery application to display the user interface of the gallery application as shown in FIG. 8B .
- the electronic device 100 may also be unlocked according to the voiceprint feature of the received voice input.
- the electronic device 100 may be unlocked according to face recognition.
- the electronic device 100 may only analyze whether a specific keyword is included in the received voice instruction to determine whether the user needs to watch the screen.
- the electronic device 100 can light up the screen and interact with the user through a graphical interface and voice.
- the electronic device 100 may keep the screen in an off-screen state, and only interact with the user by means of voice.
- the implementation manner of the electronic device 100 judging whether the user needs to watch the screen according to whether the voice command contains a specific keyword may refer to the foregoing embodiments, which will not be repeated here.
- the electronic device 100 may be a large-screen device such as a smart TV, or a smart speaker with a screen. Users can use these devices without viewing the screens of these devices. For example, users do not need to watch the screen when they use a smart TV or smart speaker with a screen to play music and control smart home devices.
- when the user makes the electronic device 100 play music through a remote control or a voice command, the electronic device 100 lights up the screen.
- the user then needs to turn off the screen of the electronic device 100 through a remote control or a voice command.
- the above user operation for turning off the screen of the electronic device 100 is cumbersome, and the electronic device 100 cannot intelligently decide, when the screen is in the off-screen state, whether to turn on the screen according to whether the user needs to watch it.
- the electronic device 100 with the screen in the off-screen state may respond to the first operation by starting the voice assistant and the detection device.
- the electronic device 100 can use the camera as a detection device to detect whether the user needs to watch the screen, and then intelligently decide whether to light the screen.
- the above-mentioned first operation may be a user operation acting on a physical key on the electronic device 100 , or a user operation acting on a key on a remote controller for controlling the electronic device 100 .
- the electronic device 100 is a smart TV.
- the first operation may be a user operation that acts on a power button on the smart TV, or a user operation that acts on an on/off button on a remote control of the smart TV.
- the above-mentioned first operation may also be the user speaking a preset wake-up word (for example, "Xiaoyi Xiaoyi").
- the screen of the electronic device 100 is in an off-screen state.
- the voice wake-up function of the electronic device 100 is enabled. When the wake-up word is recognized from the voice input collected by the microphone, the electronic device 100 can activate the camera to determine whether a human face is detected.
- the electronic device 100 may determine that no human face has been detected.
- the electronic device 100 does not detect a human face, which may indicate that the user's face is not opposite to the screen of the electronic device 100, that is, the user does not need to watch the screen.
- the electronic device 100 can keep the screen in an off-screen state and perform voice interaction with the user. For example, when receiving a voice command of "play music", the electronic device 100 can keep the screen in an off-screen state and play music.
- the above-mentioned voice instructions can also be, for example, making a call, sending a short message, playing music, and controlling a smart home device.
- the electronic device 100 may determine that a human face has been detected.
- the user's face being opposite to the screen does not necessarily mean that the user needs to watch the screen.
- the user may not watch the smart TV screen.
- the electronic device 100 can determine whether the image captured by the camera includes the first gesture, and then detect whether the user needs to watch the screen.
- the above first gesture can be used to indicate that the user does not need to watch the screen. For example, when a user sitting in front of a smart TV activates the smart TV's voice assistant but does not need to watch the screen, he or she can make the first gesture while saying the wake-up word.
- the above-mentioned first gesture may be a gesture of making a fist, a gesture of opening a palm, and the like. This embodiment of the present application does not limit the above-mentioned first gesture.
- the detection of a human face does not necessarily mean that the user needs to watch the screen.
- the electronic device 100 may also recognize the first gesture in the image captured by the camera, which may indicate that the user does not need to watch the screen.
- the electronic device 100 can keep the screen in an off-screen state and perform voice interaction with the user.
- the electronic device 100 can light up the screen, display a graphical interface, and perform voice interaction with the user. In this way, the electronic device 100 can interact with the user in a manner that combines a graphical interface and voice.
- the above-mentioned graphical interface may be a user interface of a voice assistant.
- if the voice command involves displaying the user interface of a third application, for example, when the voice command is "play video", the above-mentioned graphical interface may be the user interface of that third application.
- the electronic device can determine whether a face is detected and whether a first gesture is detected according to an image captured by a camera, and then detect whether the user needs to watch the screen.
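The combined face-and-gesture decision can be sketched as below, assuming, as in the preceding paragraphs, that every frame captured within the preset period must contain a face and that the first gesture (e.g. a fist) vetoes lighting the screen. Names and the frame representation are illustrative.

```python
def needs_screen(frames) -> bool:
    """`frames` is a list of (face_detected, first_gesture_detected)
    pairs, one per image captured within the preset period. The user is
    considered to need the screen only if every frame contains a face
    and no frame contains the 'do not light the screen' gesture."""
    if not frames:
        return False
    all_faces = all(face for face, _ in frames)
    any_gesture = any(gesture for _, gesture in frames)
    return all_faces and not any_gesture
```

This matches the FIG. 9 scenario (some frames without a face: keep the screen off) and the FIG. 10 scenario (all frames with a face but a fist gesture present: still keep the screen off).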
- the electronic device 100 may determine whether to turn on the screen when the voice assistant is activated in the off-screen state according to whether the user needs to watch the screen, as detected through the camera.
- the electronic device can keep the screen in an off-screen state and perform voice interaction with the user.
- the user does not need to perform a corresponding operation to turn off the screen after the electronic device 100 lights the screen, thereby simplifying the operations required when the user uses the electronic device 100 as a speaker.
- FIG. 9 exemplarily shows a schematic diagram of a scenario in which the electronic device 100 keeps the screen in an off-screen state and performs voice interaction with the user.
- the electronic device 100 may be a smart TV.
- the screen of the smart TV is off, and the voice wake-up function is turned on.
- the microphone of the smart TV can collect the voice input near the smart TV in real time, and send it to the voice wake-up module, and the voice wake-up module recognizes whether the voice input contains a preset wake-up word. This way, the user can activate the voice assistant by saying a preset wake word.
- the voice wake-up module in the smart TV can recognize the wake-up word "Xiaoyi Xiaoyi". Further, the smart TV can activate the detection device to detect whether the user needs to watch the screen. As shown in FIG. 9 , the user's face does not face the screen of the smart TV, or the user's face only briefly passes in front of the smart TV. When the camera is activated for detection, the smart TV can determine that some of the multi-frame images collected by the camera within a preset time period do not contain a human face.
- Smart TVs can keep the screen off, run a voice assistant in the background, and interact with the user by voice.
- the smart TV can recognize the voice command "I want to listen to a song", and execute the operation corresponding to the voice command.
- a smart TV can call a music app to play music while keeping the screen off.
- the above-mentioned camera may be a low-power camera.
- the electronic device 100 may acquire the data collected by the above-mentioned camera to determine whether the user needs to watch the screen. Further, the electronic device 100 may determine whether to turn on the screen when the voice assistant is activated.
- FIG. 10 exemplarily shows a schematic diagram of another scenario where the electronic device 100 keeps the screen in an off-screen state and performs voice interaction with the user.
- the electronic device 100 may be a smart TV.
- the screen of the smart TV is off, and the voice wake-up function is turned on.
- the microphone of the smart TV can collect the voice input near the smart TV in real time, and send it to the voice wake-up module, and the voice wake-up module recognizes whether the voice input contains a preset wake-up word. This way, the user can activate the voice assistant by saying a preset wake word.
- the voice wake-up module in the smart TV can recognize the wake-up word "Xiaoyi Xiaoyi". Further, the smart TV can activate the detection device to detect whether the user needs to watch the screen. As shown in FIG. 10 , the user's face is opposite to the screen of the smart TV, and the user makes a fist gesture.
- the fist gesture may be the first gesture in the foregoing embodiment, and may be used to indicate that the user does not need to watch the screen of the electronic device 100 .
- the smart TV can determine that the multiple frames of images collected by the camera within a preset period of time all contain a human face, and the images collected by the camera contain a fist gesture. Smart TVs can keep the screen off, run a voice assistant in the background, and interact with the user by voice. Among them, the smart TV can recognize the voice command "I want to listen to a song", and execute the operation corresponding to the voice command. For example, a smart TV can call a music app to play music while keeping the screen off.
- when the electronic device starts the voice assistant in the off-screen state, it may not directly light up the screen, but first use the image captured by the camera to detect whether the user needs to watch the screen, and then determine whether to light up the screen.
- when the user wishes to use the electronic device 100 as a speaker, the user does not need to perform a corresponding operation to turn off the screen after the electronic device 100 turns on the screen, which can simplify the user's operations.
- the electronic device 100 may include: an AP 310 , a detection device 320 , a microphone 330 , a low-power processor 340 , a speaker 350 , and a display screen 360 .
- the voice assistant 370 may be included in the AP 310 .
- the voice assistant 370 may include a voice instruction recognition module 311 and a voice instruction execution module 312 .
- the detection device 320 may include a proximity light sensor 321 , a camera 322 , and a motion sensor 323 .
- the above AP 310 may be the processor 110 in FIG. 1 , or one or more processors among multiple processors included in the processor 110 .
- the above-mentioned microphone 330 may be the microphone 170C in FIG. 1 .
- the above-mentioned speaker 350 may be the speaker 170A in FIG. 1 .
- the display screen 360 described above may be one or more of the display screens 194 in FIG. 1 .
- the aforementioned proximity light sensor 321 may be the proximity light sensor 180G in FIG. 1 .
- the aforementioned cameras 322 may be one or more of the cameras 193 in FIG. 1 .
- the aforementioned motion sensor 323 may include the acceleration sensor 180E and the gyro sensor 180B in FIG. 1 .
- the speaker 350 , the display screen 360 and the detection device 320 can be connected with the AP 310 .
- the microphone 330 , the proximity light sensor 321 , the camera 322 and the motion sensor 323 can all be connected to the AP 310 through the low-power processor 340 .
- the low-power processor 340 may be integrated with a voice wake-up module, which may be used to wake up the AP 310 when a wake-up word is recognized.
- the microphone 330 and the low-power processor 340 can be in a working state all the time, one or more of the proximity light sensor 321 , the camera 322 , and the motion sensor 323 can also always be in a working state, the AP 310 can be in a dormant state, and the display screen 360 is off.
- the microphone can collect voice input near the electronic device 100 in real time, and send the voice input to the low-power processor 340 .
- the low-power processor 340 can be used to identify whether the voice input contains a preset wake-up word (or called a wake-up command, such as "Xiaoyi Xiaoyi"). When the preset wake-up word is recognized, the low-power processor 340 can wake up the AP 310.
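The division of labour between the always-on low-power processor and the dormant AP can be sketched as follows. This is an illustrative Python sketch of the wake-word gating described above; class and attribute names are hypothetical.

```python
WAKE_WORD = "xiaoyi xiaoyi"  # preset wake-up word from the example

class AP:
    """Application processor: dormant until woken by the low-power stage."""
    def __init__(self):
        self.awake = False

class LowPowerProcessor:
    """Always-on stage: screens every piece of microphone input for the
    preset wake-up word and wakes the AP only when it is recognized,
    so the AP can stay dormant the rest of the time."""
    def __init__(self, ap: AP):
        self.ap = ap

    def on_voice_input(self, text: str):
        if WAKE_WORD in text.lower():
            self.ap.awake = True
```

After being woken, the AP would obtain the detection result (through the low-power processor) and then start the voice assistant, as described in the surrounding paragraphs.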
- the AP 310 may first obtain the detection result of the detection device 320 through the low-power processor 340, and then start the voice assistant 370 after obtaining the detection result.
- the above-mentioned activating the voice assistant 370 may include activating the voice instruction recognition module 311 and the voice instruction execution module 312 .
- the AP 310 may use the speaker 350 and/or the display screen 360 to interact with the user according to the detection result.
- the voice instruction execution module 312 in the AP 310 can execute the recognized voice command, and broadcast a voice reply to the user through the speaker 350 .
- the display screen 360 remains in an off-screen state.
- the voice input collected by the microphone 330 includes a voice instruction "inquire about the weather of the day".
- the voice command recognition module 311 can acquire the voice input and recognize the voice command therein.
- the voice instruction execution module 312 may perform operations corresponding to the voice instruction. Specifically, the voice instruction execution module 312 can call the weather application to query the weather of the day (eg, temperature, air quality), and voice broadcast the result of the query of the weather of the day through the speaker.
- the voice instruction execution module 312 in the AP 310 can execute the recognized voice command, broadcast a voice reply to the user through the speaker 350, and display the user interface involved in the voice command through the display screen 360.
- the voice input collected by the microphone 330 includes a voice instruction "inquire about the weather of the day".
- the voice command recognition module 311 can acquire the voice input and recognize the voice command therein.
- the voice instruction execution module 312 may perform operations corresponding to the voice instruction. Specifically, the voice command execution module 312 can call the weather application to query the weather of the day (such as temperature, air quality), broadcast the results of the query of the weather of the day through the speaker, and display the user interface 210 shown in FIG. 5B .
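The way the execution module chooses its output channels, voice broadcast always, and the display only when the detection stage lit the screen, can be sketched as follows. The weather lookup is stubbed out and all names are illustrative.

```python
def execute_command(command: str, screen_on: bool):
    """Execute a recognized voice command and report which output
    channels the reply uses: the speaker always; the display screen
    only when the screen was lit by the detection stage."""
    reply = f"result of '{command}'"  # stub for e.g. a weather query
    channels = ["speaker"]
    if screen_on:
        channels.append("display")   # e.g. show the weather UI of FIG. 5B
    return reply, channels
```

With `screen_on=False` this models the off-screen branch (voice broadcast only); with `screen_on=True` it models the bright-screen branch (voice broadcast plus the user interface).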
- the above-mentioned camera 322 may be a low-power camera, such as an infrared camera.
- one or more of the proximity light sensor 321 , the camera 322 , and the motion sensor 323 can be in a working state all the time, and transmit the collected data to the low-power processor 340 .
- when the low-power processor 340 recognizes that the voice input received by the microphone 330 contains a wake-up word, the low-power processor 340 can wake up the AP 310 . Then, the AP 310 may acquire the data collected by the detection device 320 from the low-power processor 340 and determine whether the user watches the display screen 360 .
- the proximity light sensor 321 , the camera 322 , and the motion sensor 323 may be connected to the AP 310 .
- when the low-power processor 340 recognizes that the voice input received by the microphone 330 contains a wake-up word, the low-power processor 340 can wake up the AP 310 . Then, the AP 310 can activate one or more of the proximity light sensor 321 , the camera 322 , and the motion sensor 323 . Further, the AP 310 may determine whether the user watches the display screen 360 according to the data collected by the detection device 320 .
- in response to a user operation acting on a preset physical key for starting the voice assistant, the electronic device 100 whose screen is in an off-screen state can wake up the AP 310 .
- the above-mentioned preset physical keys may be one or more of the following keys on the electronic device 100: a power key, a volume up key, and a volume down key.
- the above-mentioned user operation for starting the voice assistant may be a long-press operation acting on the power button, and the long-press time is, for example, 1 second or 2 seconds, which is not limited in this embodiment of the present application.
- the user can activate the voice assistant by long pressing the power button.
- the AP 310 can detect through the detection device 320 according to the foregoing embodiment, and activate the voice command recognition module 311 and the voice command execution module 312 to execute the user's voice command, which will not be repeated here.
- the electronic device 100 may also include more or fewer components.
- the detection device can be used to detect whether the user needs to watch the screen, and then intelligently decide whether to light the screen. If it is detected that the user does not need to watch the screen, the electronic device 100 may run a voice assistant in the background to perform voice interaction with the user. In this way, the power consumption of the electronic device can be saved, and false touches can be avoided. If it is detected that the user needs to watch the screen, the electronic device 100 can light up the screen and interact with the user in a combination of a graphical interface and voice.
- the electronic device may detect the user's first operation when the screen is in an off-screen state. This first operation can be used to activate the voice assistant.
- the above-mentioned first operation may be a user operation in which the user speaks a wake-up word (such as "Xiaoyi Xiaoyi") in the foregoing embodiment.
- the first key may include one or more of the following: a power key, a volume up key, and a volume down key.
- the electronic device starts the voice assistant while keeping the screen off, and enables the voice assistant to interact with the user in the first manner.
- the above-mentioned first situation may be a situation in which the user does not watch the screen of the electronic device.
- the first sensor of the electronic device can detect that there is an object occlusion within a preset distance of the screen.
- the electronic device can determine that the user is not viewing the screen of the electronic device. As shown in FIG.
- the first sensor of the electronic device can detect that there is no object within a preset distance of the screen, and no face is detected by the second sensor. Thus, the electronic device can determine that the user is not viewing the screen of the electronic device.
- the above-mentioned first manner may be to interact with the user only through voice.
- the electronic device may, in the second case, light up the screen, activate the voice assistant, and make the voice assistant interact with the user in the second manner.
- the above-mentioned second manner includes interacting with the user through a graphical interface.
- the above-mentioned second situation may be a situation in which the user watches the screen of the electronic device.
- the first sensor of the electronic device can detect that there is no object blocking within a preset distance of the screen, and a face is detected by the second sensor.
- the electronic device can determine that the user is viewing the screen of the electronic device. As shown in FIG.
- when the user performs a hand-raising action, for example, changing the electronic device with its screen facing upward from a horizontally placed posture to an inclined or vertical posture, the screen of the electronic device after the posture adjustment can face the user's face,
- the electronic device may detect that the posture of the electronic device is switched from the first posture to the second posture through the third sensor. Thus, the electronic device can determine that the user is viewing the screen of the electronic device.
- the above-mentioned first posture may be, for example, a posture in which the screen of the electronic device is placed horizontally upward.
- the above-mentioned second posture may be, for example, a posture in which the screen is tilted upward.
- the electronic device enables the voice assistant to interact with the user in the first manner; specifically, the electronic device enables the voice assistant to run only the first program.
- the first program may be a program for voice interaction with the user.
- the electronic device can interact with the user only through voice.
- the electronic device can then run the second program.
- the second program may be a program for obtaining a graphical interface for interacting with the user. That is to say, during the interaction between the voice assistant and the user, the electronic device can make the voice assistant run the related program for drawing a graphical interface when the screen needs to be lit up.
- the electronic device enables the voice assistant to interact with the user in the first manner.
- the electronic device may enable the voice assistant to run the second program and the first program. That is to say, when it is detected that the user does not need to watch the screen, the electronic device can still enable the voice assistant to run the relevant program for drawing the graphical interface. But the electronic device does not light up the screen for display. Further, when it is detected that the user needs to watch the screen, the electronic device can directly display the graphical interface obtained by running the second program on the screen. In this way, the electronic device can reduce the time delay for drawing the graphical interface.
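The decision logic summarized above can be sketched in a few lines. This is a minimal illustrative sketch in Python; the function name and the boolean sensor readings are assumptions for illustration, not the claimed implementation:

```python
# Interaction modes from the summary above: "first manner" keeps the screen off
# and uses voice only; "second manner" lights the screen and adds a GUI.
VOICE_ONLY = "voice_only"          # first manner: screen stays off
VOICE_AND_GUI = "voice_and_gui"    # second manner: screen lit, GUI shown

def choose_interaction_mode(screen_blocked: bool, face_detected: bool) -> str:
    """Decide the interaction mode for a wake event in the screen-off state."""
    if screen_blocked:
        # Object within the preset distance of the screen (e.g. phone in a
        # pocket or face down on a table): the user cannot be watching it.
        return VOICE_ONLY
    if face_detected:
        # Screen unobstructed and a face is facing it: light up the screen.
        return VOICE_AND_GUI
    # Screen unobstructed but no face detected: keep the screen off.
    return VOICE_ONLY
```

For example, a phone lying face up on a table with nobody in front of it (`screen_blocked=False`, `face_detected=False`) stays in the voice-only mode, saving power and avoiding false touches.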
Abstract
本申请提供一种语音交互方法及电子设备,涉及人工智能领域。在该方法中,响应于用于启动语音助手的用户操作,屏幕处于灭屏状态的电子设备可以启动检测装置来检测用户是否需要观看屏幕,进而智慧决策是否点亮屏幕。若确定用户不需要观看屏幕,电子设备可以保持屏幕处于灭屏状态,通过语音的方式与用户交互。这样,电子设备可以节省功耗,并且避免误触。
Description
本申请要求于2020年08月31日提交中国专利局、申请号为202010901726.8、申请名称为“一种语音交互方法及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及人工智能领域,尤其涉及一种语音交互方法及电子设备。
随着电子设备的发展,越来越多的电子设备中配置有可以与用户进行语音交互的应用,例如语音助手。用户可以通过语音助手与电子设备进行语音交互,来实现以前需要多次手动操作才能实现的功能。例如,打电话、播放音乐等。
目前,用户通过语音助手与有屏幕的电子设备进行语音交互时,电子设备会直接点亮屏幕,显示语音助手的用户界面以及执行接收到的语音指令所得到的相关内容。
发明内容
本申请提供了一种语音交互方法及电子设备。在该方法中,处于灭屏状态的电子设备可以在启动语音助手时,检测用户是否需要观看屏幕来智慧决策是否点亮屏幕。其中,在检测出用户不需要观看屏幕时,电子设备可以保持屏幕处于灭屏状态,通过语音的方式与用户交互。这样,电子设备可以节省功耗,并且避免误触。
第一方面,本申请实施例提供了一种语音交互方法。该方法包括:电子设备可在屏幕处于灭屏状态下检测到用户的第一操作,第一操作可用于启动语音助手。电子设备可在第一情况下,在保持屏幕处于灭屏状态下启动语音助手,并使语音助手以第一方式与用户进行交互,第一方式为仅通过语音与用户进行交互。第一情况可以包括以下中的任一种:
通过第一传感器检测到屏幕的预设距离内不存在物体遮挡且通过第二传感器未检测到人脸;或,
通过第二传感器未检测到人脸;或,
通过第一传感器检测到屏幕的预设距离内存在物体遮挡。
结合第一方面,电子设备可以在第二情况下,点亮屏幕,启动语音助手,并使语音助手以第二方式与用户进行交互,第二方式包括通过图形界面与用户进行交互。第二情况包括以下中的任一种:
通过第一传感器检测到屏幕的预设距离内不存在物体遮挡且通过第二传感器检测到人脸;或,
通过第一传感器检测到屏幕的预设距离内不存在物体遮挡且通过第三传感器检测到电子设备的姿态从第一姿态切换到第二姿态;或
通过第二传感器检测到人脸;或,
通过第三传感器检测到电子设备的姿态从第一姿态切换到第二姿态。
结合第一方面,上述第一传感器可包括以下一项或多项:接近光传感器、红外光传感器、雷达传感器。上述第二传感器可包括摄像头。上述第三传感器可包括运动传感器。其中,运动传感器包括以下一项或多项:加速度传感器、陀螺仪传感器。
上述第一情况可以为用户未观看电子设备的屏幕的情况。示例性的,在用户将电子设备屏幕朝下放置于桌上时,电子设备的第一传感器可以检测到屏幕的预设距离内存在物体遮挡。从而,电子设备可以判断出用户未观看电子设备的屏幕。在用户将电子设备屏幕朝上放置于桌上,但并未将脸部与电子设备的屏幕相对时,电子设备的第一传感器可以检测到屏幕的预设距离内不存在物体,并通过第二传感器未检测到人脸。从而,电子设备可以判断出用户未观看电子设备的屏幕。
上述第二情况可以为用户观看电子设备的屏幕的情况。示例性的,在用户将电子设备的屏幕朝上放置于桌上,且将脸部与电子设备的屏幕相对时,电子设备的第一传感器可以检测到屏幕的预设距离内不存在物体遮挡,并通过第二传感器检测到人脸。从而,电子设备可以判断出用户观看电子设备的屏幕。在用户做抬手动作,例如将屏幕朝上的电子设备从水平放置的姿态变化为倾斜或竖直放置的姿态,姿态调整后的电子设备的屏幕可以与人脸相对,电子设备可以通过第三传感器检测到电子设备的姿态从第一姿态切换为第二姿态。从而,电子设备可以判断出用户观看电子设备的屏幕。上述第一姿态可以例如是电子设备屏幕朝上水平放置的姿态。上述第二姿态可以例如是屏幕朝上倾斜放置的姿态。
由上述方法可知,电子设备可以在检测到用户不需要观看屏幕时,保持屏幕处于灭屏状态,并与用户进行语音交互,从而节省电子设备的功耗,并且避免误触。
在一些实施例中,处于灭屏状态的电子设备可以在启动语音助手时,先利用第一传感器(如接近光传感器)检测屏幕的预设距离内是否存在物体遮挡。若检测到屏幕的预设距离内存在物体遮挡,电子设备可以直接判断出用户未观看屏幕。这样,电子设备可以不启动第二传感器(如摄像头)来检测人脸,从而节省电子设备的功耗。由于在电子设备的屏幕未被遮挡时,电子设备无法直接判断出用户是否观看屏幕,电子设备可以进一步检测人脸来确定用户是否观看屏幕。即若先利用第一传感器检测到屏幕的预设距离内不存在物体遮挡,电子设备可以再启动第二传感器来检测是否存在人脸。若检测到人脸,则电子设备可以判断出用户观看屏幕,进而点亮屏幕,通过图形界面的方式和语音的方式与用户交互。若未检测到人脸,则电子设备可以判断出用户未观看屏幕,进而保持灭屏状态,仅通过语音的方式与用户交互。
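下面用一个最小示意说明上文"先接近光传感器、后摄像头"的分级检测策略(其中传感器接口为假设的示例,并非专利的实际实现):仅在屏幕未被遮挡时,才启动功耗较高的摄像头。

```python
def user_watching_screen(screen_blocked, detect_face) -> bool:
    """screen_blocked(): 返回屏幕预设距离内是否存在物体遮挡;
    detect_face(): 启动摄像头并返回是否检测到人脸。"""
    if screen_blocked():
        # 屏幕被遮挡:直接判定用户未观看屏幕,无需启动摄像头,节省功耗
        return False
    # 仅在屏幕未被遮挡时才启动摄像头进一步检测人脸
    return detect_face()

# 用法示意:屏幕被遮挡时,摄像头不会被启动
camera_starts = []

def fake_detect_face():
    camera_starts.append(1)   # 记录摄像头被启动的次数
    return True

assert user_watching_screen(lambda: True, fake_detect_face) is False
assert camera_starts == []    # 摄像头从未启动
```

这种"低功耗传感器在前、高功耗传感器在后"的排列,正是上文所述节省检测功耗的思路。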
结合第一方面,上述第一传感器、上述第二传感器、上述第三传感器均可在电子设备通过语音助手与用户交互的过程中持续工作。
在一些实施例中,在语音助手以第一方式与用户进行交互的过程中,电子设备检测到第二情况,则电子设备可以点亮屏幕并使得语音助手以第二方式与用户进行交互。也即是说,若在启动语音助手时判断出用户未观看屏幕,电子设备可以保持灭屏状态与用户仅通过语音的方式交互。进一步的,在上述仅通过语音的方式交互的过程中,若判断出用户观看屏幕,电子设备可以点亮屏幕通过图形界面和语音的方式与用户进行交互。
在一些实施例中,在语音助手以第一方式与用户进行交互的过程中,电子设备可以接收到用户输入的第一语音,并对第一语音进行识别。当识别出第一语音满足第一条件,电子设备可以点亮屏幕,并使得语音助手以第二方式与用户进行交互。其中,第一语音满足第一条件可包括:第一语音中包括以下一项或多项:第一类关键词、第二类关键词,其中,第一类关键词包括以下一类或多类应用程序名称:视频类、购物类、导航类;第二类关键词包括以下一项或多项动词:查看、显示。也即是说,电子设备还可以通过对接收到的用户输入的语音指令进行分析来判断用户是否需要观看屏幕。其中,对于视频类、购物类、导航类的应用程序,往往是需要用户观看屏幕的。当检测到语音指令中包含有上述类别的应用程序,电子设备可以判断出用户需要观看屏幕。另外,若语音指令中有指示用户需要观看屏幕的动词,例如查看、显示等,电子设备也可以认为用户需要观看屏幕。
由于在启动语音助手时,电子设备根据上述第一传感器和/或上述第二传感器判断出用户未观看屏幕,电子设备可以先保持灭屏状态与用户仅通过语音的方式交互。在上述仅通过语音的方式交互的过程中,若检测到接收到的语音指令中包含上述第一类关键词和/或上述第二类关键词,电子设备可以点亮屏幕,通过图形界面的方式和语音的方式与用户交互。
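上文"第一条件"的关键词判断可以用如下简化示意表示(其中关键词表为举例假设,并非穷举,函数名也为示意):当语音指令中出现特定类别的应用名称或指示观看屏幕的动词时,判定用户需要观看屏幕。

```python
# 第一类关键词:应用类别名称(示例);第二类关键词:指示观看屏幕的动词(示例)
FIRST_CLASS_KEYWORDS = ["视频", "购物", "导航"]
SECOND_CLASS_KEYWORDS = ["查看", "显示"]

def needs_screen(voice_text: str) -> bool:
    """若语音指令包含第一类或第二类关键词,返回 True(需要点亮屏幕)。"""
    keywords = FIRST_CLASS_KEYWORDS + SECOND_CLASS_KEYWORDS
    return any(kw in voice_text for kw in keywords)

# 用法示意
assert needs_screen("查看图库") is True     # 含第二类关键词"查看"
assert needs_screen("播放视频") is True     # 含第一类关键词"视频"
assert needs_screen("开空调") is False      # 不含关键词,保持灭屏
```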
在一些实施例中,在语音助手以第二方式与用户进行交互的过程中,电子设备检测到第一情况。进一步的,电子设备可以熄灭屏幕,并使语音助手以第一方式与用户进行交互。也即是说,若电子设备利用第一传感器、第二传感器、第三传感器中的一个或多个检测到用户观看屏幕,电子设备可以点亮屏幕,通过图形界面的方式和语音的方式与用户交互。在上述通过图形界面的方式和语音的方式与用户交互的过程中,若检测到用户未观看屏幕(如用户离开电子设备,用户的脸部不再与屏幕相对,或者用户将电子设备屏幕朝下放置于桌上),电子设备可以熄灭屏幕,与用户仅通过语音的方式交互。这样,可以节省电子设备的功耗,并避免误触。
在上述实施例中,在通过图形界面的方式和语音的方式与用户交互的过程中,电子设备可以响应于用户停止语音播放的用户操作,不再通过语音的方式与用户交互,而仅通过图形界面的方式与用户交互。
上述第一传感器、上述第二传感器、上述第三传感器在电子设备通过语音助手与用户交互的过程中持续工作。这样,电子设备可以在语音助手与用户交互的过程中,实时检测用户是否需要观看屏幕。若检测到用户需要观看屏幕,电子设备可以点亮屏幕。若检测到用户不需要观看屏幕,电子设备可以熄灭屏幕。由上述方法可知,电子设备可以在语音助手与用户交互的过程中,智慧决策是否点亮屏幕。这不仅可以节省电子设备的功耗,避免误触,还可以提升用户使用语音助手的体验。
结合第一方面,电子设备检测到第一情况的时间可以包括以下情况中的任一种情况:
在检测到第一操作时,电子设备检测到第一情况。或者,
在检测到第一操作后的第一时间,电子设备检测到第一情况;其中,第一时间与电子设备检测到第一操作的时间之间的间隔小于第一时长。或者,
在检测到第一操作前的第二时间,电子设备检测到第一情况;其中,第二时间与电子设备检测到第一操作的时间之间的间隔小于第二时长。
结合第一方面,电子设备检测到第二情况的时间可以包括以下情况中的任一种情况:
在检测到第一操作时,电子设备检测到第二情况。或者,
在检测到第一操作后的第一时间,电子设备检测到第二情况;其中,第一时间与电子设备检测到第一操作的时间之间的间隔小于第一时长。或者,
在检测到第一操作前的第二时间,电子设备检测到第二情况;其中,第二时间与电子设备检测到第一操作的时间之间的间隔小于第二时长。
结合第一方面,电子设备使语音助手以第一方式与用户进行交互,具体可以为:电子设备可以使语音助手仅运行第一程序。或,电子设备可以使语音助手运行第二程序和第一程序。
其中,第一程序可以为用于与用户进行语音交互的程序,第二程序可以为用于得到与用户交互的图形界面的程序。
结合第一方面,电子设备在屏幕处于灭屏状态下检测到第一操作,具体可以为:电子设备在屏幕处于灭屏状态下接收到用户输入的第二语音。第二语音可包括用于启动语音助手的唤醒词。或者,电子设备在屏幕处于灭屏状态下检测到作用于第一按键的长按操作。第一按键可包括以下一项或多项:电源键、音量上键、音量下键。
第二方面,本申请实施例提供了一种电子设备。该电子设备可包括:屏幕、输入装置、检测装置、至少一个处理器。上述检测装置包括以下一项或多项:第一传感器、第二传感器。输入装置可用于在屏幕处于灭屏状态下检测到用户的第一操作;第一操作用于启动语音助手。检测装置可用于在上述输入装置检测到用户的第一操作的情况下,检测是否存在第一情况。其中,第一情况包括以下中的任一种:
通过第一传感器检测到屏幕的预设距离内不存在物体遮挡且通过第二传感器未检测到人脸;或,
通过第二传感器未检测到人脸;或,
通过第一传感器检测到屏幕的预设距离内存在物体遮挡;
处理器可用于在检测装置检测到第一情况时,在保持屏幕处于灭屏状态下启动语音助手,并使语音助手以第一方式与用户交互。第一方式可以为仅通过语音与用户进行交互。
结合第二方面,检测装置还包括第三传感器。
检测装置还可用于检测是否存在第二情况;其中,第二情况包括以下中的任一种:
通过第一传感器检测到屏幕的预设距离内不存在物体遮挡且通过第二传感器检测到人脸;或,
通过第一传感器检测到屏幕的预设距离内不存在物体遮挡且通过第三传感器检测到电子设备的姿态从第一姿态切换到第二姿态;或
通过第二传感器检测到人脸;或,
通过第三传感器检测到电子设备的姿态从第一姿态切换到第二姿态。
处理器还可用于在检测装置检测到第二情况时,点亮屏幕,启动语音助手,并使语音助手以第二方式与用户进行交互。第二方式可包括通过图形界面与用户进行交互。
在一些实施例中,第二方式可以为通过图形界面和语音与用户进行交互。
在本申请提供的一些实施例中,第一传感器可包括以下一项或多项:接近光传感器、红外光传感器、雷达传感器。第二传感器可包括摄像头。第三传感器可包括运动传感器;其中,运动传感器包括以下一项或多项:加速度传感器、陀螺仪传感器。
结合第二方面,上述第一传感器、上述第二传感器、上述第三传感器均可在电子设备通过语音助手与用户交互的过程中持续工作。
在一些实施例中,检测装置还可用于在语音助手以第一方式与用户进行交互的过程中,检测是否存在第二情况。处理器还可用于在检测装置检测到第二情况时,点亮屏幕并使得语音助手以第二方式与用户进行交互,第二方式包括通过图形界面与用户进行交互。
在一些实施例中,输入装置还可用于在语音助手以第一方式与用户进行交互的过程中,接收到用户输入的第一语音。处理器还可用于对第一语音进行识别,并在识别出第一语音满足第一条件的情况下,点亮屏幕,使得语音助手以第二方式与用户进行交互。第二方式包括通过图形界面与所述用户进行交互。其中,第一语音满足第一条件可包括:第一语音中包括以下一项或多项:第一类关键词、第二类关键词,其中,第一类关键词包括以下一类或多类应用程序名称:视频类、购物类、导航类;第二类关键词包括以下一项或多项动词:查看、显示。
在一些实施例中,检测装置还可用于在语音助手以第二方式与用户进行交互的过程中,检测是否存在第一情况。处理器还可用于在检测装置检测到第一情况时,熄灭屏幕,并使语音助手以第一方式与用户进行交互。
由上述方法可知,电子设备可以在检测到用户不需要观看屏幕时,保持屏幕处于灭屏状态,并与用户进行语音交互,从而节省电子设备的功耗,并且避免误触。
第三方面,本申请实施例还提供了一种语音交互方法。该方法包括:电子设备可以在屏幕处于灭屏状态下检测到用户的第一操作,第一操作用于启动语音助手。电子设备可以在第三情况下,在保持屏幕处于灭屏状态下启动语音助手,并使语音助手以第一方式与用户进行交互,第一方式为仅通过语音与用户进行交互。第三情况可以包括:通过摄像头检测到人脸且检测到第一手势。
结合第三方面,电子设备可以在第四情况下,点亮屏幕,启动语音助手,并使语音助手以第二方式与用户进行交互,第二方式包括通过图形界面与用户进行交互。第四情况可以包括:通过摄像头检测到人脸且未检测到第一手势。
由上述语音交互的方法可知,电子设备可以根据摄像头采集的图像来判断是否检测到人脸,以及是否检测到第一手势,进而检测用户是否需要观看屏幕。电子设备可以根据用户是否需要观看屏幕,来确定在灭屏状态下启动语音助手时是否点亮屏幕。当检测到用户不需要观看屏幕,电子设备可以保持屏幕处于灭屏状态,与用户进行语音交互。这样,在用户使用电子设备且不观看电子设备的屏幕的场景中,用户无需在电子设备点亮屏幕后,再进行相应的操作来熄灭屏幕,从而简化了用户将电子设备作为音箱使用的操作。
第四方面,本申请实施例提供了一种电子设备。该电子设备可包括:屏幕、输入装置、摄像头、至少一个处理器。其中:输入装置可用于在屏幕处于灭屏状态下检测到用户的第一操作;第一操作用于启动语音助手。摄像头可用于在输入装置检测到用户的第一操作的情况下,检测是否存在第三情况;其中,第三情况包括:通过摄像头检测到人脸且检测到第一手势。处理器可用于在摄像头检测到第三情况时,在保持屏幕处于灭屏状态下启动语音助手,并使语音助手以第一方式与用户交互;第一方式为仅通过语音与用户进行交互。
结合第四方面,摄像头还可用于检测是否存在第四情况;第四情况包括:通过摄像头检测到人脸且未检测到第一手势。处理器还可用于在摄像头检测到第四情况时,点亮屏幕,启动语音助手,并使语音助手以第二方式与用户进行交互,第二方式包括通过图形界面与用户进行交互。
第五方面,本申请实施例提供一种芯片,该芯片应用于第二方面提供的电子设备或第四方面提供的电子设备,该芯片包括一个或多个处理器,该一个或多个处理器用于调用计算机指令以使得第二方面提供的电子设备执行如第一方面中任一可能的实现方式,或使得第四方面提供的电子设备执行如第三方面中任一可能的实现方式。
第六方面,本申请实施例提供一种包含指令的计算机程序产品,当上述计算机程序产品在设备上运行时,使得上述第二方面提供的电子设备执行如第一方面中任一可能的实现方式,或使得上述第四方面提供的电子设备执行如第三方面中任一可能的实现方式。
第七方面,本申请实施例提供一种计算机可读存储介质,包括指令,当上述指令在设备上运行时,使得上述第二方面提供的电子设备执行如第一方面中任一可能的实现方式,或使得上述第四方面提供的电子设备执行如第三方面中任一可能的实现方式。
可以理解地,上述第五方面提供的芯片、第六方面提供的计算机程序产品和第七方面提供的计算机可读存储介质均用于执行本申请实施例所提供的方法。因此,其所能达到的有益效果可参考对应方法中的有益效果,此处不再赘述。
图1为本申请实施例提供的一种电子设备的结构示意图;
图2、图3为本申请实施例提供的电子设备保持屏幕处于灭屏状态与用户进行语音交互的场景示意图;
图4A、图4B、图5A、图5B、图6A和图6B为本申请实施例提供的电子设备点亮屏幕,通过图形界面和语音的方式与用户交互的场景示意图;
图7A~图7E为本申请实施例提供的一组语音交互的场景示意图;
图8A~图8D为本申请实施例提供的另一组语音交互的场景示意图;
图9为本申请实施例提供的一种语音交互的场景示意图;
图10为本申请实施例提供的另一种语音交互的场景示意图;
图11为本申请实施例提供的另一种电子设备的结构示意图;
图12为本申请实施例提供的一种语音交互的方法流程图;
图13为本申请实施例提供的另一种语音交互的方法流程图。
下面将结合附图对本申请实施例中的技术方案进行清楚、详尽地描述。其中,在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;文本中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,另外,在本申请实施例的描述中,“多个”是指两个或多于两个。
以下,本文中所涉及的“第一”、“第二”……之类的描述仅仅用来将一个对象或者操作与另一个对象或操作区分开来,而不一定要求或者暗示这些对象或操作之间存在任何这种实际的关系或者顺序,也不一定要求或者暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征,在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
目前,用户通过语音助手与有屏幕的电子设备进行语音交互时,电子设备可以输出语音来响应用户的语音指令。另外,电子设备会直接点亮屏幕,显示语音助手的用户界面以及执行接收到的语音指令所得到的相关内容。但在一些场景中,例如电子设备放置于口袋中,用户通过语音助手与电子设备进行语音交互时,并不需要观看屏幕。而电子设备点亮屏幕显示语音助手的用户界面会浪费电子设备的功耗,并且容易导致误触。
基于上述问题,本申请实施例提供了一种语音交互的方法和电子设备。电子设备可配置有检测装置,例如,摄像头、接近光传感器、运动传感器。在该方法中,当接收到用于启动语音助手的第一操作,例如,当接收到包含有预设的唤醒词的语音输入,电子设备可以启动上述检测装置来检测用户是否需要观看屏幕。当根据检测装置的检测结果检测到用户不需要观看屏幕,电子设备可以与用户进行语音交互,而不点亮屏幕。这样,在一些用户无需观看屏幕的场景下,电子设备可以仅通过语音的方式与用户交互,而不点亮屏幕,从而节省了电子设备的功耗,并减少误触。
示例性的,电子设备可以利用摄像头采集图像,并根据摄像头采集的图像中是否包括人脸来检测用户是否需要观看屏幕。当上述摄像头采集的图像中不包括人脸,电子设备可以在后台运行语音助手,通过语音的方式与用户交互,而不显示语音助手的用户界面。
电子设备可以利用接近光传感器来判断电子设备的屏幕是否被遮挡。当电子设备的屏幕被遮挡,例如,电子设备放置于口袋中、电子设备屏幕朝下放置于桌上,用户一般不需要观看屏幕。也即,当根据接近光传感器检测到电子设备的屏幕被遮挡,电子设备可以在后台运行语音助手,通过语音的方式与用户交互,而不显示语音助手的用户界面。
电子设备可以利用运动传感器来检测电子设备的姿态变化,并根据电子设备姿态的变化来检测用户是否需要观看屏幕。例如,用户拿起电子设备并执行抬手动作或翻转动作时,电子设备的姿态发生变化。当未检测到抬手动作,电子设备可以在不显示语音助手的用户界面的情况下,通过语音的方式与用户交互。当电子设备检测到抬手动作,电子设备可以点亮屏幕,以实现抬手亮屏,显示语音助手的用户界面,并结合语音的方式与用户交互。
在本申请实施例中,不限于接近光传感器,电子设备还可以通过其它类型的传感器来检测屏幕是否被遮挡,例如红外光传感器、雷达传感器等。
不限于摄像头,电子设备还可以通过其它类型的传感器来检测屏幕是否与人脸相对。
图1示例性示出了一种电子设备100的结构示意图。
下面以电子设备100为例对本申请的实施例进行具体说明。应该理解的是,图1所示的电子设备100仅是一个范例,电子设备100可以具有比图1中所示的更多或者更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图1中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
电子设备100可以包括:处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,加速度传感器180E,接近光传感器180G,指纹传感器180H,触摸传感器180K等。
可以理解的是,本发明实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
在一些实施例中,处理器110可以包括语音唤醒模块和语音指令识别模块。其中,语音唤醒模块和语音指令识别模块可以集成在不同的处理器芯片中,由不同的芯片执行。例如,语音唤醒模块可以集成在功耗较低的协处理器或DSP芯片中,语音指令识别模块可以集成在AP或NPU或其他芯片中。这样,可以在语音唤醒模块识别到预设的语音唤醒词后,再启动语音指令识别模块所在的芯片触发语音指令识别功能,从而节省电子设备的功耗。或者,语音唤醒模块和语音指令识别模块可以集成在相同的处理器芯片中,由同一芯片执行相关功能。例如,语音唤醒模块和语音指令识别模块均可集成在AP芯片或NPU或其他芯片中。
处理器110还可以包括语音指令执行模块,即在识别到语音指令后,执行语音指令对应的操作。例如,语音助手。语音助手可以为包括语音指令识别功能的应用。当识别到语音指令后,语音助手可直接执行语音指令对应的操作。或者,若语音指令对应的操作涉及第三应用,则语音助手可调用第三应用执行相应的操作。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,抬手亮屏,计步器等应用。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电和防误触的目的。电子设备100可以利用接近光传感器180G检测在语音助手启动时,电子设备100的屏幕是否被遮挡,以便在电子设备100的屏幕有遮挡时通过语音播报的方式与用户交互,而不点亮屏幕,达到省电和防误触的目的。
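上文接近光传感器180G"反射光充分则附近有物体"的判断逻辑可以用如下简化示意表示(其中阈值数值为假设,仅作说明):

```python
# 假设的归一化红外反射光强度阈值:超过该值即认为反射光"充分"
REFLECTION_THRESHOLD = 0.6

def object_nearby(reflected_intensity: float) -> bool:
    """反射光充分(超过阈值)则判定屏幕预设距离内存在物体遮挡;
    反射光不充分则判定附近没有物体。"""
    return reflected_intensity > REFLECTION_THRESHOLD

# 用法示意
assert object_nearby(0.9) is True    # 例如电子设备放置于口袋中,屏幕被遮挡
assert object_nearby(0.1) is False   # 屏幕前方无遮挡
```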
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备100的接触和分离。
下面介绍电子设备100通过识别唤醒词来启动语音助手的方法。
在一些实施例中,电子设备100可以通过麦克风接收到语音输入。其中,当用户在电子设备100附近说出唤醒语音,该语音输入中可包含唤醒语音。在接收到该语音输入后,电子设备100可以从该语音输入中分离出用户的唤醒语音。接着,电子设备100可以利用声学模型从用户的唤醒语音的语音信号中解码出音素序列。在从唤醒语音中解码出音素序列后,电子设备100可以判断该解码出的音素序列是否与已存储的唤醒词音素序列匹配,若是,则表明该唤醒语音中有唤醒词。当确定该唤醒语音中有唤醒词,电子设备100可以启动语音助手。
在另一些实施例中,电子设备100可以通过麦克风接收到语音输入。其中,当用户在电子设备100附近说出唤醒语音时,该语音输入中可以包括唤醒语音。在接收到该语音输入后,电子设备100可以从该语音输入中分离出用户的唤醒语音。接着,电子设备100可以利用声学模型从用户的唤醒语音的语音信号中解码出音素序列。然后,通过语言模型以及语言模型的发音字典,从解码出来的音素序列中进一步解码出文字信息。在电子设备100解码出文字信息后,电子设备100可以判断从唤醒语音中解码出的文字信息是否包括已存储的唤醒词文本,若是,则表明该用户的语音信号中有唤醒词。当确定该唤醒语音中有唤醒词,电子设备100可以启动语音助手。
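上文"将解码出的音素序列与已存储的唤醒词音素序列匹配"的过程,可以用如下最小示意表示(声学模型的解码结果以现成的音素列表代替,音素的拆分方式为举例假设):

```python
# 假设已存储的唤醒词"小艺小艺"对应的音素序列(拆分方式为示例)
STORED_WAKEWORD_PHONEMES = ["x", "iao", "y", "i", "x", "iao", "y", "i"]

def contains_wakeword(decoded_phonemes: list) -> bool:
    """判断解码出的音素序列中是否包含已存储的唤醒词音素序列
    (按连续子序列匹配)。"""
    n = len(decoded_phonemes)
    m = len(STORED_WAKEWORD_PHONEMES)
    return any(decoded_phonemes[i:i + m] == STORED_WAKEWORD_PHONEMES
               for i in range(n - m + 1))

# 用法示意:用户一次性说出"小艺小艺,开……"时,唤醒词后跟随语音指令也能匹配
decoded = ["x", "iao", "y", "i", "x", "iao", "y", "i", "k", "ai"]
assert contains_wakeword(decoded) is True
assert contains_wakeword(["k", "ai", "k", "ong", "t", "iao"]) is False
```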
在本申请实施例中,启动语音助手可以为电子设备100启动应用处理器中的语音指令识别模块和语音指令执行模块。启动后的语音指令识别模块可用于识别麦克风采集的语音输入中的语音指令,启动后的语音指令执行模块可用于执行所识别的语音指令。启动语音助手也可以称之为唤醒语音助手。
需要进行说明的是,当电子设备100的语音唤醒功能开启,电子设备100的语音唤醒模块可以实时处于工作状态。当语音唤醒模块从麦克风采集的语音输入中识别到唤醒词,电子设备100可以启动语音助手。
电子设备100识别语音指令的过程,可以参考前述电子设备100识别唤醒词的过程,这里不再赘述。
在本申请实施例中,电子设备100可通过麦克风采集语音输入。该语音输入可包括唤醒词和/或语音指令。其中,当用户一次性说出唤醒词和语音指令,例如“小艺小艺,我要给张三发短信”,电子设备100得到的语音输入中可包含唤醒词和语音指令。当用户只说出唤醒词,例如“小艺小艺”,电子设备100得到的语音输入即为唤醒词。在语音助手启动后,用户与语音助手进行语音交互的过程中用户可以只说语音指令,例如“我要给张三发短信”,电子设备100得到的语音输入即为语音指令。
除了上述通过识别唤醒词来启动语音助手,电子设备100还可以通过检测到的其他用户操作来启动语音助手。例如,响应于长按电源键的用户操作,电子设备100可以启动语音助手。上述长按电源键的时间可以为1秒或2秒,本申请实施例对此不作限定。
本申请实施例对用于启动语音助手的第一操作不作限定,该第一操作还可以为其他用户启动语音助手的用户操作。
在一种可能的实现方式中,电子设备100可以从用户的语音信号中提取出唤醒词和用户的声纹特征,当唤醒词与已存储的唤醒词模板匹配且用户的声纹特征与已存储的声纹特征模板匹配时,电子设备100可以启动检测装置(如接近光传感器、摄像头、运动传感器)检测用户是否需要观看屏幕,并识别接下来用户输入的语音指令。这样,可以实现由特定的用户才能启动语音助手识别并执行语音指令,提高了终端的信息安全。
下面介绍本申请实施例提供的一种语音交互方法。
屏幕处于灭屏状态的电子设备100可以响应第一操作,启动语音助手以及检测装置。
其中,电子设备100的屏幕处于灭屏状态可以指电子设备100的屏幕熄灭。其中,电子设备100中包含于屏幕的发光器件,例如发光二极管,均未发光。或者,电子设备100的屏幕处于灭屏状态还可以指电子设备100中包含于屏幕的发光器件有较少部分发光。示例性的,电子设备100开启熄屏显示功能。响应于熄灭屏幕的用户操作,例如作用于电源键的用户操作,电子设备100可以熄灭屏幕并在屏幕上显示时间。上述灭屏状态也可以称为黑屏状态或熄屏状态。
另外,电子设备100可以处于亮屏状态。其中,电子设备100中包含于屏幕的发光器件,例如发光二极管,可以均处于发光状态。并且,电子设备100的应用处理器可以处于工作状态。
电子设备100可以利用接近光传感器和摄像头作为检测装置来检测用户是否需要观看屏幕。
其中,电子设备100检测用户是否需要观看屏幕的过程可以参考图12所示的方法流程图。
下面具体以第一操作为用户说出预设的唤醒词(如“小艺小艺”)进行说明。
电子设备100的屏幕处于灭屏状态。电子设备100的语音唤醒功能开启。当从麦克风采集的语音输入中识别到唤醒词,电子设备100可以先启动接近光传感器,来检测屏幕的预设距离内是否存在物体遮挡。
其中,屏幕的预设距离内存在物体遮挡,可以表示屏幕被遮挡,用户不需要观看电子设备的屏幕。例如,电子设备放置于口袋中,电子设备屏幕朝下放置于桌上。
屏幕的预设距离内不存在物体遮挡,可以表示屏幕未被遮挡。但在屏幕未被遮挡的情况下,用户可能需要观看电子设备的屏幕,也可能不需要观看电子设备的屏幕。例如,电子设备屏幕朝上放置于桌上,但用户没有看屏幕。
当根据接近光传感器检测到屏幕的预设距离内存在物体遮挡,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互。例如,当接收到“查询天气”的语音指令,电子设备100可以保持屏幕处于灭屏状态,语音播报天气。上述语音指令还可以例如是:拨打电话、发送短信、播放音乐、控制智能家居设备。
当根据接近光传感器检测到屏幕的预设距离内不存在物体遮挡,电子设备100可以启动摄像头,来检测是否存在人脸。
其中,当用户需要观看屏幕时,用户的脸部会与屏幕相对,并且停留一段时间。电子设备通过摄像头(如前置摄像头),在一段连续的时间段内(如1秒、2秒),可以采集到多帧均包含人脸的图像。
当用户的脸部未与屏幕相对,摄像头采集的图像中不包含人脸。或者,当用户的脸部在屏幕前方闪过,摄像头在一段连续的时间段内采集的多帧图像中存在不包含人脸的图像。对于上述用户的脸部未与屏幕相对以及用户的脸部在屏幕前方闪过的场景,可以认为用户在这些场景中不需要观看屏幕。
当确定摄像头在预设时间段内采集的多帧图像中存在不包含人脸的图像,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互。
当确定摄像头在预设时间段内采集的多帧图像中均包含人脸,电子设备100可以点亮屏幕,显示图形界面,并与用户进行语音交互。
其中,摄像头采集的一帧图像中包含人脸可以表示这一帧图像中包含完整的人脸或正面的人脸。若一帧图像中包含侧脸或不完整的人脸,电子设备100可以确定出这一帧图像中不包含人脸。
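上文"预设时间段内采集的多帧图像均包含人脸"的多帧判定可以用如下简化示意表示(帧级人脸检测结果以布尔序列表示,属于举例假设):

```python
def user_facing_screen(frame_face_results: list) -> bool:
    """frame_face_results 为一段连续时间段内(如1秒、2秒)逐帧的人脸检测结果。
    只有每一帧都检测到人脸才判定用户观看屏幕,
    从而排除人脸在屏幕前方闪过的情况。"""
    return len(frame_face_results) > 0 and all(frame_face_results)

# 用法示意
assert user_facing_screen([True, True, True]) is True     # 人脸持续与屏幕相对
assert user_facing_screen([False, True, True]) is False   # 人脸在屏幕前方闪过
assert user_facing_screen([]) is False                    # 未采集到图像帧
```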
上述图形界面可以是语音助手的用户界面。或者,当语音指令中涉及显示第三应用的用户界面,例如,语音指令为“查看图库”、“播放视频”,上述图形界面可以是第三应用的用户界面。
在上述语音交互方法中,在电子设备100仅通过语音与用户进行交互时,电子设备100可以运行用于得到与用户进行交互的语音的第一程序以及用于得到与用户进行交互的图形界面的第二程序。其中,当判断出用户不需要观看屏幕,电子设备100可以保持屏幕处于灭屏状态。即电子设备100不会在屏幕上显示运行上述第二程序得到的图形界面。当判断出用户需要观看屏幕,电子设备100可以点亮屏幕,在屏幕上显示运行上述第二程序得到的图形界面。这样,在电子设备100利用语音助手与用户进行交互时,若判断出用户需要观看屏幕,电子设备100可以迅速将运行第二程序得到的图形界面显示在屏幕上,从而减少绘制图形界面的时延。
可选的,在电子设备100仅通过语音与用户进行交互时,电子设备100可以仅运行上述第一程序。然后,电子设备100可以通过扬声器输出运行上述第一程序得到的语音,来实现与用户的交互。电子设备100可以在判断出用户需要观看屏幕时,运行上述第二程序。进一步的,电子设备100点亮屏幕,在屏幕上显示运行上述第二程序得到的图形界面。这样,电子设备100可以在判断出用户不需要观看屏幕时只运行上述第一程序,从而节省功耗。
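上述两种策略中"仅运行第一程序、需要时再运行第二程序"的惰性绘制思路,可以用如下最小示意说明(类名与方法名为假设示例,并非专利的实际实现):

```python
class VoiceAssistant:
    """语音助手的简化示意:第一程序生成语音回复,第二程序绘制图形界面。"""

    def __init__(self):
        self.gui_drawn = False   # 记录是否运行过第二程序(绘制界面)

    def run_first_program(self, command: str) -> str:
        """第一程序:仅生成与用户交互的语音回复。"""
        return f"语音回复: {command}"

    def run_second_program(self, command: str) -> str:
        """第二程序:绘制图形界面,仅在判断出用户需要观看屏幕时调用。"""
        self.gui_drawn = True
        return f"图形界面: {command}"

    def interact(self, command: str, user_watching: bool) -> str:
        reply = self.run_first_program(command)
        if user_watching:
            # 判断出用户需要观看屏幕时,才运行第二程序并点亮屏幕
            reply = self.run_second_program(command)
        return reply

# 用法示意:用户未观看屏幕时不绘制界面,节省功耗
assistant = VoiceAssistant()
assert assistant.interact("查询天气", user_watching=False) == "语音回复: 查询天气"
assert assistant.gui_drawn is False
```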
相较于摄像头工作的功耗,接近光传感器工作的功耗较低。电子设备100先通过接近光传感器判断屏幕是否被遮挡,并在屏幕被遮挡的情况下,也即此时用户没有观看屏幕的需求,电子设备100可以保持屏幕处于灭屏状态,在后台启动和运行语音助手,与用户进行语音交互。当通过接近光传感器无法判定用户是否需要观看屏幕时,即在屏幕未被遮挡的情况下,电子设备100可以启动摄像头作进一步检测。这样,电子设备100可以节省在检测用户是否需要观看屏幕时的功耗。
需要进行说明的是,在上述语音交互方法中,本申请实施例对电子设备100启动语音助手和启动检测装置的时间先后顺序不作限定。
在一种可能的实现方式中,上述接近光传感器和上述摄像头可以实时处于工作状态。电子设备可以在检测到上述第一操作后获取上述接近光传感器和/或上述摄像头采集的数据,来确定用户是否需要观看屏幕。
在一种可能的实现方式中,电子设备100在启动语音助手时,可以输出开启提示,来提示用户输入语音指令。该开启提示可以是语音提示、文本提示、机械振动提示中的一种或多种。例如,该语音提示可以为电子设备100语音播报“嗨,我正在听”。该文本提示可以为电子设备100在屏幕上显示文本“嗨,我正在听”。
可选的,响应于第一操作,电子设备100可以同时启动语音助手和检测装置。电子设备100可以先在屏幕为灭屏状态时,通过语音的方式输出开启提示。也即电子设备100可以先语音播报“嗨,我正在听”。待通过检测装置进行检测之后,电子设备100可以确定是否点亮屏幕。例如,当确定检测到人脸,电子设备100可以点亮屏幕,通过文本的方式输出开启提示。也即电子设备100可以在屏幕上显示文本“嗨,我正在听”。若根据检测装置检测到用户不需要观看屏幕,电子设备100可以保持屏幕处于灭屏状态。
可选的,响应于第一操作,电子设备100可以先启动检测装置进行检测,待确定用户是否需要观看屏幕后,再启动语音助手。若根据检测装置检测到用户需要观看屏幕,电子设备100可以点亮屏幕,显示文本“嗨,我正在听”,并且语音播报“嗨,我正在听”。也即电子设备100可以通过文本和语音的方式输出开启提示。若根据检测装置检测到用户不需要观看屏幕,电子设备100可以保持屏幕处于灭屏状态,通过语音的方式输出开启提示。
下面结合应用场景,具体介绍本申请提供的语音交互方法。
图2示例性示出了一种电子设备保持屏幕处于灭屏状态与用户进行语音交互的场景示意图。
如图2所示,电子设备100被放置于口袋中。电子设备100的屏幕处于灭屏状态。电子设备100的语音唤醒功能开启,麦克风可实时采集电子设备100附近的语音输入,电子设备100可以识别该语音输入中是否包含有预设的唤醒词。这样,用户可以通过说出预设的唤醒词来启动语音助手。
当用户在电子设备100附近说出“小艺小艺,我要给张三发短信”,电子设备100可以识别到唤醒词“小艺小艺”。进而,电子设备100可以启动检测装置来检测用户是否需要观看屏幕。当启动接近光传感器进行检测,电子设备100可以确定屏幕被遮挡。进一步的,电子设备100可以保持屏幕处于灭屏状态(即屏幕处于黑屏状态),在后台运行语音助手,与用户进行语音交互。其中,当识别到语音指令“我要给张三发短信”,电子设备100可以执行该语音指令对应的操作。例如,电子设备100可以调用通讯录应用查看是否存在名称为“张三”的联系人。若确定存在该联系人,电子设备100可以通过扬声器语音提示“好的,请说短信内容”,并调用短信应用为用户提供发送短信的服务。
这样,当屏幕被遮挡,例如放置于口袋中、屏幕朝下放置于桌上,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互,从而节省电子设备的功耗,并且避免误触。
图3示例性示出了另一种电子设备保持屏幕处于灭屏状态与用户进行语音交互的场景示意图。
如图3所示,电子设备100屏幕朝上放置于桌上。电子设备100的屏幕处于灭屏状态。电子设备100的语音唤醒功能开启。
当用户在电子设备100附近说出“小艺小艺,开空调”,电子设备100可以识别到唤醒词“小艺小艺”。进而,电子设备100可以启动检测装置来检测用户是否需要观看屏幕。当启动接近光传感器进行检测,电子设备100可以确定屏幕未被遮挡。然后,电子设备100可以启动摄像头。根据摄像头采集的图像,电子设备100可以确定未检测到人脸。进一步的,电子设备可以保持屏幕处于灭屏状态,在后台运行语音助手,与用户进行语音交互。其中,当识别到语音指令“开空调”,电子设备100可以执行该语音指令对应的操作。例如,电子设备100可以在保持屏幕处于灭屏状态时,调用控制智能家居设备的应用,开启空调。并且,电子设备100可以通过扬声器语音提示“好的,正在开启空调”,以对用户说出的语音指令进行回复。这样,用户可以知道电子设备100已启动语音助手来识别和执行语音指令。
在上述实施例中,当屏幕未被遮挡,电子设备100可以启动摄像头作进一步检测。这样可以对用户是否需要观看屏幕进行更准确地判断。在屏幕未被遮挡,且未检测到人脸的场景下,电子设备可以保持屏幕处于灭屏状态,与用户进行语音交互,节省电子设备的功耗。
不限于上述发送短信、控制智能家居设备(如空调、电灯、电视、音箱)的场景,电子设备还可以根据检测装置的检测结果,在检测到用户不需要观看屏幕时,保持屏幕处于灭屏状态,通过语音的方式为用户提供播放音乐、拨打和接听电话、查询天气、导航等功能。
需要进行说明的是,在上述图2和图3所示的实施例中,若识别到用户的语音指令为显示特定应用的用户界面,例如,语音指令为“查看图库”、“播放视频”等,电子设备100难以仅通过语音交互的方式为用户播报图库应用、视频应用等应用的用户界面。上述显示特定应用的用户界面的应用场景中,用户往往会观看电子设备的屏幕,即电子设备一般可以通过接近光传感器和摄像头来检测用户需要观看屏幕。若在上述应用场景中,电子设备通过接近光传感器检测到屏幕被遮挡,或者通过接近光传感器检测到屏幕未被遮挡,且通过摄像头未检测到人脸,电子设备100可以保持屏幕处于灭屏状态,并通过扬声器语音提示“已经为您找到,快来查看吧”。进一步的,当用户根据电子设备的语音提示查看屏幕,电子设备可以通过接近光传感器和摄像头检测到用户需要观看屏幕。进而电子设备100可以点亮屏幕,显示相关应用的用户界面。
这样,电子设备可以在检测到用户不需要观看屏幕时,保持屏幕处于灭屏状态,并与用户进行语音交互,从而节省电子设备的功耗,并且避免误触。
图4A和图4B示例性示出了一种电子设备点亮屏幕,通过图形界面和语音的方式与用户交互的场景示意图。
如图4A所示,用户手持电子设备100,且保持面部与电子设备100的屏幕相对。电子设备100的屏幕处于灭屏状态。电子设备100的语音唤醒功能开启。
当用户在电子设备100附近说出“小艺小艺,我要给张三发短信”,电子设备100可以识别到唤醒词“小艺小艺”。进而,电子设备100可以启动检测装置来检测用户是否需要观看屏幕。当启动接近光传感器进行检测,电子设备100可以检测到屏幕未被遮挡。然后电子设备100可以启动摄像头。根据摄像头采集的图像,电子设备100可以确定检测到人脸。
进一步的,电子设备100可以运行语音助手,并显示语音助手的用户界面,例如显示如图4A所示的语音转文本框202。该语音转文本框202可用于显示电子设备100识别出的语音指令“我要给张三发短信”。这样,用户可以比较电子设备100识别出的语音指令与自己说出的语音指令是否一致。
当识别出语音指令,电子设备100可以执行该语音指令对应的操作。示例性的,响应于语音指令“我要给张三发短信”,电子设备100可以先调用通讯录应用查看是否存在名称为“张三”的联系人。若确定存在该联系人,电子设备100可以通过文本显示和语音播报的方式提示用户说出短信内容。示例性的,电子设备100可以显示如图4B所示的用户界面,并通过扬声器语音提示“好的,请说短信内容”,来提示用户说出短信内容。
其中,图4B所示的用户界面可包括文本提示框203。该文本提示框203中的内容可以与电子设备100语音提示的内容相同,如“好的,请说短信内容”。
在上述实施例中,在屏幕未被遮挡,且检测到人脸的场景下,电子设备可以点亮屏幕,显示语音助手的用户界面,或者,当识别到语音指令中涉及显示第三应用的用户界面,电子设备可以调用第三应用,显示第三应用的用户界面。电子设备还可以结合语音播报的方式与用户交互。电子设备可以智慧决策是否点亮屏幕。当检测到用户需要观看屏幕,电子设备可以通过图形界面和语音的方式与用户交互,给用户良好的使用体验。
本申请实施例对电子设备语音播报的内容、文本提示的内容均不作限定。
在一些实施例中,电子设备100可以只利用接近光传感器来检测用户是否需要观看屏幕。
具体的,电子设备100的屏幕处于灭屏状态。响应于用于启动语音助手的第一操作,电子设备100可以启动接近光传感器。示例性的,若第一操作为用户说出唤醒词,电子设备100中的语音唤醒模块可以获取并处理麦克风采集的语音输入。当确定该语音输入中包含预设的唤醒词,电子设备100可以启动接近光传感器。
若根据接近光传感器的检测结果确定屏幕被遮挡,电子设备100可以保持屏幕处于灭屏状态,在后台运行语音助手,与用户进行语音交互。
若根据接近光传感器的检测结果确定屏幕未被遮挡,电子设备100可以点亮屏幕,运行语音助手。其中,电子设备100可以显示语音助手的用户界面,并与用户进行语音交互。这样,电子设备100可以通过图形界面与语音的方式与用户交互。
可选的,接近光传感器可以实时处于工作状态。若电子设备100通过接近光传感器在接收到第一操作之前的预设时间内确定屏幕未被遮挡,电子设备100可以在接收到第一操作后,启动语音助手。其中,电子设备100可以点亮屏幕,结合图形界面和语音的方式与用户进行交互。上述接收到第一操作之前的预设时间可以为1秒、2秒,本申请实施例对此不作限定。
在一些实施例中,电子设备100可以只利用摄像头来检测用户是否需要观看屏幕。
具体的,电子设备100的屏幕处于灭屏状态。响应于用于启动语音助手的第一操作,电子设备100可以启动摄像头。示例性的,若第一操作为用户说出唤醒词,电子设备100中的语音唤醒模块可以获取并处理麦克风采集的语音输入。当确定语音输入中包含预设的唤醒词,电子设备100可以启动摄像头。
当确定摄像头在预设时间段内采集的多帧图像中存在不包含人脸的图像,电子设备100可以保持屏幕处于灭屏状态,在后台运行语音助手,与用户进行语音交互。
当确定摄像头在预设时间段内采集的多帧图像中均包含人脸,电子设备100可以点亮屏幕,运行语音助手。其中,电子设备100可以显示语音助手的用户界面,并与用户进行语音交互。这样,电子设备100可以通过图形界面与语音的方式与用户交互。
可选的,摄像头可以实时处于工作状态。若电子设备100通过摄像头在接收到第一操作之前的预设时间内检测到人脸,电子设备100可以在接收到第一操作后,启动语音助手。其中,电子设备100可以点亮屏幕,结合图形界面和语音的方式与用户进行交互。上述接收到第一操作之前的预设时间可以为1秒、2秒,本申请实施例对此不作限定。
在一些实施例中,电子设备100可以只利用运动传感器来检测用户是否需要观看屏幕。
具体的,电子设备100的屏幕处于灭屏状态。响应于用于启动语音助手的第一操作,电子设备100可以启动运动传感器。示例性的,若第一操作为用户说出唤醒词,电子设备100中的语音唤醒模块可以获取并处理麦克风采集的语音输入。当确定语音输入中包含预设的唤醒词,电子设备100可以启动运动传感器。该运动传感器可以包括加速度传感器、陀螺仪传感器。该运动传感器可用于检测电子设备100的姿态变化。不限于加速度传感器、陀螺仪传感器,该运动传感器还可以为其它类型可用于检测电子设备100的姿态变化的传感器。
当电子设备100根据运动传感器未检测到抬手动作,电子设备100可以保持屏幕处于灭屏状态,在后台运行语音助手,与用户进行语音交互。其中,电子设备100检测到抬手动作时,电子设备的姿态变化可以为:电子设备100在屏幕朝上时从水平放置的姿态变化为倾斜或竖直放置的姿态。
当电子设备100根据运动传感器检测到抬手动作,电子设备100可以点亮屏幕,运行语音助手。其中,电子设备100可以显示语音助手的用户界面,并与用户进行语音交互。这样,电子设备100可以通过图形界面与语音的方式与用户交互。
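利用加速度传感器判断上述"抬手动作"(姿态从水平放置切换为倾斜或竖直放置)的一种简化示意如下(以重力方向与屏幕法线的夹角估计屏幕倾角,倾角阈值为假设数值):

```python
import math

def screen_tilt_deg(ax: float, ay: float, az: float) -> float:
    """由三轴加速度(单位 g)估计屏幕相对水平面的倾角(度)。
    屏幕朝上水平放置时 az 约等于 1,倾角约为 0 度。"""
    g = math.sqrt(ax * ax + ay * ay + az * az)
    return math.degrees(math.acos(max(-1.0, min(1.0, az / g))))

def hand_raise_detected(prev_tilt: float, curr_tilt: float,
                        flat_max: float = 15.0, raised_min: float = 40.0) -> bool:
    """姿态从接近水平(第一姿态)切换到明显倾斜(第二姿态)即判定为抬手动作。
    flat_max 与 raised_min 为假设的倾角阈值。"""
    return prev_tilt <= flat_max and curr_tilt >= raised_min

# 用法示意:电子设备从屏幕朝上水平放置变为倾斜约 45 度
prev = screen_tilt_deg(0.0, 0.0, 1.0)    # 第一姿态:屏幕朝上水平放置
curr = screen_tilt_deg(0.0, 0.7, 0.7)    # 第二姿态:抬手后屏幕倾斜约 45 度
assert hand_raise_detected(prev, curr) is True
```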
图5A、图5B、图6A和图6B示例性示出了电子设备100在运行语音助手时,根据运动传感器智慧决策是否点亮屏幕的场景示意图。
如图5A所示,电子设备100的屏幕处于灭屏状态。响应于第一操作(如用户说出唤醒词“小艺小艺”),电子设备100可以启动运动传感器。当未检测到抬手动作,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互。例如,响应于用户询问当日天气的语音指令“今天天气怎么样”,电子设备100可以搜索天气,并通过扬声器语音播报当日天气“纽约今天发布雷电黄色预警,全天有雷阵雨…”。
在上述电子设备100语音播报天气的过程中,若检测到抬手动作,电子设备100可以点亮屏幕,显示如图5B所示的用户界面210,并继续语音播报天气。用户界面210中可包括文本提示框211。该文本提示框211可用于通过图标和文本的方式显示位置、日期以及天气等数据。
不限于上述抬手动作,当根据运动传感器检测到翻转动作、掏出口袋动作,电子设备100可以点亮屏幕,启动并运行语音助手,通过图形界面和语音的方式与用户交互。
可选的,运动传感器可以实时处于工作状态。若电子设备100通过运动传感器在接收到第一操作之前的预设时间内检测到抬手动作,电子设备100可以在接收到第一操作后,启动语音助手。其中,电子设备100可以点亮屏幕,结合图形界面和语音的方式与用户进行交互。上述接收到第一操作之前的预设时间可以为1秒、2秒,本申请实施例对此不作限定。
示例性的,如图6A所示,电子设备100的屏幕处于灭屏状态。电子设备100可以通过运动传感器检测到抬手动作。电子设备100的屏幕仍然保持灭屏状态。如图6B所示,在检测到抬手动作的预设时间内(如1秒、2秒),若检测到唤醒词“小艺小艺”,电子设备100可以点亮屏幕,执行用户的语音指令“今天天气怎么样”。其中,电子设备100可以显示如图6B所示的用户界面210,并通过扬声器语音播报当日天气“纽约今天发布雷电黄色预警,全天有雷阵雨…”。
也即是说,用户可以先拿起手机并做抬手动作。若用户在做抬手动作之后的预设时间内,例如1秒或者2秒内,用户说出唤醒词,电子设备100可以启动语音助手,点亮屏幕,结合图形界面和语音的方式与用户交互。
可选的,运动传感器可以实时处于工作状态。若电子设备100在接收到第一操作的同时检测到抬手动作,电子设备100可以在接收到第一操作后,启动语音助手。其中,电子设备100可以点亮屏幕,结合图形界面和语音的方式与用户进行交互。
也即是说,若用户一边拿起手机做抬手动作,一边说出唤醒词,电子设备100可以启动语音助手,点亮屏幕,结合图形界面和语音的方式与用户交互。
在一些实施例中,电子设备100可以结合接近光传感器和运动传感器来检测用户是否需要观看屏幕。
具体的,电子设备100的屏幕处于灭屏状态。响应于用于启动语音助手的第一操作,电子设备100可以先启动接近光传感器。示例性的,若第一操作为用户说出唤醒词,电子设备100中的语音唤醒模块可以获取并处理麦克风采集的语音输入。当确定语音输入中包含预设的唤醒词,电子设备100可以启动接近光传感器。
电子设备100可以利用接近光传感器来检测屏幕是否被遮挡。若确定屏幕被遮挡,电子设备100可以保持屏幕处于灭屏状态,在后台运行语音助手,与用户进行语音交互。
若确定屏幕未被遮挡,电子设备100可以启动运动传感器。电子设备100可以根据运动传感器检测电子设备100的姿态变化。例如,当检测到抬手动作,电子设备100可以点亮屏幕,运行语音助手,通过图形界面和语音的方式与用户交互。
下面对电子设备100利用检测装置(如接近光传感器、摄像头、运动传感器)进行检测的时间进行说明。
在一些实施例中,检测装置可以从电子设备100接收到第一操作开始,持续检测至该次语音交互结束。
上述语音交互结束可以表示语音助手停止运行,需要用户再次进行前述实施例中提及的第一操作,来启动语音助手。例如,在语音指令为发送短信的应用场景中。当调用短信应用发送完短信,电子设备100可以停止运行语音助手。或者,在语音指令为查询天气的应用场景中。当播报完天气,电子设备100可以停止运行语音助手。当上述语音交互结束,响应于上述第一操作,电子设备100可以再次启动并运行语音助手。
图7A~图7E示例性示出了在从电子设备100识别到语音助手的唤醒词至该次语音交互结束这一过程中,检测装置持续进行检测的场景示意图。
如图7A所示,电子设备100屏幕朝上放置于桌上。电子设备100的屏幕处于灭屏状态。电子设备100的语音唤醒功能开启。
当用户在电子设备100附近说出“小艺小艺,给我讲白雪公主的故事”,电子设备100中的语音唤醒模块可以识别到唤醒词“小艺小艺”。进而,电子设备100可以启动检测装置来检测用户是否需要观看屏幕。
具体的,当启动接近光传感器进行检测,电子设备100可以检测到屏幕未被遮挡。然后,电子设备100可以启动摄像头。根据摄像头采集的图像,电子设备100可以确定未检测到人脸。进一步的,电子设备可以在后台运行语音助手,与用户进行语音交互。其中,电子设备100可以从麦克风采集的语音输入中识别语音指令“给我讲白雪公主的故事”,并执行该语音指令对应的操作。例如,电子设备100可以在保持屏幕处于灭屏状态时,调用浏览器应用搜索“白雪公主”的故事,并通过扬声器语音播报该故事“很久很久以前,有一个王后在冬季生下一个女孩…”。
在上述实施例中,从识别到唤醒词至该次语音交互结束的这一过程中,电子设备100可以持续利用检测装置来检测用户是否需要观看屏幕,并根据判断结果智慧决策是否点亮屏幕。
在一种可能的实现方式中,接近光传感器和摄像头处于关闭状态。当识别到上述唤醒词,电子设备100可以先启动接近光传感器,来检测屏幕是否被遮挡。
若在第一时刻确定屏幕未被遮挡,电子设备100可以在上述第一时刻关闭接近光传感器,并启动摄像头。电子设备100可以先利用接近光传感器进行检测,在确定屏幕未被遮挡后再开启摄像头进行检测。即摄像头在屏幕被遮挡时可以处于关闭状态。电子设备100可以在该次语音交互结束时,关闭摄像头。
电子设备100可以根据摄像头来检测是否有人脸。在未检测到人脸时,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互。当检测到人脸,电子设备100可以点亮屏幕,在屏幕上显示对应的用户界面。这样,电子设备100可以通过图形界面和语音的方式与用户交互。
由上述分析可知,接近光传感器的工作时间可从识别到上述唤醒词开始至上述确定屏幕未被遮挡的第一时刻结束。摄像头的工作时间可从上述第一时刻开始至该次语音交互结束时结束。若屏幕一直处于被遮挡的状态,电子设备100可以只开启接近光传感器进行检测,从而节省功耗。
如图7A所示,电子设备100屏幕朝上放置于桌上。电子设备100根据接近光传感器可以确定屏幕未被遮挡。然后电子设备100可以开启摄像头进行检测。
如图7B所示,用户朝电子设备100走去,并拿起电子设备100。用户的面部与电子设备100的屏幕相对。电子设备100可以根据摄像头采集的图像检测到人脸。电子设备100可以点亮屏幕,显示如图7B所示的用户界面。该用户界面可包括文本提示框204。该文本提示框204可用于显示电子设备100根据识别到的语音指令所搜索到的结果。例如,语音指令为“给我讲白雪公主的故事”,文本提示框204可显示电子设备100搜索到的“白雪公主”的故事“嘴唇赤红如雪,头发黑如乌木一样漂亮…”。如图7C所示,用户放下并离开电子设备100。电子设备100放置于桌上。电子设备100的摄像头处于工作状态。当根据摄像头采集的图像确定检测不到人脸,例如在预设时间段内摄像头采集的多帧图像中存在不包含人脸的图像,电子设备100可以熄灭屏幕,通过语音的方式与用户交互。例如,电子设备100可以熄灭屏幕,继续语音播报白雪公主的故事。
当上述白雪公主的故事语音播报完成,电子设备100可以停止运行语音助手,并关闭摄像头。
本申请实施例对接近光传感器和摄像头的工作时间不作限定。例如,在识别到有唤醒词时,电子设备100可以开启接近光传感器和摄像头。在该次语音交互结束时,电子设备100可以关闭接近光传感器和摄像头。或者,从识别到唤醒词至该次语音交互结束的这一过程中,接近光传感器和摄像头可以交替工作。
在一种可能的实现方式中,在结合图形界面和语音的方式与用户交互时,响应于相关的用户操作,电子设备100可以停止或继续通过语音的方式与用户交互。
如图7D所示,当根据检测装置检测到用户需要观看屏幕,电子设备100可以显示语音助手的用户界面,并与用户进行语音交互。例如,电子设备100可以显示如图7D所示的文本提示框204。该文本提示框204可包含“白雪公主”的故事的文字内容,还可以包含上一页控件204A、下一页控件204B和停止语音播报控件204C。其中,上一页控件204A和下一页控件204B可用于控制显示在文本提示框204中的文字内容。例如,响应于作用在上一页控件204A的触摸操作,电子设备100可以显示如图7B所示的用户界面。图7D所示的文本提示框204中的内容可以为图7B所示的文本提示框204中内容的续接。停止语音播报控件204C可用于电子设备100停止与用户语音交互。例如,响应于作用在图7D所示的停止语音播报控件204C的触摸操作,电子设备100可以停止语音播报“白雪公主”的故事。
另外,如图7E所示,响应于作用在停止语音播报控件204C的触摸操作,电子设备100可以将停止语音播报控件204C切换为继续语音播报控件204D。该继续语音播报控件204D可用于电子设备100继续与用户进行语音交互。例如,响应于作用在继续语音播报控件204D的触摸操作,电子设备100可以从停止语音播报时播报的内容处继续语音播报。或者,电子设备100可以语音播报当前显示在文本提示框204中的内容。
上述文本提示框204还可以包含更多或更少的控件,本申请实施例对此不作限定。
可选的,在上述图7A~图7E所示的实施例中,电子设备100可以利用接近光传感器和运动传感器来检测用户是否需要观看屏幕,进而确定是否点亮屏幕。
其中,当识别到唤醒词,电子设备100可以先开启接近光传感器。当确定屏幕被遮挡,电子设备100可以保持屏幕处于灭屏状态,在后台运行语音助手,与用户进行语音交互。并且,接近光传感器可以持续工作,以检测屏幕是否被遮挡。若在第一时刻,电子设备100根据接近光传感器确定屏幕未被遮挡,电子设备100可以关闭接近光传感器,并开启运动传感器。在未检测到抬手动作时,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互。当根据运动传感器检测到抬手动作,电子设备100可以点亮屏幕,通过图形界面和语音的方式与用户交互。即接近光传感器的工作时间可以从识别到唤醒词开始,至上述第一时刻结束。运动传感器的工作时间可以从上述第一时刻开始,至该次语音交互结束时结束。这样,根据屏幕是否被遮挡以及电子设备100的姿态变化,例如电子设备100是否检测到抬手动作、翻转动作、掏出口袋动作,电子设备100智慧决策是否点亮屏幕,从而节省电子设备的功耗, 避免误触。并且,电子设备100可以在检测到用户需要观看屏幕时点亮屏幕,不影响用户查看相关的用户界面。
可选的,在上述图7A~图7E所示的实施例中,电子设备100可以只利用接近光传感器作为检测装置来检测用户是否需要观看屏幕,进而确定是否点亮屏幕。即接近光传感器的工作时间可以从识别到唤醒词开始,在该次语音交互结束时结束。其中,当确定屏幕未被遮挡时,电子设备100可以点亮屏幕,运行语音助手,通过图形界面和语音的方式与用户交互。当确定屏幕被遮挡时,电子设备100可以熄灭屏幕,与用户进行语音交互。这样,根据屏幕是否被遮挡,电子设备100的屏幕可以在灭屏状态和亮屏状态之间切换,不仅不影响用户在需要时观看屏幕,查看相关的用户界面,还可以节省电子设备的功耗,并避免误触。
可选的,在上述图7A~图7E所示的实施例中,电子设备100可以只利用摄像头作为检测装置来检测用户是否需要观看屏幕,进而确定是否点亮屏幕。即摄像头的工作时间可以从识别到唤醒词开始,在该次语音交互结束时结束。其中,当检测到人脸,电子设备100可以点亮屏幕,运行语音助手,通过图形界面和语音的方式与用户交互。当未检测到人脸,电子设备100可以熄灭屏幕,与用户进行语音交互。这样,根据是否能检测到人脸,电子设备100的屏幕可以在灭屏状态和亮屏状态之间切换,不仅不影响用户在需要时观看屏幕,查看相关的用户界面,还可以节省电子设备的功耗,并避免误触。
可选的,在上述图7A~图7E所示的实施例中,电子设备100可以只利用运动传感器作为检测装置来检测用户是否需要观看屏幕,进而确定是否点亮屏幕。即运动传感器的工作时间可以从识别到唤醒词开始,在该次语音交互结束时结束。其中,在未检测到抬手动作时,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互。当检测到抬手动作,电子设备100可以点亮屏幕,运行语音助手,通过图形界面和语音的方式与用户交互。这样,根据电子设备100的姿态变化,例如电子设备100是否检测到抬手动作、翻转动作、掏出口袋动作,电子设备100可以智慧决策是否点亮屏幕,从而节省电子设备的功耗,避免误触。并且,电子设备100可以在检测到用户需要观看屏幕时点亮屏幕,不影响用户查看相关的用户界面。
从识别到唤醒词至该次语音交互结束的这一过程中，电子设备100可以持续利用检测装置来检测用户是否需要观看屏幕。这样，在检测到用户不需要观看屏幕时，电子设备可以在屏幕处于灭屏状态下，通过语音的方式与用户交互。在检测到用户需要观看屏幕时，点亮屏幕，通过图形界面和语音的方式与用户交互。这样，电子设备可以智慧决策是否点亮屏幕。屏幕可以在灭屏状态与亮屏状态之间进行切换。在用户不需要观看屏幕的场景中，电子设备的屏幕处于灭屏状态，可以节省电子设备的功耗，并且避免误触。在用户需要观看屏幕的场景中，电子设备可以显示对应的用户界面，不会影响用户的体验。
在一些实施例中,检测装置可以从电子设备100识别到语音助手的唤醒词开始进行检测,并在点亮屏幕后结束检测。
仍以图7A~图7C所示的实施例进行说明。
如图7A所示,电子设备100屏幕朝上放置于桌上。电子设备100的屏幕处于灭屏状态。电子设备100的语音唤醒功能开启。当识别到唤醒词,电子设备100可以先开启接近光传感器进行检测,并在检测到屏幕未被遮挡时,关闭接近光传感器,开启摄像头进行检测。当利用摄像头确定未检测到人脸,电子设备100可以保持屏幕处于灭屏状态,在后台运行语音助手,通过语音的方式和用户交互。
如图7B所示，用户的面部与电子设备100的屏幕相对。电子设备100可以根据摄像头采集的图像检测到人脸。电子设备100可以点亮屏幕，显示如图7B所示的用户界面，通过图形界面和语音的方式与用户交互。另外，电子设备100可以关闭摄像头。
即当检测到用户需要观看屏幕,电子设备100可以关闭检测装置,不再检测后续阶段用户是否需要观看屏幕。那么在电子设备100点亮屏幕,显示如图7B所示的用户界面之后,在该次语音交互结束之前,若用户的面部与电子设备100的屏幕不再相对,例如,如图7C所示,用户放下并离开电子设备100,电子设备100的屏幕仍可以保持亮屏状态。
在一些实施例中,检测装置可以从电子设备100识别到语音助手的唤醒词开始进行检测,并在与用户完成一轮语音交互后结束检测。
上述与用户完成一轮语音交互可以为用户说出一条语音指令,电子设备100运行语音助手,对上述用户说出的一条语音指令进行回复。例如,如图2所示用户说出语音指令“我要给张三发短信”和电子设备100可以回复语音提示“好的,请说短信内容”,即为一轮语音交互。如图3所示用户说出语音指令“开空调”和电子设备100回复“好的,正在开启空调”,即为一轮语音交互。
以图2所示的实施例进行说明。
如图2所示,电子设备100放置于口袋中。电子设备100的屏幕处于灭屏状态。电子设备100的语音唤醒功能开启。从识别到唤醒词“小艺小艺”至电子设备100运行语音助手回复“好的,请说短信内容”这一过程中,电子设备100可以利用检测装置,如接近光传感器和摄像头来检测用户是否需要观看屏幕。当检测到用户不需要观看屏幕,电子设备100可以保持屏幕处于灭屏状态,通过语音的方式与用户交互。当检测到用户需要观看屏幕,电子设备100可以点亮屏幕,通过图形界面和语音的方式与用户交互。即屏幕可以从灭屏状态切换为亮屏状态。
当完成上述一轮语音交互，电子设备100可以关闭检测装置。电子设备100的屏幕为灭屏状态还是为亮屏状态可以由电子设备100关闭检测装置时屏幕的状态来确定。若电子设备100关闭检测装置时，屏幕的状态为灭屏状态，电子设备100可以保持屏幕处于灭屏状态，在该次语音交互过程中的后续阶段通过语音的方式与用户交互。若电子设备100关闭检测装置时，屏幕的状态为亮屏状态，电子设备100可以保持屏幕为亮屏状态，在该次语音交互过程中的后续阶段通过图形界面和语音的方式与用户交互。
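“完成一轮语音交互后关闭检测装置、后续阶段沿用关闭时刻的屏幕状态”这一规则可示意如下（假设性草图，函数名与返回值均为示例）：

```python
def mode_after_detection_closed(screen_on_at_close: bool) -> str:
    """示意：关闭检测装置后，交互方式由关闭时刻的屏幕状态锁定。"""
    # 亮屏：后续阶段通过图形界面和语音交互；灭屏：仅通过语音交互
    return "graphic_and_voice" if screen_on_at_close else "voice_only"


assert mode_after_detection_closed(True) == "graphic_and_voice"
assert mode_after_detection_closed(False) == "voice_only"
```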
电子设备100利用检测装置来检测用户是否需要观看屏幕的方法可以参考前述实施例的说明,这里不再赘述。
在另一些实施例中,检测装置可以从电子设备100识别到语音助手的唤醒词开始进行检测,并在与用户完成N轮语音交互后结束检测。N可以为大于1的整数。
本申请实施例对电子设备利用检测装置检测用户是否需要观看屏幕的检测时间不作限定。
在一些实施例中,电子设备100可以结合检测装置的检测结果和分析接收到的语音指令中是否包含特定关键词的分析结果,来检测用户是否需要观看屏幕。
上述特定关键词可以包括第一类关键词和第二类关键词。其中，第一类关键词可以为涉及特定类别的应用的关键词，这些特定类别的应用一般通过用户界面与用户交互。例如，视频类的应用：华为视频、爱奇艺等，购物类的应用：淘宝、京东等，导航类的应用：百度地图、Google Maps等。第二类关键词可以为涉及特定动作的关键词，这些特定动作可以为指示用户需要观看屏幕的动作。例如：查看、显示等。
在一些应用场景中,电子设备100在识别到唤醒词后,根据检测装置的检测结果检测到用户不需要观看屏幕,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互。但接收到的语音指令中涉及显示用户界面,且电子设备100无法通过语音播报的形式为用户描述需要显示的用户界面时,例如,语音指令为“查看图库”、“播放视频”,电子设备100可以点亮屏幕,显示语音指令中涉及的用户界面。
在利用检测装置的检测结果检测到用户不需要观看屏幕的基础上,电子设备100可以进一步识别语音指令中是否包含上述第一类关键词和/或上述第二类关键词,来确定是否点亮屏幕。
在一种可能的实现方式中,电子设备100可以先识别语音指令中是否包含第一类关键词。若确定语音指令中包含第一类关键词,电子设备100可以点亮屏幕,显示语音指令中涉及的用户界面。这样,电子设备100可以通过图形界面和语音的方式与用户交互。若确定语音指令中不包含第一类关键词,电子设备100可以再识别语音指令中是否包含第二类关键词。若确定语音指令中包含第二类关键词,电子设备100可以点亮屏幕,显示语音指令中涉及的用户界面。这样,电子设备100可以通过图形界面和语音的方式与用户交互。若确定语音指令中不包含上述第一类关键词和上述第二类关键词,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互。
在另一种可能的实现方式中,电子设备100也可以先识别语音指令中是否包含上述第二类关键词。若确定语音指令中不包含第二类关键词,电子设备100可以再识别语音指令中是否包含上述第一类关键词,来检测用户是否需要观看屏幕,进而智慧决策是否点亮屏幕。
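以先识别第一类关键词、再识别第二类关键词的实现方式为例，上述判定逻辑可示意如下。关键词表仅为正文中示例的非穷举集合，函数名为假设：

```python
# 第一类关键词：特定类别应用（视频类、购物类、导航类等）的名称，正文示例含“图库”
FIRST_CLASS = {"华为视频", "爱奇艺", "淘宝", "京东", "百度地图", "Google Maps", "图库"}
# 第二类关键词：指示用户需要观看屏幕的动词
SECOND_CLASS = {"查看", "显示", "看"}


def need_screen(command: str) -> bool:
    """示意：语音指令中包含第一类或第二类关键词时，判定需要点亮屏幕。"""
    # 先识别第一类关键词
    if any(k in command for k in FIRST_CLASS):
        return True
    # 不包含第一类关键词时，再识别第二类关键词
    return any(k in command for k in SECOND_CLASS)


assert need_screen("查看图库")        # 命中第一类关键词“图库”
assert need_screen("我要看视频A")     # 未命中第一类，命中第二类“看”
assert not need_screen("开空调")      # 两类均未命中：保持灭屏，语音交互
```

先查第一类再查第二类的顺序，对应正文中图8B、图8C两个示例的判定过程；调换两步顺序即得到另一种实现方式。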
电子设备100识别语音指令的方法可以参考前述实施例,这里不再赘述。
图8A~图8D示例性示出了电子设备100结合检测装置的检测结果和分析接收到的语音指令中是否包含特定关键词的分析结果,来检测用户是否需要观看屏幕的实施例。
如图8A所示,电子设备100屏幕朝上放置于桌上。电子设备100的屏幕处于灭屏状态,且语音唤醒功能开启。用户在电子设备100附近说出唤醒词“小艺小艺”。电子设备100中的麦克风可以采集到电子设备100附近的语音输入。电子设备100中的语音唤醒模块可以获取麦克风采集的语音输入,并识别到该语音输入中包含唤醒词。然后,电子设备100可以启动检测装置来检测用户是否需要观看屏幕。电子设备100可以利用接近光传感器、摄像头和运动传感器中的一个或多个来检测用户是否需要观看屏幕,具体的检测方法可以参考前述实施例,这里不再赘述。
当根据检测装置检测到用户不需要观看屏幕,电子设备100可以保持屏幕处于灭屏状态,在后台运行语音助手,与用户进行语音交互。例如,电子设备100可以保持屏幕处于灭屏状态,通过扬声器语音提示“嗨,我正在听”,来提示用户说出语音指令。
如图8B所示,在识别到唤醒词后,电子设备100中的麦克风可以采集电子设备100附近的语音输入。电子设备100可以从该语音输入中识别出语音指令。示例性的,用户在电子设备100附近说出语音指令“查看图库”。电子设备100可以识别出该语音指令中包含第一类关键词“图库”。电子设备100可以执行该语音指令。具体的,电子设备100可以调用图库应用,显示如图8B所示的图库应用的用户界面。另外,电子设备100还可以通过扬声器语音提示“已打开图库,快来查看吧”。
如图8C所示，在识别到唤醒词后，电子设备100中的麦克风可以采集电子设备100附近的语音输入。电子设备100可以从该语音输入中识别出语音指令。示例性的，用户在电子设备100附近说出语音指令“我要看视频A”。电子设备100可以先识别该语音指令中是否包含第一类关键词。该语音指令中不包含第一类关键词。当确定该语音指令中不包含第一类关键词，电子设备100可以再识别该语音指令中是否包含第二类关键词。该语音指令中包含第二类关键词“看”。当确定该语音指令中包含第二类关键词，电子设备100可以执行该语音指令。具体的，电子设备100可以调用华为视频应用，显示如图8C所示的华为视频应用的用户界面。该用户界面中可包含语音指令中指示的视频A。另外，电子设备100还可以通过扬声器语音提示“已为您打开，快来查看吧”。
由上述实施例可知,在根据检测装置检测到用户不需要观看屏幕的情况下,电子设备100可以先保持屏幕处于灭屏状态,与用户进行语音交互。当接收到语音指令,电子设备100可以进一步根据语音指令中是否包含有第一类关键词和/或第二类关键词,来检测用户是否需要观看屏幕。在一些用户想要观看屏幕但还没有观看屏幕的场景中,例如,电子设备100放置于桌上,用户一边说出唤醒词以及语音指令“查看图库”,一边走向电子设备100准备观看屏幕,电子设备100可以根据语音指令中包含有第一类关键词和/或第二类关键词,来点亮屏幕,显示语音指令中涉及的用户界面。这样,电子设备100可以更加准确地检测用户是否需要观看屏幕。
在一些实施例中,电子设备100的屏幕处于灭屏状态,且处于锁屏状态。电子设备100显示语音指令中涉及的用户界面之前,提示用户对电子设备100解锁。
如图8A和图8D所示,用户在电子设备100附近说出唤醒词“小艺小艺”和语音指令“查看图库”。在识别到唤醒词后,电子设备100可以根据检测装置检测到用户不需要观看屏幕。电子设备100中的麦克风可以采集电子设备100附近的语音输入。电子设备100可以从该语音输入中识别出语音指令。该语音指令中包含第一类关键词。电子设备100可以显示如图8D所示的解锁界面,并通过扬声器语音提示“请先帮我解锁”,来提示用户对电子设备100解锁。例如,用户可以在如图8D所示的解锁界面输入解锁密码。电子设备100可以接收该解锁密码,并将该解锁密码与已存储的解锁密码进行匹配。若接收到的解锁密码与已存储的解锁密码匹配,电子设备100可以调用图库应用,显示如图8B所示的图库应用的用户界面。
本申请实施例对上述解锁的方式不作限定。例如,电子设备100还可以根据接收到的语音输入的声纹特征进行解锁。或者,电子设备100可以根据人脸识别进行解锁。上述解锁的方式可以参考现有技术中的实现方式。
在一些实施例中,电子设备100可以仅分析接收到的语音指令中是否包含特定关键词,来判断用户是否需要观看屏幕。
示例性的,当确定接收到的语音指令中包含上述第一类关键词和/或上述第二类关键词,电子设备100可以点亮屏幕,通过图形界面和语音的方式与用户交互。当确定接收到的语音指令不包含上述第一类关键词和上述第二类关键词,电子设备100可以保持屏幕为灭屏状态,仅通过语音的方式与用户交互。
电子设备100根据语音指令中是否包含特定关键词来判断用户是否需要观看屏幕的实现方式可以参考前述实施例,这里不再赘述。
在一些实施例中，电子设备100可以为智能电视等大屏设备，或者为有屏幕的智能音箱。用户可以在使用这些设备时，不观看这些设备的屏幕。例如，用户利用智能电视或者有屏幕的智能音箱播放音乐、控制智能家居设备时，可以不需要观看屏幕。
目前,用户在使用电子设备100时,例如通过遥控器或语音指令让电子设备100播放音乐,电子设备100会点亮屏幕。为了在电子设备100的屏幕保持熄灭的状态下使用电子设备100,用户需通过遥控器或者语音指令让电子设备100的屏幕熄灭。
上述控制电子设备100的屏幕熄灭的用户操作复杂,电子设备100不能在屏幕处于灭屏状态时,根据用户是否需要观看屏幕来智慧决策是否点亮屏幕。
下面介绍本申请实施例提供的另一种语音交互方法。
其中,该语音交互方法可以参考图13所示的方法流程图。
屏幕处于灭屏状态的电子设备100可以响应第一操作,启动语音助手以及检测装置。其中,电子设备100可以利用摄像头作为检测装置来检测用户是否需要观看屏幕,进而智慧决策是否点亮屏幕。
上述第一操作可以为作用于电子设备100上的物理按键的用户操作,或者,为作用于用于控制电子设备100的遥控器上的按键的用户操作。例如,电子设备100为智能电视。第一操作可以为作用于智能电视上的电源键的用户操作,或者,为作用于智能电视的遥控器上的开/关机键的用户操作。
若电子设备100开启语音唤醒功能,上述第一操作还可以为用户说出预设的唤醒词(如“小艺小艺”)。
下面具体以第一操作为用户说出预设的唤醒词进行说明。
电子设备100的屏幕处于灭屏状态。电子设备100的语音唤醒功能开启。当从麦克风采集的语音输入中识别到唤醒词,电子设备100可以启动摄像头,来判断是否检测到人脸。
其中,若确定摄像头在预设时间段内采集的多帧图像中存在不包含人脸的图像,电子设备100可以确定未检测到人脸。电子设备100未检测到人脸,可以表示用户的面部未与电子设备100的屏幕相对,即用户不需要观看屏幕。
当确定未检测到人脸,电子设备100可以保持屏幕处于灭屏状态,与用户进行语音交互。例如,当接收到“播放音乐”的语音指令,电子设备100可以保持屏幕处于灭屏状态,播放音乐。上述语音指令还可以例如是:拨打电话、发送短信、播放音乐、控制智能家居设备。
若确定摄像头在预设时间段内采集的多帧图像中均包含人脸,电子设备100可以确定检测到人脸。对于智能电视、有屏幕的智能音箱等设备,用户的面部与屏幕相对不一定表示用户需要观看屏幕。例如,在用户坐在智能电视前面,通过唤醒词启动智能电视的语音助手,来播放音乐的场景中,用户可以不观看智能电视的屏幕。
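上述“预设时间段内采集的多帧图像须全部包含人脸，才判定检测到人脸”的判定规则可示意如下（假设性草图，以每帧是否含人脸的布尔序列作为输入）：

```python
def face_confirmed(frames_have_face: list) -> bool:
    """示意：多帧中只要存在不含人脸的图像，即判定未检测到人脸。"""
    return len(frames_have_face) > 0 and all(frames_have_face)


assert not face_confirmed([True, False, True])   # 存在不含人脸的帧：未检测到人脸
assert face_confirmed([True, True, True])        # 全部帧均含人脸：检测到人脸
```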
当确定检测到人脸,电子设备100可以判断摄像头采集的图像中是否包含第一手势,进而检测用户是否需要观看屏幕。
上述第一手势可用于指示用户不需要观看屏幕。例如,用户坐在智能电视前面,在启动智能电视的语音助手且不需要观看屏幕时,可以在说出唤醒词时做第一手势。
上述第一手势可以为握拳的手势、张开手掌的手势等等。本申请实施例对上述第一手势不作限定。
可以理解的是,检测到人脸不一定表示用户需要观看屏幕。但在检测到人脸的条件下,电子设备100还可以在摄像头采集的图像中识别出第一手势,可以表示用户不需要观看屏幕。
当检测到人脸，且检测到第一手势，电子设备100可以保持屏幕处于灭屏状态，与用户进行语音交互。
当检测到人脸,且未检测到第一手势,电子设备100可以点亮屏幕,显示图形界面,并与用户进行语音交互。这样,电子设备100可以结合图形界面和语音的方式与用户进行交互。
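上述“检测到人脸后再结合第一手势判断”的逻辑可归纳为一张简单的决策表，示意如下（假设性草图，返回值名称为示例）：

```python
def tv_screen_decision(face_detected: bool, first_gesture: bool) -> str:
    """示意：大屏设备根据人脸与第一手势(如握拳)决定是否点亮屏幕。"""
    if not face_detected:
        return "keep_off"        # 未检测到人脸：保持灭屏，语音交互
    if first_gesture:
        return "keep_off"        # 检测到人脸且有第一手势：用户明确不需观看屏幕
    return "light_screen"        # 检测到人脸且无第一手势：点亮屏幕，图形界面+语音


assert tv_screen_decision(False, False) == "keep_off"
assert tv_screen_decision(True, True) == "keep_off"      # 图10的握拳场景
assert tv_screen_decision(True, False) == "light_screen"
```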
上述图形界面可以是语音助手的用户界面。或者,当语音指令中涉及显示第三应用的用户界面,例如,语音指令为“播放视频”,上述图形界面可以是第三应用的用户界面。
由上述语音交互的方法可知，电子设备可以根据摄像头采集的图像来判断是否检测到人脸，以及是否检测到第一手势，进而检测用户是否需要观看屏幕。电子设备100可以根据用户是否需要观看屏幕，来确定在灭屏状态下启动语音助手时是否点亮屏幕。当检测到用户不需要观看屏幕，电子设备可以保持屏幕处于灭屏状态，与用户进行语音交互。这样，在用户使用电子设备100且不观看电子设备100的屏幕的场景中，用户无需在电子设备100点亮屏幕后，再进行相应的操作来熄灭屏幕，从而简化了用户将电子设备100作为音箱使用的操作。
下面结合应用场景,具体介绍上述语音交互方法。
图9示例性示出了一种电子设备100保持屏幕处于灭屏状态与用户进行语音交互的场景示意图。
如图9所示,电子设备100可以为智能电视。智能电视的屏幕处于灭屏状态,且语音唤醒功能开启。智能电视的麦克风可实时采集智能电视附近的语音输入,并发送给语音唤醒模块,由语音唤醒模块识别该语音输入中是否包含有预设的唤醒词。这样,用户可以通过说出预设的唤醒词来启动语音助手。
当用户在智能电视附近说出“小艺小艺,我要听歌”,智能电视中的语音唤醒模块可以识别到唤醒词“小艺小艺”。进而,智能电视可以启动检测装置来检测用户是否需要观看屏幕。如图9所示,用户的脸部未与智能电视的屏幕相对,或者用户的脸部在智能电视的前面闪过。当启动摄像头进行检测,智能电视可以确定摄像头在预设时间段内采集的多帧图像中存在不包含人脸的图像。智能电视可以保持屏幕处于灭屏状态,在后台运行语音助手,与用户进行语音交互。其中,智能电视可以识别语音指令“我要听歌”,并执行该语音指令对应的操作。例如,智能电视可以在保持屏幕处于灭屏状态时,调用音乐应用播放音乐。
在本申请提供的一些实施例中,上述摄像头可以是低功耗摄像头。例如,红外摄像头。上述摄像头可以实时处于工作状态。当检测到第一操作,电子设备100可以获取上述摄像头采集的数据,来确定用户是否需要观看屏幕。进而,电子设备100可以确定在启动语音助手时是否点亮屏幕。
图10示例性示出了另一种电子设备100保持屏幕处于灭屏状态与用户进行语音交互的场景示意图。
如图10所示,电子设备100可以为智能电视。智能电视的屏幕处于灭屏状态,且语音唤醒功能开启。智能电视的麦克风可实时采集智能电视附近的语音输入,并发送给语音唤醒模块,由语音唤醒模块识别该语音输入中是否包含有预设的唤醒词。这样,用户可以通过说出预设的唤醒词来启动语音助手。
当用户在智能电视附近说出“小艺小艺,我要听歌”,智能电视中的语音唤醒模块可以识别到唤醒词“小艺小艺”。进而,智能电视可以启动检测装置来检测用户是否需要观看屏幕。 如图10所示,用户的脸部与智能电视的屏幕相对,且用户做握拳手势。该握拳手势可以为前述实施例的第一手势,可用于指示用户不需要观看电子设备100的屏幕。当启动摄像头进行检测,智能电视可以确定摄像头在预设时间段内采集的多帧图像中均包含人脸,且摄像头采集的图像中包含握拳手势。智能电视可以保持屏幕处于灭屏状态,在后台运行语音助手,与用户进行语音交互。其中,智能电视可以识别语音指令“我要听歌”,并执行该语音指令对应的操作。例如,智能电视可以在保持屏幕处于灭屏状态时,调用音乐应用播放音乐。
在图9和图10所示的场景中,电子设备在灭屏状态下启动语音助手时,可以不直接点亮屏幕,而是先利用摄像头采集的图像来检测用户是否需要观看屏幕,进而确定是否点亮屏幕。在用户希望将电子设备100作为音箱使用的场景中,用户无需在电子设备100点亮屏幕后,再进行相应的操作来熄灭屏幕,这样可以简化用户操作。
下面结合本申请实施例提供的语音交互方法,介绍电子设备100的另一种结构示意图。
如图11所示,电子设备100可以包括:AP310、检测装置320、麦克风330、低功耗处理器340、扬声器350、显示屏360。其中,AP310中可包含语音助手370。语音助手370可包含语音指令识别模块311和语音指令执行模块312。检测装置320中可包含接近光传感器321、摄像头322、运动传感器323。
上述AP310可以是图1中的处理器110,或者是处理器110包括的多个处理器中的一个或多个处理器。上述麦克风330可以是图1中的麦克风170C。上述扬声器350可以是图1中的扬声器170A。上述显示屏360可以是图1中的显示屏194中的一个或多个。上述接近光传感器321可以是图1中的接近光传感器180G。上述摄像头322可以是图1中的摄像头193中的一个或多个。上述运动传感器323可以包括图1中的加速度传感器180E和陀螺仪传感器180B。
扬声器350、显示屏360和检测装置320可与AP310连接。麦克风330、接近光传感器321、摄像头322和运动传感器323均可通过低功耗处理器340与AP310连接。低功耗处理器340中可集成有语音唤醒模块,可用于在识别到唤醒词时唤醒AP310。
当电子设备100的屏幕处于灭屏状态，且语音唤醒功能开启，麦克风330和低功耗处理器340可以实时处于工作状态，接近光传感器321、摄像头322和运动传感器323中的一个或多个可以实时处于工作状态，AP310可以处于休眠状态，显示屏360熄灭。麦克风可以实时采集电子设备100附近的语音输入，并将该语音输入发送给低功耗处理器340。低功耗处理器340可用于识别该语音输入中是否包含预设的唤醒词（或称为唤醒指令，如“小艺小艺”）。当识别到预设的唤醒词，低功耗处理器340可以唤醒AP310。
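图11中“麦克风与低功耗处理器常开、识别到唤醒词才唤醒AP”的流程可示意如下（假设性草图，`WAKE_WORD`、类名及以文本代替音频的接口均为示例，非本申请的实际实现）：

```python
WAKE_WORD = "小艺小艺"   # 预设唤醒词，示例


class LowPowerProcessor:
    """示意：低功耗处理器340持续比对语音输入，命中唤醒词时唤醒AP310。"""

    def __init__(self):
        self.ap_awake = False   # AP310初始处于休眠状态

    def on_audio(self, text: str) -> bool:
        # 以文本代替音频：包含唤醒词则唤醒AP，由AP启动语音助手370
        if WAKE_WORD in text:
            self.ap_awake = True
        return self.ap_awake


lpp = LowPowerProcessor()
assert lpp.on_audio("今天天气不错") is False   # 未命中唤醒词：AP保持休眠
assert lpp.on_audio("小艺小艺，我要听歌") is True
```

这种把唤醒词识别放在低功耗处理器、AP休眠待唤醒的分工，是正文中节省功耗设计的核心。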
在一种可能的实现方式中,在被低功耗处理器340唤醒后,AP310可以先通过低功耗处理器340获取检测装置320的检测结果,在得到检测结果后再启动语音助手370。上述启动语音助手370可以包括启动语音指令识别模块311和语音指令执行模块312。AP310可以根据检测结果利用扬声器350和/或显示屏360与用户交互。
若根据检测装置320检测到用户不需要观看屏幕，AP310中的语音指令执行模块312可执行识别到的语音指令，通过扬声器350为用户进行语音播报。其中，显示屏360保持灭屏状态。示例性的，麦克风330采集的语音输入中包含语音指令“查询当日天气”。语音指令识别模块311可以获取该语音输入，并识别其中的语音指令。语音指令执行模块312可执行该语音指令对应的操作。具体的，语音指令执行模块312可调用天气应用查询当日天气(如温度、空气质量)，并通过扬声器将当日天气的查询结果进行语音播报。
若根据检测装置320检测到用户需要观看屏幕，AP310中的语音指令执行模块312可执行识别到的语音指令，通过扬声器350为用户进行语音播报，以及通过显示屏360显示语音指令中涉及的用户界面。示例性的，麦克风330采集的语音输入中包含语音指令“查询当日天气”。语音指令识别模块311可以获取该语音输入，并识别其中的语音指令。语音指令执行模块312可执行该语音指令对应的操作。具体的，语音指令执行模块312可调用天气应用查询当日天气(如温度、空气质量)，并通过扬声器将当日天气的查询结果进行语音播报，以及显示如图5B所示的用户界面210。
在本申请提供的一些实施例中,上述摄像头322可以为低功耗摄像头,例如红外摄像头。
接近光传感器321、摄像头322和运动传感器323中的一种或多种均可实时处于工作状态，并将采集的数据传输给低功耗处理器340。当低功耗处理器340识别到麦克风330接收的语音输入中包含有唤醒词，低功耗处理器340可以唤醒AP310。然后，AP310可以从低功耗处理器340获取检测装置320采集的数据，并确定用户是否观看显示屏360。
或者，接近光传感器321、摄像头322、运动传感器323可以与AP310连接。当低功耗处理器340识别到麦克风330接收的语音输入中包含有唤醒词，低功耗处理器340可以唤醒AP310。然后，AP310可以启动接近光传感器321、摄像头322、运动传感器323中的一种或多种。进一步的，AP310可以根据检测装置320采集的数据来确定用户是否观看显示屏360。
上述AP310通过检测装置320检测用户是否需要观看屏幕的方法可以参考前述实施例,这里不再赘述。
在另一种可能的实现方式中,响应于作用在预置的物理按键上用于启动语音助手的用户操作,屏幕处于灭屏状态的电子设备100可以唤醒AP310。上述预置的物理按键可以为电子设备100上的以下一种或多种按键:电源键、音量上键、音量下键。示例性的,上述用于启动语音助手的用户操作可以为作用在电源键上的长按操作,长按时间例如是1秒或2秒,本申请实施例对此不作限定。
也即是说,用户可以通过长按电源键来启动语音助手。
当被用于启动语音助手的用户操作唤醒,AP310可以按照前述实施例通过检测装置320进行检测,以及启动语音指令识别模块311和语音指令执行模块312来执行用户的语音指令,这里不再赘述。
不限于图11所示的部件,电子设备100还可以包含更多或更少的部件。
由图11所示的电子设备100可知,电子设备100在启动语音助手时,可以利用检测装置来检测用户是否需要观看屏幕,进而智慧决策是否点亮屏幕。若检测到用户不需要观看屏幕,电子设备100可以在后台运行语音助手,与用户进行语音交互。这样,可以节省电子设备的功耗,并且避免误触。若检测到用户需要观看屏幕,电子设备100可以点亮屏幕,结合图形界面和语音的方式与用户进行交互。
在本申请提供的一些实施例中,电子设备可以在屏幕处于灭屏状态下检测到用户的第一操作。该第一操作可用于启动语音助手。其中,上述第一操作可以为前述实施例中用户说出唤醒词(如“小艺小艺”)的用户操作。或者,可以为前述实施例中用户长按第一按键的用户操作。该第一按键可以包括以下一项或多项:电源键、音量上键、音量下键。
在本申请提供的一些实施例中，电子设备在第一情况下，在保持屏幕处于灭屏状态下启动语音助手，并使语音助手以第一方式与用户进行交互。其中，上述第一情况可以为用户未观看电子设备的屏幕的情况。示例性的，在用户将电子设备屏幕朝下放置于桌上或如前述图2所示放置于口袋中时，电子设备的第一传感器可以检测到屏幕的预设距离内存在物体遮挡。从而，电子设备可以判断出用户未观看电子设备的屏幕。如图3所示，在用户将电子设备屏幕朝上放置于桌上，但并未将脸部与电子设备的屏幕相对时，电子设备的第一传感器可以检测到屏幕的预设距离内不存在物体，并通过第二传感器未检测到人脸。从而，电子设备可以判断出用户未观看电子设备的屏幕。上述第一方式可以为仅通过语音与用户交互。
在本申请提供的一些实施例中,电子设备可以在第二情况下,点亮屏幕,启动语音助手,并使语音助手以第二方式与用户进行交互。其中,上述第二方式包括通过图形界面与用户进行交互。上述第二情况可以为用户观看电子设备的屏幕的情况。示例性的,在用户将电子设备的屏幕朝上放置于桌上,且将脸部与电子设备的屏幕相对时,电子设备的第一传感器可以检测到屏幕的预设距离内不存在物体遮挡,并通过第二传感器检测到人脸。从而,电子设备可以判断出用户观看电子设备的屏幕。如图5B所示,在用户做抬手动作,例如将屏幕朝上的电子设备从水平放置的姿态变化为倾斜或竖直放置的姿态,姿态调整后的电子设备的屏幕可以与人脸相对,电子设备可以通过第三传感器检测到电子设备的姿态从第一姿态切换为第二姿态。从而,电子设备可以判断出用户观看电子设备的屏幕。上述第一姿态可以例如是电子设备屏幕朝上水平放置的姿态。上述第二姿态可以例如是屏幕朝上倾斜放置的姿态。
在本申请提供的一些实施例中，电子设备使语音助手以第一方式与用户进行交互具体可以为电子设备使语音助手仅运行第一程序。第一程序可以为用于与用户进行语音交互的程序。其中，当语音助手仅运行第一程序，电子设备可以仅通过语音的方式与用户进行交互。进一步的，当检测到用户需要观看屏幕，电子设备可以再运行第二程序。第二程序可以为用于得到与用户交互的图形界面的程序。也即是说，在语音助手与用户交互的过程中，电子设备可以在需要点亮屏幕时，再使语音助手运行绘制图形界面的相关程序。
电子设备使语音助手以第一方式与用户进行交互具体还可以为电子设备使语音助手运行第二程序和第一程序。也即是说,当检测到用户不需要观看屏幕,电子设备仍可使语音助手运行绘制图形界面的相关程序。但电子设备并不点亮屏幕进行显示。进一步的,当检测到用户需要观看屏幕,电子设备可以直接将已运行第二程序得到的图形界面显示在屏幕上。这样,电子设备可以减少绘制图形界面的时延。
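上述两种实现方式的差异，即仅运行第一程序、按需再启动第二程序，与始终运行第二程序但暂不点亮屏幕以减少显示时延，可示意如下（假设性草图，函数与参数名均为示例）：

```python
def start_assistant(need_screen: bool, pre_render_gui: bool) -> dict:
    """示意：第一程序=语音交互程序；第二程序=绘制图形界面的程序。"""
    running = {"voice"}              # 第一程序始终运行
    if need_screen or pre_render_gui:
        running.add("gui")           # 第二程序：按需运行，或预先运行以备点亮
    screen_on = need_screen          # 仅在检测到用户需要观看屏幕时点亮
    return {"running": running, "screen_on": screen_on}


# 方式一：灭屏时仅运行第一程序，需要观看屏幕时再运行第二程序
assert start_assistant(False, False) == {"running": {"voice"}, "screen_on": False}
# 方式二：灭屏时也运行第二程序但不点亮屏幕，点亮时可直接显示已绘制的界面
assert start_assistant(False, True) == {"running": {"voice", "gui"}, "screen_on": False}
assert start_assistant(True, False)["screen_on"] is True
```

方式二以额外的后台绘制开销换取点亮屏幕时更低的显示时延，对应正文中“减少绘制图形界面的时延”的设计取舍。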
以上实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。
Claims (23)
- 一种语音交互方法,其特征在于,包括:电子设备在屏幕处于灭屏状态下检测到用户的第一操作,所述第一操作用于启动语音助手;所述电子设备在第一情况下,在保持所述屏幕处于灭屏状态下启动所述语音助手,并使所述语音助手以第一方式与所述用户进行交互,所述第一方式为仅通过语音与所述用户进行交互;所述第一情况包括以下中的任一种:通过第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过第二传感器未检测到人脸;或,通过所述第二传感器未检测到人脸;或,通过所述第一传感器检测到所述屏幕的预设距离内存在物体遮挡。
- 根据权利要求1所述的方法,其特征在于,还包括:所述电子设备在第二情况下,点亮所述屏幕,启动所述语音助手,并使所述语音助手以第二方式与所述用户进行交互,所述第二方式包括通过图形界面与所述用户进行交互;所述第二情况包括以下中的任一种:通过所述第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过所述第二传感器检测到人脸;或,通过所述第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过第三传感器检测到所述电子设备的姿态从第一姿态切换到第二姿态;或通过所述第二传感器检测到人脸;或,通过所述第三传感器检测到所述电子设备的姿态从第一姿态切换到第二姿态。
- 根据权利要求1所述的方法,其特征在于,还包括:在所述语音助手以所述第一方式与所述用户进行交互的过程中,所述电子设备检测到第二情况,则所述电子设备点亮所述屏幕并使得所述语音助手以第二方式与所述用户进行交互,所述第二方式包括通过图形界面与所述用户进行交互;其中,所述第二情况包括以下中的任一种:通过所述第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过所述第二传感器检测到人脸;或,通过所述第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过第三传感器检测到所述电子设备的姿态从第一姿态切换到第二姿态;或通过所述第二传感器检测到人脸;或,通过所述第三传感器检测到所述电子设备的姿态从第一姿态切换到第二姿态。
- 根据权利要求1所述的方法，其特征在于，所述方法还包括：在所述语音助手以所述第一方式与所述用户进行交互的过程中，所述电子设备接收到所述用户输入的第一语音，对所述第一语音进行识别；在所述电子设备识别出所述第一语音满足第一条件的情况下，所述电子设备点亮所述屏幕，并使得所述语音助手以第二方式与所述用户进行交互，所述第二方式包括通过图形界面与所述用户进行交互；其中，所述第一语音满足第一条件包括：所述第一语音中包括以下一项或多项：第一类关键词、第二类关键词，其中，所述第一类关键词包括以下一类或多类应用程序名称：视频类、购物类、导航类；所述第二类关键词包括以下一项或多项动词：查看、显示。
- 根据权利要求2-4中任一项所述的方法,其特征在于,还包括:在所述语音助手以所述第二方式与所述用户进行交互的过程中,所述电子设备检测到所述第一情况;所述电子设备熄灭所述屏幕,并使所述语音助手以所述第一方式与所述用户进行交互。
- 根据权利要求1-4中任一项所述的方法,其特征在于,包括:在检测到所述第一操作时,所述电子设备检测到所述第一情况;或者,在检测到所述第一操作后的第一时间,所述电子设备检测到所述第一情况;其中,所述第一时间与所述电子设备检测到所述第一操作的时间之间的间隔小于第一时长;或者,在检测到所述第一操作前的第二时间,所述电子设备检测到所述第一情况;其中,所述第二时间与所述电子设备检测到所述第一操作的时间之间的间隔小于第二时长。
- 根据权利要求2所述的方法,其特征在于,包括:在检测到所述第一操作时,所述电子设备检测到所述第二情况;或者,在检测到所述第一操作后的第一时间,所述电子设备检测到所述第二情况;其中,所述第一时间与所述电子设备检测到所述第一操作的时间之间的间隔小于第一时长;或者,在检测到所述第一操作前的第二时间,所述电子设备检测到所述第二情况;其中,所述第二时间与所述电子设备检测到所述第一操作的时间之间的间隔小于第二时长。
- 根据权利要求1-7中任一项所述的方法,其特征在于,所述使所述语音助手以第一方式与所述用户进行交互,具体包括:所述电子设备使所述语音助手仅运行第一程序;或,所述电子设备使所述语音助手运行第二程序和所述第一程序;其中,所述第一程序为用于与所述用户进行语音交互的程序,所述第二程序为用于得到与所述用户交互的图形界面的程序。
- 根据权利要求2-5或7中任一项所述的方法,其特征在于,所述第二方式为通过图形界面和语音与所述用户进行交互。
- 根据权利要求1-9中任一项所述的方法,其特征在于,所述电子设备在所述屏幕处于灭屏状态下检测到第一操作,具体包括:所述电子设备在所述屏幕处于灭屏状态下接收到所述用户输入的第二语音,所述第二语音包括用于启动所述语音助手的唤醒词。
- 根据权利要求1-9中任一项所述的方法，其特征在于，所述电子设备在所述屏幕处于灭屏状态下检测到第一操作，具体包括：所述电子设备在所述屏幕处于灭屏状态下检测到作用于第一按键的长按操作，所述第一按键包括以下一项或多项：电源键、音量上键、音量下键。
- 根据权利要求2-11中任一项所述的方法,其特征在于,所述方法还包括以下中的一项或多项:所述第一传感器包括以下一项或多项:接近光传感器、红外光传感器、雷达传感器;所述第二传感器包括摄像头;所述第三传感器包括运动传感器;其中,所述运动传感器包括以下一项或多项:加速度传感器、陀螺仪传感器。
- 一种电子设备,其特征在于,包括:屏幕、输入装置、检测装置、至少一个处理器;所述检测装置包括以下一项或多项:第一传感器、第二传感器;其中:所述输入装置用于在所述屏幕处于灭屏状态下检测到用户的第一操作;所述第一操作用于启动语音助手;所述检测装置用于在所述输入装置检测到用户的第一操作的情况下,检测是否存在第一情况;其中,所述第一情况包括以下中的任一种:通过所述第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过第二传感器未检测到人脸;或,通过所述第二传感器未检测到人脸;或,通过所述第一传感器检测到所述屏幕的预设距离内存在物体遮挡;所述处理器用于在所述检测装置检测到所述第一情况时,在保持所述屏幕处于灭屏状态下启动所述语音助手,并使所述语音助手以第一方式与所述用户交互,所述第一方式为仅通过语音与所述用户进行交互。
- 根据权利要求13所述的电子设备,其特征在于,所述检测装置还包括第三传感器;所述检测装置还用于检测是否存在第二情况;其中,所述第二情况包括以下中的任一种:通过所述第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过所述第二传感器检测到人脸;或,通过所述第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过所述第三传感器检测到所述电子设备的姿态从第一姿态切换到第二姿态;或通过所述第二传感器检测到人脸;或,通过所述第三传感器检测到所述电子设备的姿态从第一姿态切换到第二姿态;所述处理器还用于在所述检测装置检测到所述第二情况时,点亮所述屏幕,启动所述语音助手,并使所述语音助手以第二方式与所述用户进行交互,所述第二方式包括通过图形界面与所述用户进行交互。
- 根据权利要求13所述的电子设备，其特征在于，所述检测装置还包括第三传感器；所述检测装置还用于在所述语音助手以所述第一方式与所述用户进行交互的过程中，检测是否存在第二情况；其中，所述第二情况包括以下中的任一种：通过所述第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过所述第二传感器检测到人脸；或，通过所述第一传感器检测到所述屏幕的预设距离内不存在物体遮挡且通过所述第三传感器检测到所述电子设备的姿态从第一姿态切换到第二姿态；或通过所述第二传感器检测到人脸；或，通过所述第三传感器检测到所述电子设备的姿态从第一姿态切换到第二姿态；所述处理器还用于在所述检测装置检测到所述第二情况时，点亮所述屏幕并使得所述语音助手以第二方式与所述用户进行交互，所述第二方式包括通过图形界面与所述用户进行交互。
- 根据权利要求13所述的电子设备,其特征在于,所述输入装置还用于在所述语音助手以所述第一方式与所述用户进行交互的过程中,接收所述用户输入的第一语音;所述处理器还用于对所述第一语音进行识别,并在识别出所述第一语音满足第一条件的情况下,点亮所述屏幕,使得所述语音助手以第二方式与所述用户进行交互,所述第二方式包括通过图形界面与所述用户进行交互;其中,所述第一语音满足第一条件包括:所述第一语音中包括以下一项或多项:第一类关键词、第二类关键词,其中,所述第一类关键词包括以下一类或多类应用程序名称:视频类、购物类、导航类;所述第二类关键词包括以下一项或多项动词:查看、显示。
- 根据权利要求14-16中任一项所述的电子设备,其特征在于,所述检测装置还用于在所述语音助手以所述第二方式与所述用户进行交互的过程中,检测是否存在所述第一情况;所述处理器还用于在所述检测装置检测到所述第一情况时,熄灭所述屏幕,并使所述语音助手以所述第一方式与所述用户进行交互。
- 一种语音交互方法,其特征在于,包括:电子设备在屏幕处于灭屏状态下检测到用户的第一操作,所述第一操作用于启动语音助手;所述电子设备在第三情况下,在保持所述屏幕处于灭屏状态下启动所述语音助手,并使所述语音助手以第一方式与所述用户进行交互,所述第一方式为仅通过语音与所述用户进行交互;所述第三情况包括:通过摄像头检测到人脸且检测到第一手势。
- 根据权利要求18所述的方法,其特征在于,还包括:所述电子设备在第四情况下,点亮所述屏幕,启动所述语音助手,并使所述语音助手以第二方式与所述用户进行交互,所述第二方式包括通过图形界面与所述用户进行交互;所述第四情况包括:通过所述摄像头检测到人脸且未检测到所述第一手势。
- 一种电子设备,其特征在于,包括:屏幕、输入装置、摄像头、至少一个处理器;其中:所述输入装置用于在所述屏幕处于灭屏状态下检测到用户的第一操作;所述第一操作用于启动语音助手;所述摄像头用于在所述输入装置检测到用户的第一操作的情况下,检测是否存在第三情况;其中,所述第三情况包括:通过所述摄像头检测到人脸且检测到第一手势;所述处理器用于在所述摄像头检测到所述第三情况时,在保持所述屏幕处于灭屏状态下启动所述语音助手,并使所述语音助手以第一方式与所述用户交互;所述第一方式为仅通过语音与所述用户进行交互。
- 根据权利要求20所述的电子设备,其特征在于,还包括:所述摄像头还用于检测是否存在第四情况;所述第四情况包括:通过所述摄像头检测到人脸且未检测到所述第一手势;所述处理器还用于在所述摄像头检测到所述第四情况时,点亮所述屏幕,启动所述语音助手,并使所述语音助手以第二方式与所述用户进行交互,所述第二方式包括通过图形界面与所述用户进行交互。
- 一种计算机存储介质,其特征在于,包括:计算机指令;当所述计算机指令在电子设备上运行时,使得所述电子设备执行如权利要求1-12中任一项或18-19中任一项所述的方法。
- 一种计算机程序产品,其特征在于,当所述计算机程序产品在电子设备上运行时,使得所述电子设备执行如权利要求1-12中任一项或18-19中任一项所述的方法。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21860113.6A EP4199488A4 (en) | 2020-08-31 | 2021-08-09 | VOICE INTERACTION METHOD AND ELECTRONIC DEVICE |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010901726.8 | 2020-08-31 | ||
CN202010901726.8A CN114125143B (zh) | 2020-08-31 | 2020-08-31 | 一种语音交互方法及电子设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022042274A1 true WO2022042274A1 (zh) | 2022-03-03 |
Family
ID=80354554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/111407 WO2022042274A1 (zh) | 2020-08-31 | 2021-08-09 | 一种语音交互方法及电子设备 |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP4199488A4 (zh) |
CN (1) | CN114125143B (zh) |
WO (1) | WO2022042274A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115424623A (zh) * | 2022-03-23 | 2022-12-02 | 北京罗克维尔斯科技有限公司 | 语音交互方法、装置、设备及计算机可读存储介质 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115996304B (zh) * | 2022-09-08 | 2024-09-10 | 深圳创维-Rgb电子有限公司 | 消息推送方法、装置、终端设备及介质 |
CN116052668B (zh) * | 2023-03-28 | 2023-06-02 | 北京集度科技有限公司 | 一种语音识别处理方法、装置、车辆及计算机程序产品 |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105975063A (zh) * | 2016-04-27 | 2016-09-28 | 吴波 | 一种控制智能终端的方法和装置 |
CN107222641A (zh) * | 2017-07-12 | 2017-09-29 | 珠海格力电器股份有限公司 | 一种移动终端解锁方法和移动终端 |
CN107360370A (zh) * | 2017-07-27 | 2017-11-17 | 深圳市泰衡诺科技有限公司 | 一种用于智能设备的照片拍摄方法及照片拍摄装置 |
US20170365257A1 (en) * | 2016-06-15 | 2017-12-21 | Realtek Semiconductor Corp. | Voice control system and method thereof |
CN109557999A (zh) * | 2017-09-25 | 2019-04-02 | 北京小米移动软件有限公司 | 亮屏控制方法、装置及存储介质 |
CN109933253A (zh) * | 2019-01-23 | 2019-06-25 | 努比亚技术有限公司 | 应用启动控制方法、终端及计算机可读存储介质 |
CN110658906A (zh) * | 2019-08-30 | 2020-01-07 | 华为技术有限公司 | 显示的方法及电子设备 |
WO2020073288A1 (zh) * | 2018-10-11 | 2020-04-16 | 华为技术有限公司 | 一种触发电子设备执行功能的方法及电子设备 |
WO2020151580A1 (zh) * | 2019-01-25 | 2020-07-30 | 华为技术有限公司 | 一种屏幕控制和语音控制方法及电子设备 |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929390A (zh) * | 2012-10-16 | 2013-02-13 | 广东欧珀移动通信有限公司 | 一种在待机状态下应用程序的启动方法及装置 |
EP3349116A4 (en) * | 2015-09-30 | 2019-01-02 | Huawei Technologies Co., Ltd. | Speech control processing method and apparatus |
CN106055144B (zh) * | 2016-05-24 | 2018-12-18 | 北京小米移动软件有限公司 | 控制触控屏状态的方法及装置、电子设备 |
CN106200913A (zh) * | 2016-06-28 | 2016-12-07 | 珠海市魅族科技有限公司 | 一种屏幕状态处理方法以及终端 |
US11314898B2 (en) * | 2017-02-28 | 2022-04-26 | Samsung Electronics Co., Ltd. | Operating method of electronic device for function execution based on voice command in locked state and electronic device supporting the same |
CN107333047B (zh) * | 2017-08-24 | 2020-03-31 | 维沃移动通信有限公司 | 一种拍摄方法、移动终端及计算机可读存储介质 |
CN107831996B (zh) * | 2017-10-11 | 2021-02-19 | Oppo广东移动通信有限公司 | 人脸识别启动方法及相关产品 |
CN108418953B (zh) * | 2018-02-05 | 2020-04-24 | Oppo广东移动通信有限公司 | 终端的屏幕控制方法和装置、可读存储介质、终端 |
CN108391001A (zh) * | 2018-02-05 | 2018-08-10 | 广东欧珀移动通信有限公司 | 终端的屏幕控制方法和装置、可读存储介质、终端 |
CN108399009A (zh) * | 2018-02-11 | 2018-08-14 | 易视腾科技股份有限公司 | 利用人机交互手势唤醒智能设备的方法及装置 |
DK180639B1 (en) * | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
CN109036398A (zh) * | 2018-07-04 | 2018-12-18 | 百度在线网络技术(北京)有限公司 | 语音交互方法、装置、设备及存储介质 |
CN108777748A (zh) * | 2018-09-14 | 2018-11-09 | 李业科 | 一种基于手机姿态变化感测的通话灭屏方式 |
CN109240107B (zh) * | 2018-09-30 | 2022-07-19 | 深圳创维-Rgb电子有限公司 | 一种电器设备的控制方法、装置、电器设备和介质 |
CN109195213B (zh) * | 2018-11-26 | 2023-06-30 | 努比亚技术有限公司 | 移动终端屏幕控制方法、移动终端及计算机可读存储介质 |
CN109712621B (zh) * | 2018-12-27 | 2021-03-16 | 维沃移动通信有限公司 | 一种语音交互控制方法及终端 |
CN109688474A (zh) * | 2018-12-28 | 2019-04-26 | 南京创维信息技术研究院有限公司 | 电视语音控制方法、装置和计算机可读存储介质 |
CN110058777B (zh) * | 2019-03-13 | 2022-03-29 | 华为技术有限公司 | 快捷功能启动的方法及电子设备 |
CN110362290A (zh) * | 2019-06-29 | 2019-10-22 | 华为技术有限公司 | 一种语音控制方法及相关装置 |
CN111262975B (zh) * | 2020-01-08 | 2021-06-08 | 华为技术有限公司 | 亮屏控制方法、电子设备、计算机可读存储介质和程序产品 |
- 2020-08-31: CN application CN202010901726.8A filed (patent CN114125143B, active)
- 2021-08-09: EP application EP21860113.6A filed (pending)
- 2021-08-09: PCT application PCT/CN2021/111407 filed
Non-Patent Citations (1)

- See also references of EP4199488A4
Also Published As
Publication number | Publication date |
---|---|
CN114125143A (zh) | 2022-03-01 |
EP4199488A1 (en) | 2023-06-21 |
EP4199488A4 (en) | 2024-01-10 |
CN114125143B (zh) | 2023-04-07 |
Legal Events

- 121: EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21860113; Country of ref document: EP; Kind code of ref document: A1)
- ENP: Entry into the national phase (Ref document number: 2021860113; Country of ref document: EP; Effective date: 20230314)
- NENP: Non-entry into the national phase (Ref country code: DE)