CN115206308A - Man-machine interaction method and electronic equipment

Man-machine interaction method and electronic equipment

Info

Publication number
CN115206308A
Authority
CN
China
Prior art keywords
voice instruction
user
voice
electronic device
electronic equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110381295.1A
Other languages
Chinese (zh)
Inventor
孙海洋
曾俊飞
查永东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110381295.1A
Publication of CN115206308A
Status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

The application provides a human-computer interaction method and an electronic device. The method can be applied to electronic devices with speech recognition and voice interaction functions, such as mobile phones and tablet computers. If a voice instruction issued by the user, or a reply given to the electronic device, includes the wake-up word, the method can accurately determine the position of the wake-up word within the voice instruction and prevent that wake-up word from interrupting the current interaction flow, so that the task currently being executed is not interrupted and the continuity of the human-machine dialog is ensured. In addition, for devices such as robots that have a sound source localization function and/or an image acquisition function, the method can determine whether the device needs to turn according to the sound source direction of the voice instruction, and can estimate the user's willingness to interact from captured images and the like, so that voice interaction with the user is more accurate and the user experience is improved.

Description

Man-machine interaction method and electronic equipment
Technical Field
The present application relates to the field of electronic technologies, and in particular, to a human-computer interaction method and an electronic device.
Background
With the development of technology, more and more electronic devices support "human-computer interaction", or "voice interaction", which has gradually become a way for users to convey their intent and control electronic devices. In human-computer interaction, the electronic device is mainly controlled through the user's voice instructions, which frees the user's hands and makes it convenient for the user to control the electronic device.
Before a user interacts with an electronic device, the electronic device may be woken by a "wake-up word". Once woken, the electronic device can give the user a response indicating that the wake-up succeeded, collect the user's voice instruction, and perform automatic speech recognition (ASR) on it. If, during the speech recognition that follows wake-up, the collected voice instruction includes the wake-up word, that wake-up word may interrupt the current human-computer interaction flow, and the user's voice instruction is collected again for recognition. This interruption of the ongoing interaction may not be what the user expects: the wake-up word directly interrupts the task currently being executed and the electronic device restarts collecting the user's voice instruction, so the human-machine dialog becomes discontinuous, the user's use of the device is affected, and the human-computer interaction experience is degraded.
Disclosure of Invention
The application provides a human-computer interaction method and an electronic device. The electronic device may include devices with a speech recognition function, such as mobile phones, robots and tablet computers. The method can provide the user with a coherent, immersive experience and improve the user's experience.
In a first aspect, a human-computer interaction method is provided. The method includes: receiving a wake-up word uttered by a user, and starting a speech recognition function of the electronic device in response to the wake-up word; acquiring a first voice instruction of the user and, when it is detected that the first voice instruction includes the wake-up word, determining a first time period occupied by the wake-up word within the time period corresponding to the first voice instruction; removing the wake-up word in the first time period and recognizing the target voice instruction, that is, the part of the first voice instruction other than the wake-up word; and responding to and answering the target voice instruction.
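As an illustration of the first aspect, the following Python sketch shows one way the step of removing the wake-up word in the first time period and keeping the remaining target voice instruction could look, assuming the recognizer reports segment-level timestamps. The names (`Segment`, `strip_wake_word`) and the data layout are assumptions made here for illustration, not details taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str     # recognized text of this segment
    start: float  # start time in seconds, counted from wake-up
    end: float    # end time in seconds, counted from wake-up

def strip_wake_word(segments: list[Segment], wake_word: str = "Xiaoyi") -> str:
    """Find the time period occupied by the wake-up word, drop it, and
    keep the remaining segments as the target voice instruction."""
    wake_periods = [(s.start, s.end) for s in segments if s.text == wake_word]
    kept = [s for s in segments
            if not any(s.start >= p0 and s.end <= p1 for p0, p1 in wake_periods)]
    return " ".join(s.text for s in kept)

# Wake-up word at the end of the first voice instruction.
first_instruction = [
    Segment("imitate the sound of a cow", 0.0, 1.8),
    Segment("Xiaoyi", 2.0, 2.6),
]
print(strip_wake_word(first_instruction))  # -> imitate the sound of a cow
```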
In a possible scenario, for example, a user wakes a mobile phone with the wake-up word "Xiaoyi". After being woken, the mobile phone enters a state of listening for the user's voice instructions. If a voice instruction issued by the user again includes the wake-up word "Xiaoyi", that wake-up word may interrupt the current human-computer interaction flow and start a new one, which may not be what the user expects: the wake-up word directly interrupts the task currently being executed, the mobile phone has to restart collecting the user's voice instruction, the human-machine dialog becomes discontinuous, the user's use of the device is affected, and the human-computer interaction experience is degraded.
By this method, after the user has woken the electronic device with the wake-up word during voice interaction, if a voice instruction issued by the user includes the wake-up word again, the method can prevent that wake-up word from interrupting the current interaction flow. The task currently being executed by the electronic device is therefore not directly interrupted, the process of collecting the user's voice instruction does not have to be restarted, the continuity of the human-machine dialog is ensured, and the user experience is improved.
It should be understood that the automatic speech recognition (ASR) module of the mobile phone is not always on and in a working state. After the user has issued a voice instruction, or while the mobile phone is answering the user, the ASR module is closed, so that the mobile phone's own speech is not collected and does not interfere with the collection and recognition of the user's voice instructions. After being woken by the wake-up word, the mobile phone can detect whether the ASR module is on; if the ASR module is dormant or in a non-working, closed state, it can be triggered to open, that is, the speech recognition function of the electronic device is started.
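A minimal sketch of the ASR-state handling described above: the wake-up word only opens the ASR module if it is currently closed, and the module is closed while the device itself is speaking. The class and attribute names are assumptions chosen for illustration, not interfaces named in the patent.

```python
class VoiceAssistant:
    def __init__(self) -> None:
        self.asr_on = False      # the ASR module is not always on
        self.answering = False   # True while the device itself is speaking

    def on_wake_word(self) -> None:
        """Handle a detected wake-up word."""
        if self.asr_on:
            # ASR is already working: ignore the wake-up, keep the dialog flow.
            return
        self.asr_on = True       # trigger the ASR module, i.e. start speech recognition

    def on_device_answering(self) -> None:
        """Close the ASR module while the device answers, so that its own
        speech is not collected and recognized as a user instruction."""
        self.answering = True
        self.asr_on = False
```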
Optionally, when the mobile phone acquires and recognizes the wake-up word "Xiaoyi", if it determines that the ASR module is already on, the wake-up may be ignored and the current dialog flow continued.
With reference to the first aspect, in some implementations of the first aspect, the first time period is an end time period, an intermediate time period, or a start time period of the time period corresponding to the first voice instruction.
After the mobile phone is woken, it listens for the user's first voice instruction. When it detects that the first voice instruction includes the wake-up word "Xiaoyi", it can first judge the position of the wake-up word in the first voice instruction, which is mainly the head of the first voice instruction, the middle of the first voice instruction, or the end of the first voice instruction. For example, the first voice instruction issued by the user may be "imitate the sound of a cow, Xiaoyi" (the wake-up word is at the end of the first voice instruction), "imitate the sound of an animal, Xiaoyi, imitate the sound of a cow" (the wake-up word is in the middle of the first voice instruction), or "Xiaoyi, imitate the sound of a cow" (the wake-up word is at the head of the first voice instruction).
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, when the first time period is the end period of the time period corresponding to the first voice instruction, the method further includes: detecting, within the first voice instruction, the voice instruction closest to the wake-up word and the time interval between that voice instruction and the wake-up word; and when the time interval is greater than or equal to a first preset value, suspending the current dialog flow, responding to the wake-up word, and restarting the speech recognition function of the electronic device so that the electronic device acquires a second voice instruction.
It should be understood that the first preset value can be used to judge whether the user currently wishes to interrupt the dialog flow. Illustratively, the first voice instruction issued by the user is "imitate the sound of a cow, Xiaoyi", with the wake-up word "Xiaoyi" located at the end of the voice instruction. The voice instruction closest to the wake-up word "Xiaoyi" is "imitate the sound of a cow", and the mobile phone can judge the user's intent from the time interval between "imitate the sound of a cow" and "Xiaoyi". When the time interval between "imitate the sound of a cow" and the beginning of "Xiaoyi" is smaller than the first preset value, it can be judged that the user may merely be using the wake-up word "Xiaoyi" as part of a habitual phrase and wishes to continue the current dialog flow rather than switch to a new one.
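The pause-based decision just described can be sketched as follows, assuming segment timestamps in seconds. The threshold value is only an example; the patent does not fix the first preset value.

```python
FIRST_PRESET_VALUE = 1.5  # seconds; example threshold, not specified by the patent

def handle_trailing_wake_word(prev_segment_end: float, wake_word_start: float) -> str:
    """Decide what a wake-up word at the end of the first voice instruction means."""
    interval = wake_word_start - prev_segment_end
    if interval >= FIRST_PRESET_VALUE:
        # Long pause: treat it as a real wake-up, suspend the current dialog
        # and restart recognition to collect a second voice instruction.
        return "suspend_and_collect_second_instruction"
    # Short pause: the wake-up word is just part of the user's phrasing,
    # keep the current dialog flow.
    return "continue_current_dialog"

print(handle_trailing_wake_word(1.8, 2.0))  # short pause -> continue_current_dialog
```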
Optionally, the mobile phone may record, based on the first voice instruction, the time information of the wake-up word "Xiaoyi" within the first voice instruction. The embodiment of the present application does not limit the rule for recording and marking this time information. For example, taking the moment at which the initial wake-up word woke the mobile phone as the starting time, the period in which the wake-up word appears again in the first voice instruction may be recorded as t1-t2 (or, in another notation, T1-T2); the position of the wake-up word in the first voice instruction can then be determined from this time information.
In a second aspect, a human-computer interaction method is provided. The method includes: acquiring a first voice instruction of a user and detecting the sound source direction of the first voice instruction; determining a first angle between the sound source direction of the first voice instruction and the first sight line direction the electronic device currently faces; when the first angle is greater than or equal to a first preset angle, determining a second angle between the sound source direction of the first voice instruction and the sound source direction of a second voice instruction, where the second voice instruction is the voice instruction issued by the user before, and closest in time to, the first voice instruction; and when the second angle is smaller than or equal to a second preset angle, the electronic device responds to and answers the first voice instruction.
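The angle comparisons of the second aspect can be sketched as follows, with all directions expressed as azimuths in degrees. The preset angles are assumed example values, and the behaviour for a first angle below the first preset angle (answering directly) is also an assumption, since the text only specifies the case where it is greater than or equal.

```python
FIRST_PRESET_ANGLE = 45.0   # degrees; example values only
SECOND_PRESET_ANGLE = 15.0

def angle_between(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two azimuths given in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def answer_without_turning(source_dir: float, sight_dir: float,
                           prev_source_dir: float) -> bool:
    """Answer the first voice instruction directly when the user is roughly in
    front, or when the sound comes from (almost) the same direction as the
    previous (second) voice instruction."""
    first_angle = angle_between(source_dir, sight_dir)
    if first_angle < FIRST_PRESET_ANGLE:
        return True
    second_angle = angle_between(source_dir, prev_source_dir)
    return second_angle <= SECOND_PRESET_ANGLE

print(answer_without_turning(source_dir=80.0, sight_dir=0.0, prev_source_dir=85.0))  # True
```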
In another possible scenario, some electronic devices, such as robots, may have a sound source localization capability or a camera with an image acquisition function. After being woken by the wake-up word, the robot can determine the direction of the user from sound source localization, rotate the camera used for image acquisition, and turn directly toward the direction or position of the user located by the sound source. In this process, a large error may occur in judging the direction of the user, for example because the sound is reflected by walls; when such a large error occurs, the device may end up not facing the person after it rotates.
By this method, the robot's wake-up behavior better matches human expectations. When the included angle θ between the sound source direction of the user's voice instruction and the sight line direction the robot currently faces is greater than or equal to the first preset angle and the user's willingness to interact is strong, the robot can decide to turn toward the user automatically; when the included angle θ is greater than or equal to the first preset angle but the user's willingness to interact is low, the robot can turn back, and the interaction flow between the user and the robot is not interrupted in this process, giving the user a better human-computer interaction experience.
It should be understood that, when the included angle θ between the sound source direction of the first voice command and the sight line direction is greater than or equal to the first preset angle, it may be considered that the user issuing the voice command and the robot are not in a face-to-face positional relationship, or that the user issuing the voice command is not in the central area range of the image captured by the robot, and the range corresponding to the central area is not limited in the embodiment of the present application.
It should also be understood that the "previous voice instruction" here is the voice instruction closest in time before the first voice instruction. Optionally, the "previous voice instruction" may be the user's wake-up word instruction, such as "Xiaoyi, Xiaoyi", or it may be another voice instruction after the wake-up word, such as "please imitate the sound of a cow". The embodiments of the present application do not limit this.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes: detecting the time interval between the first voice instruction and the second voice instruction; and when the time interval is smaller than or equal to a second preset value, calling a steering execution function and rotating the electronic device to face, or approach as closely as possible, the sound source direction of the first voice instruction.
Optionally, of the two conditions (the included angle between the sound source direction of the first voice instruction and the sound source direction of the previous voice instruction being greater than or equal to the second preset angle, and the time interval between the two voice instructions being greater than or equal to the second preset value), satisfying either one, or both, may trigger calling the steering execution function and turning the robot. The embodiments of the present application do not limit this.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the method further includes: acquiring a first image of the electronic device in the first sight line direction; and when the first image includes the user and a third angle between the user's gaze direction and the sound source direction of the first voice instruction is smaller than or equal to a third preset angle, calling a steering execution function and rotating the electronic device to face, or approach as closely as possible, the sound source direction of the first voice instruction.
Optionally, the robot may further capture an image through a camera, and detect a direction in which the eyes of the user gaze in the captured image to estimate the user's willingness to interact.
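A sketch of this image-based estimate of the user's willingness to interact, assuming a hypothetical detector that reports whether the user appears in the first image and the azimuth of the user's gaze; the third preset angle is an example value, and the small angle helper is repeated so the block runs on its own.

```python
THIRD_PRESET_ANGLE = 30.0  # degrees; example value only

def angle_between(a_deg: float, b_deg: float) -> float:
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def should_turn_to_user(user_in_first_image: bool,
                        gaze_dir: float, source_dir: float) -> bool:
    """Call the steering execution function only when the first image contains
    the user and the user is looking roughly along the sound source direction,
    i.e. the willingness to interact is estimated to be strong."""
    return user_in_first_image and angle_between(gaze_dir, source_dir) <= THIRD_PRESET_ANGLE
```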
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the method further includes: the electronic device acquires a second image while facing, or as close as possible to, the sound source direction of the first voice instruction; and when the second image does not include the user, or a fourth angle between the user's gaze direction and the second sight line direction the electronic device currently faces is greater than a fourth preset angle, rotating the electronic device back to the first sight line direction.
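Putting the turn-and-verify behaviour together, a hedged sketch with a hypothetical robot interface (`turn_to`, `capture_image`, `detect_user` and the returned object's fields are assumptions, not APIs named in the patent):

```python
FOURTH_PRESET_ANGLE = 60.0  # degrees; example value only

def angle_between(a_deg: float, b_deg: float) -> float:
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def turn_and_verify(robot, source_dir: float, first_sight_dir: float) -> None:
    """Turn toward the sound source, then check the second image; if the user
    is absent or clearly not looking at the device, rotate back to the first
    sight line direction so the interaction flow is not disturbed."""
    robot.turn_to(source_dir)             # steering execution function
    second_image = robot.capture_image()  # image in the new (second) sight direction
    user = robot.detect_user(second_image)
    if user is None or angle_between(user.gaze_dir, robot.sight_dir) > FOURTH_PRESET_ANGLE:
        robot.turn_to(first_sight_dir)    # restore the first sight line direction
```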
In summary, during voice interaction between the user and the electronic device, after the user has woken the electronic device with the wake-up word, if a voice instruction issued by the user, or a reply given to the electronic device, includes the wake-up word again, the method can prevent that wake-up word from interrupting the current interaction flow. The task currently being executed by the electronic device is therefore not directly interrupted, the process of collecting the user's voice instruction does not have to be restarted, the continuity of the human-machine dialog is ensured, and the user experience is improved.
In addition, for electronic devices with sound source localization capability, such as robots, the method provided in the embodiments of the present application can determine whether the device needs to turn according to the sound source direction of the voice instruction, and can estimate the user's willingness to interact from captured images and the like, so that voice interaction with the user can be carried out more accurately. Specifically, when the included angle θ between the sound source direction of the user's voice instruction and the sight line direction the robot currently faces is greater than or equal to the first preset angle and the user's willingness to interact is strong, the robot can decide to turn toward the user automatically; when the included angle θ is greater than or equal to the first preset angle but the user's willingness to interact is low, the robot can turn back, and the interaction flow between the user and the robot is not interrupted in this process, giving the user a better human-computer interaction experience.
In a third aspect, an electronic device is provided, including: one or more processors; one or more memories; a module on which a plurality of application programs are installed; the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the following steps: receiving a wake-up word uttered by a user, and starting a speech recognition function of the electronic device in response to the wake-up word; acquiring a first voice instruction of the user and, when it is detected that the first voice instruction includes the wake-up word, determining a first time period occupied by the wake-up word within the time period corresponding to the first voice instruction; removing the wake-up word in the first time period and recognizing the target voice instruction, that is, the part of the first voice instruction other than the wake-up word; and responding to and answering the target voice instruction.
With reference to the third aspect, in some implementations of the third aspect, the first time period is an end time period, an intermediate time period, or a start time period of the time period corresponding to the first voice instruction.
With reference to the third aspect and the foregoing implementations, in some implementations of the third aspect, when the first time period is the end period of the time period corresponding to the first voice instruction, the electronic device may further perform the following steps: detecting, within the first voice instruction, the voice instruction closest to the wake-up word and the time interval between that voice instruction and the wake-up word; and when the time interval is greater than or equal to a first preset value, suspending the current dialog flow and responding to the wake-up word to restart the speech recognition function of the electronic device, so that the electronic device acquires a second voice instruction.
In a fourth aspect, an electronic device is provided, including: a camera; one or more processors; one or more memories; a module on which a plurality of application programs are installed; the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the following steps: acquiring a first voice instruction of a user and detecting the sound source direction of the first voice instruction; determining a first angle between the sound source direction of the first voice instruction and the first sight line direction the electronic device currently faces; when the first angle is greater than or equal to a first preset angle, determining a second angle between the sound source direction of the first voice instruction and the sound source direction of a second voice instruction, where the second voice instruction is the voice instruction issued by the user before, and closest in time to, the first voice instruction; and when the second angle is smaller than or equal to a second preset angle, the electronic device responds to and answers the first voice instruction.
With reference to the fourth aspect, in certain implementations of the fourth aspect, the one or more programs, when executed by the processor, cause the electronic device to perform the following steps: detecting the time interval between the first voice instruction and the second voice instruction; and when the time interval is smaller than or equal to a second preset value, calling a steering execution function and rotating the electronic device to face, or approach as closely as possible, the sound source direction of the first voice instruction.
With reference to the fourth aspect and the foregoing implementations, in some implementations of the fourth aspect, the one or more programs, when executed by the processor, cause the electronic device to perform the following steps: acquiring a first image of the electronic device in the first sight line direction; and when the first image includes the user and a third angle between the user's gaze direction and the sound source direction of the first voice instruction is smaller than or equal to a third preset angle, calling a steering execution function and rotating the electronic device to face, or approach as closely as possible, the sound source direction of the first voice instruction.
With reference to the fourth aspect and the foregoing implementations, in some implementations of the fourth aspect, the one or more programs, when executed by the processor, cause the electronic device to perform the following steps: collecting a second image while facing, or as close as possible to, the sound source direction of the first voice instruction; and when the second image does not include the user, or a fourth angle between the user's gaze direction and the second sight line direction the electronic device currently faces is greater than a fourth preset angle, rotating the electronic device back to the first sight line direction.
In a fifth aspect, the present application provides an apparatus, included in an electronic device, having functionality to implement the above aspects and possible implementations of the above aspects. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above-described functions. Such as a display module or unit, a detection module or unit, a processing module or unit, etc.
In a sixth aspect, the present application provides an electronic device, comprising: a touch display screen, wherein the touch display screen comprises a touch sensitive surface and a display; one or more audio devices; a camera; one or more processors; a memory; a plurality of application programs; and one or more computer programs. Wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions. The instructions, when executed by the electronic device, cause the electronic device to perform a method of human-computer interaction in any one of the possible implementations of the above aspects.
In a seventh aspect, the present application provides an electronic device comprising one or more processors and one or more memories. The one or more memories are coupled to the one or more processors for storing computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform a method of human-computer interaction in any of the possible implementations of the above aspects.
In an eighth aspect, the present application provides a computer-readable storage medium comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform any one of the above possible human-computer interaction methods.
In a ninth aspect, the present application provides a computer program product, which, when run on an electronic device, causes the electronic device to perform any one of the above possible human-computer interaction methods.
Drawings
Fig. 1 is a schematic structural diagram of an example of an electronic device according to an embodiment of the present application.
Fig. 2 is a block diagram of a software structure of an electronic device according to an embodiment of the present application.
FIG. 3 is a diagram of a graphical user interface illustrating a human-computer interaction process.
Fig. 4 is a schematic flowchart of an example human-computer interaction method provided in the embodiment of the present application.
Fig. 5 is a scene schematic diagram of an example of human-computer interaction provided in the embodiment of the present application.
Fig. 6 is a schematic flowchart of an example human-computer interaction method according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the embodiments herein, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more.
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
The human-computer interaction method provided in the embodiments of the present application can be applied to electronic devices such as mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, and personal digital assistants (PDAs); the embodiments of the present application do not limit the specific type of the electronic device.
Fig. 1 is a schematic structural diagram of an example of an electronic device according to an embodiment of the present disclosure. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be, among other things, a neural center and a command center of the electronic device 100. The controller can generate an operation control signal according to the instruction operation code and the time sequence signal to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, a camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through the I2S interface, so as to implement a function of receiving a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, audio module 170 and wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit the audio signal to the wireless communication module 160 through the PCM interface, so as to implement the function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture functionality of electronic device 100. Processor 110 and display screen 194 communicate via a DSI interface to implement display functions of electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transmit data between the electronic device 100 and a peripheral device. And the method can also be used for connecting a headset and playing audio through the headset. The interface may also be used to connect other electronic devices, such as AR devices and the like.
It should be understood that the interface connection relationship between the modules illustrated in the embodiments of the present application is only an illustration, and does not limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In other embodiments, the power management module 141 may be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide a solution for wireless communication applied to the electronic device 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), global Navigation Satellite System (GNSS), frequency Modulation (FM), near Field Communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, antenna 1 of electronic device 100 is coupled to mobile communication module 150 and antenna 2 is coupled to wireless communication module 160 so that electronic device 100 can communicate with networks and other devices through wireless communication techniques. The wireless communication technology may include global system for mobile communications (GSM), general Packet Radio Service (GPRS), code division multiple access (code division multiple access, CDMA), wideband Code Division Multiple Access (WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a Global Positioning System (GPS), a global navigation satellite system (GLONASS), a beidou satellite navigation system (BDS), a quasi-zenith satellite system (QZSS), and/or a Satellite Based Augmentation System (SBAS).
The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor, which processes input information quickly by referring to a biological neural network structure, for example, by referring to a transfer mode between neurons of a human brain, and can also learn by itself continuously. Applications such as intelligent recognition of the electronic device 100 can be realized through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, and the like) required by at least one function, and the like. The storage data area may store data (such as audio data, phone book, etc.) created during use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a Universal Flash Storage (UFS), and the like.
The electronic device 100 may implement audio functions via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into analog audio signals for output, and also used to convert analog audio inputs into digital audio signals. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into a sound signal. The electronic apparatus 100 can listen to music through the speaker 170A or listen to a handsfree call.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the electronic apparatus 100 receives a call or voice information, it can receive voice by placing the receiver 170B close to the ear of the person.
The microphone 170C, also referred to as a "mike" or "mic", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can input a sound signal to the microphone 170C by speaking near it. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further include three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and perform directional recording.
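The paragraph above notes that three or more microphones can be used to identify sound sources. One common way to do this, given here only as an assumption since the patent does not prescribe an algorithm, is to estimate the angle of arrival from the time difference of arrival between a microphone pair:

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, in air at room temperature

def angle_of_arrival(delay_s: float, mic_spacing_m: float) -> float:
    """Angle of arrival (degrees from broadside) of a far-field source,
    estimated from the time difference of arrival between two microphones."""
    # delay = (spacing * sin(theta)) / c  =>  theta = asin(c * delay / spacing)
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * delay_s / mic_spacing_m))
    return math.degrees(math.asin(ratio))

# A 0.1 ms delay across microphones 5 cm apart corresponds to roughly 43 degrees.
print(round(angle_of_arrival(1e-4, 0.05), 1))
```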
The earphone interface 170D is used to connect a wired earphone. The headset interface 170D may be the USB interface 130, or may be a 3.5mm open mobile electronic device platform (OMTP) standard interface, a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 180A is used for sensing a pressure signal, and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic apparatus 100 may also calculate the touched position from the detection signal of the pressure sensor 180A. In some embodiments, the touch operations that are applied to the same touch position but different touch operation intensities may correspond to different operation instructions. For example: and when the touch operation with the touch operation intensity smaller than the first pressure threshold value acts on the short message application icon, executing an instruction for viewing the short message. And when the touch operation with the touch operation intensity larger than or equal to the first pressure threshold value acts on the short message application icon, executing an instruction of newly building the short message.
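The pressure-threshold behaviour in the example above can be written as a small dispatch; the numeric threshold is an arbitrary illustration, since the text only refers to "the first pressure threshold".

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # normalized touch intensity; example value only

def on_messages_icon_touch(intensity: float) -> str:
    """Map touch intensity on the Messages application icon to an instruction."""
    if intensity < FIRST_PRESSURE_THRESHOLD:
        return "view_message"        # light press: view the short message
    return "create_new_message"      # firm press: create a new short message
```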
The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 180B detects a shake angle of the electronic device 100, calculates a distance to be compensated for the lens module according to the shake angle, and allows the lens to counteract the shake of the electronic device 100 through a reverse movement, thereby achieving anti-shake. The gyroscope sensor 180B may also be used for navigation, somatosensory gaming scenes.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, electronic device 100 calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by barometric pressure sensor 180C.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may detect the opening and closing of a flip leather case using the magnetic sensor 180D. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip cover according to the magnetic sensor 180D, and then set features such as automatic unlocking upon flip opening according to the detected opening or closing state of the leather case or of the flip cover.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). The magnitude and direction of gravity can be detected when the electronic device 100 is stationary. The method can also be used for recognizing the posture of the electronic equipment, and is applied to horizontal and vertical screen switching, pedometers and other applications.
A distance sensor 180F for measuring a distance. The electronic device 100 may measure the distance by infrared or laser. In some embodiments, taking a picture of a scene, the electronic device 100 may utilize the distance sensor 180F to range to achieve fast focus.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 may determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear for a call, so as to automatically turn off the screen and save power. The proximity light sensor 180G may also be used in a leather case mode or a pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. Electronic device 100 may adaptively adjust the brightness of display screen 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may utilize the collected fingerprint characteristics to unlock a fingerprint, access an application lock, photograph a fingerprint, answer an incoming call with a fingerprint, and so on.
The temperature sensor 180J is used to detect temperature. In some embodiments, electronic device 100 implements a temperature processing strategy using the temperature detected by temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 performs a reduction in performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, the electronic device 100 heats the battery 142 when the temperature is below another threshold to avoid the low temperature causing the electronic device 100 to shut down abnormally. In other embodiments, when the temperature is lower than a further threshold, the electronic device 100 performs boosting on the output voltage of the battery 142 to avoid abnormal shutdown due to low temperature.
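Illustratively, the temperature processing strategy described above can be sketched as a simple threshold-based policy. The following Python sketch is only an exemplary illustration; the threshold values and the actuator function names (reduce_processor_performance, heat_battery, boost_battery_voltage) are assumptions introduced for illustration and are not limited by the embodiments of the present application.

```python
# Exemplary threshold-based temperature processing strategy (values assumed).
HIGH_TEMP_THRESHOLD = 45.0       # degrees Celsius, assumed
LOW_TEMP_THRESHOLD = 0.0         # assumed "another threshold"
VERY_LOW_TEMP_THRESHOLD = -10.0  # assumed "further threshold"

def apply_thermal_policy(temperature_c, device):
    """Apply a temperature processing strategy based on the reading of sensor 180J."""
    if temperature_c > HIGH_TEMP_THRESHOLD:
        # Reduce the performance of a processor near the sensor to lower
        # power consumption and implement thermal protection.
        device.reduce_processor_performance()
    elif temperature_c < VERY_LOW_TEMP_THRESHOLD:
        # Boost the output voltage of the battery to avoid abnormal shutdown.
        device.boost_battery_voltage()
    elif temperature_c < LOW_TEMP_THRESHOLD:
        # Heat the battery to avoid a low-temperature shutdown.
        device.heat_battery()
```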
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device 100, different from the position of the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the human voice vibrating a bone mass. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset, integrated into a bone conduction headset. The audio module 170 may analyze a voice signal based on the vibration signal of the bone mass vibrated by the sound part acquired by the bone conduction sensor 180M, so as to implement a voice function. The application processor can analyze heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M, so as to realize the heart rate detection function.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The electronic apparatus 100 may receive a key input, and generate a key signal input related to user setting and function control of the electronic apparatus 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into and out of contact with the electronic apparatus 100 by being inserted into the SIM card interface 195 or being pulled out of the SIM card interface 195. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, etc. The same SIM card interface 195 can be inserted with multiple cards at the same time. The types of the plurality of cards may be the same or different. The SIM card interface 195 is also compatible with different types of SIM cards. The SIM card interface 195 is also compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as communication and data communication. In some embodiments, the electronic device 100 employs esims, namely: an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
The software system of the electronic device 100 may employ a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
Fig. 2 is a block diagram of a software structure of the electronic device 100 according to the embodiment of the present application. The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime and system libraries, and a kernel layer. The application layer may include a series of application packages.
As shown in fig. 2, the application package may include applications such as camera, gallery, calendar, phone call, map, navigation, WLAN, bluetooth, music, video, short message, etc.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 2, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether the screen has a status bar or participate in executing operations such as screen locking and screen capturing.
Content providers are used to store and retrieve data and make it accessible to applications. The stored data may include video data, image data, audio data, and the like, and may also include call record data of dialing and answering, browsing history of the user, bookmark, and the like, which are not described herein again.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide communication functions for the electronic device 100. Such as management of call status (including connection, disconnection, etc. of the phone).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so forth.
The notification manager enables the application to display notification information in the status bar of the screen, which can be used to convey a message to the user, and the notification information can disappear automatically after the status bar stays for a short time without the user performing an interactive process such as a closing operation. Such as a message that the notification manager may inform the user that the download is complete. The notification manager may also be a notification that appears in the form of a chart or scrollbar text at the top status bar of the system, such as a notification of a running application in the background; alternatively, the notification manager may also be a notification that appears on the screen in the form of a dialog window, such as prompting a text message in a status bar; or, the notification manager may also control the electronic device to emit a warning sound, a vibration of the electronic device, a flashing of an indicator light of the electronic device, and the like, which is not described herein again.
The Android runtime includes a core library and a virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
The core library comprises two parts: one part is the function libraries that need to be called by the Java language, and the other part is the core libraries of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application layer and the application framework layer as binary files. Virtual machines are used to perform functions such as lifecycle management, stack management, thread management, security and exception management, and garbage collection of objects.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), media libraries (media libraries), three-dimensional (3D) graphics processing libraries (e.g., openGL ES), two-dimensional (2D) graphics engines, and the like.
The surface manager is used to manage the display subsystem of the electronic device and provides a fusion of 2D and 3D layers for multiple applications.
The media library supports a variety of commonly used audio, video format playback and recording, and still image files, among others. The media library may support a variety of audio-video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, and the like.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The two-dimensional graphics engine is a two-dimensional drawing engine.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, an audio driver, and a sensor driver.
For convenience of understanding, the following embodiments of the present application will specifically describe a human-computer interaction method provided by the embodiments of the present application, by taking an electronic device having a structure shown in fig. 1 and fig. 2 as an example, and combining the drawings and an application scenario.
First, before introducing the method for human-computer interaction provided in the embodiments of the present application, a few possible application scenarios are listed.
In a possible scenario, the method for human-computer interaction provided by the embodiment of the present application may be applied to a scenario including a separate electronic device. For example, the electronic device may be any of the different electronic devices described in conjunction with the structure shown in fig. 1, such as a mobile phone, a tablet, or a smart screen, which is not limited in this embodiment. The following describes in detail, taking a mobile phone as an example, the human-computer interaction method provided by the present application.
Fig. 3 is a schematic diagram of a Graphical User Interface (GUI) of a human-computer interaction process, where (a) in fig. 3 illustrates that, in an unlocking mode of a mobile phone, a screen display system of the mobile phone displays currently output interface content 301, and the interface content 301 is a main interface of the mobile phone. The interface contents 301 display various applications (apps) such as mail, calculator, setting, music, and the like. It should be understood that the interface content 301 may also include other and more applications, which are not limited in this application.
In a possible implementation manner, during use of the voice assistant, the user can enable the intelligent voice function of the mobile phone through the setting application. Illustratively, as shown in fig. 3 (a), the user may click the icon of the setting application, and in response to the user's click operation, the mobile phone displays a main interface 302 of the setting application as shown in fig. 3 (b). The main interface 302 of the setting application may include a number of menus, such as WLAN, Bluetooth, desktop and wallpaper, display and brightness, sound, and intelligent assistant. The user can click the intelligent assistant menu on the interface 302; in response to the user's click operation, the mobile phone displays the intelligent assistant interface 303 as shown in (c) of fig. 3. The intelligent assistant interface 303 includes options such as intelligent voice, intelligent vision, intelligent screen recognition, scene intelligence, and intelligent search. In addition, the intelligent assistant interface 303 also displays the wake-up word "Xiaoyi", which can be used by the user to wake up the mobile phone so that the mobile phone enters a state of listening for and collecting the user's voice instructions.
As shown in fig. 3 (c), the user clicks the intelligent voice option on the intelligent assistant interface 303, and in response to the user's click, the mobile phone displays the intelligent voice interface 304 as shown in fig. 3 (d). The intelligent voice interface 304 may include a voice wake-up switch, a power switch, an Artificial Intelligence (AI) subtitle switch, a driving scene switch, and the like. In the embodiment of the present application, the user can click the voice wake-up switch to enable the voice interaction function of the mobile phone. In other words, after the voice wake-up switch is turned on, the mobile phone can be woken up by the wake-up word "Xiaoyi", start to collect the user's voice instructions, and enter the voice recognition stage.
When the user has enabled the voice interaction function of the mobile phone, if the user utters the wake-up word "Xiaoyi", a floating window can be displayed on the screen of the mobile phone to prompt the user that the mobile phone has started to collect the user's voice instructions and that the voice interaction process has begun. For example, as shown in (e) of fig. 3, after the user utters the wake-up word "Xiaoyi", the mobile phone is woken up by the wake-up word, and a floating window 10 may be displayed on the screen. The floating window 10 includes the conversation content of the user and the mobile phone (for example: "Hi, I'm listening…") and a listening icon 10-1. When the mobile phone listens for the user's voice instruction, the listening icon 10-1 may flash dynamically to indicate that the user's voice instruction is being listened to, which is not limited in the embodiment of the present application.
As shown in fig. 3 (e), the mobile phone listens to the user's voice instruction: "Imitate the moo of a cow, Xiaoyi". The mobile phone can recognize the content of the voice instruction and display the recognized voice instruction in the floating window 10. In the existing scheme, the mobile phone would then respond according to the voice instruction, for example, imitate the sound of a cow.
However, when the voice instruction again includes the wake-up word "Xiaoyi", the mobile phone may be interrupted by the wake-up word in the voice instruction, so that the current human-computer interaction process is interrupted and the mobile phone resumes collecting the user's voice instruction and performing voice recognition. As shown in (f) of fig. 3, after the mobile phone recognizes that the user's voice instruction includes the wake-up word "Xiaoyi", in response to the voice instruction including the wake-up word, the mobile phone re-enters the next human-computer interaction process and responds in the floating window 10: "Hi, I'm listening…", and the dynamically flashing listening icon 10-1 is displayed to indicate that the mobile phone is currently resuming listening to the user's voice instruction.
In the above scenario, if the voice instruction sent by the user includes the awakening word "Xiaoyi", the awakening word may interrupt the current human-computer interaction process and reenter the next human-computer interaction process, which may not be expected by the user, that is, the awakening word directly interrupts the task currently being executed, so that the mobile phone needs to restart to collect the voice instruction of the user, which may cause discontinuous human-computer conversation, affect the use process of the user, and reduce the human-computer interaction experience.
The embodiment of the application provides a man-machine interaction method, which can avoid the interruption of a man-machine interaction process by a wakeup word in a voice instruction so as to bring better man-machine interaction experience to a user.
Fig. 4 is a schematic flowchart of an example of a method for human-computer interaction provided in an embodiment of the present application, and it should be understood that the method 400 may be applied to an electronic device such as a mobile phone, a PC, a vehicle-mounted device, and the like having the structure shown in fig. 1 and fig. 2. As shown in fig. 4, the method 400 includes:
401, a first voice instruction of a user is obtained, and it is detected that the first voice instruction includes a wakeup word.
Illustratively, in connection with the scenario illustrated in fig. 3 (e), the user may currently wish to have the following conversation with the mobile phone:
User: Xiaoyi.
Mobile phone: Hi, I'm listening…
User: Imitate the moo of a cow, Xiaoyi.
Mobile phone: Moo, moo, moo…
When the mobile phone detects that the user utters the awakening word 'Xiaoyi', the mobile phone is awakened, and the mobile phone enters a state of monitoring the voice instruction of the user.
402, it is determined whether the ASR module is in an on state.
It should be understood that the ASR module of the mobile phone is not always turned on and in a working state. For example, after the user has finished issuing a voice instruction, the mobile phone may turn off the Automatic Speech Recognition (ASR) function, that is, turn off the ASR module; likewise, when the mobile phone is answering the user, the ASR module is also turned off, so as to avoid collecting the mobile phone's own voice and interfering with the collection and recognition of the user's voice instruction. Through step 402, the mobile phone first detects whether the ASR module is in an on state, and if the ASR module is in a sleep or off state, the ASR module may be triggered to be turned on.
Optionally, when the mobile phone obtains and recognizes the wake-up word "Xiaoyi", if it is determined that the ASR module of the mobile phone is currently already turned on, the wake-up may be ignored and the current conversation process continued.
403, when the mobile phone determines that the ASR module is in the on state, determining a position of the wakeup word in the first voice instruction.
In a possible implementation manner, after the mobile phone is woken up, the first voice instruction of the user is monitored, and when it is detected that the first voice instruction includes the wake-up word "Xiaoyi", the position of the wake-up word "Xiaoyi" in the first voice instruction may first be determined. This position may mainly be the beginning, the middle, or the end of the first voice instruction. For example, the first voice instruction issued by the user may be "imitate the moo of a cow, Xiaoyi" (the wake-up word is located at the end of the first voice instruction), "imitate the sound of an animal, Xiaoyi, imitate the moo of a cow" (the wake-up word is located in the middle of the first voice instruction), or "Xiaoyi, imitate the moo of a cow" (the wake-up word is located at the beginning of the first voice instruction).
404-1, when the wake-up word is located at the end of the first voice instruction, step 405 is executed: determine whether the time interval between the wake-up word and the closest preceding voice instruction is less than a first preset value.
And 406, recording time information corresponding to the awakening word when the time interval between the awakening word and the closest voice instruction is smaller than a first preset value.
It should be understood that the first preset value can be used to determine whether the current user wishes to interrupt the conversation process. Illustratively, when the first voice instruction issued by the user is "imitate the moo of a cow, Xiaoyi", the wake-up word is located at the end of the voice instruction. According to step 406, the voice instruction closest to the wake-up word "Xiaoyi" is "imitate the moo of a cow", and the mobile phone can judge the intention of the user in uttering the wake-up word "Xiaoyi" according to the time interval between "imitate the moo of a cow" and "Xiaoyi". When the time interval between the end of "imitate the moo of a cow" and the first syllable of "Xiaoyi" is smaller than the first preset value, it can be judged that the user may have used the wake-up word "Xiaoyi" merely as a habitual part of the utterance and wishes to continue the current conversation process rather than switch to a next new conversation process.
Optionally, the mobile phone may record, according to the first voice instruction, the time information of the wake-up word "Xiaoyi" in the first voice instruction. The recording and marking rule of the time information is not limited in the embodiment of the present application. For example, taking the moment at which the initial wake-up word woke up the mobile phone as the starting time, the time period in which the wake-up word appears again in the first voice instruction may be recorded as t1-t2; alternatively, it may be recorded as T1-T2. The position of the wake-up word in the first voice instruction may then be determined according to this time information.
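Illustratively, the recording of the time information of the wake-up word and the determination of its position in the first voice instruction can be sketched as follows. This Python sketch is only an exemplary illustration; the timestamps, the tolerance value, and the function name are assumptions introduced for illustration and are not limited by the embodiments of the present application.

```python
# Exemplary sketch: classify the position of the wake-up word in the first voice
# instruction from its recorded time span (all times in seconds, measured from
# the moment the initial wake-up word woke up the device). Tolerance is assumed.
def wake_word_position(instruction_start, instruction_end,
                       wake_start, wake_end, tolerance=0.2):
    """Return 'beginning', 'middle' or 'end' for the wake-word span t1-t2 =
    (wake_start, wake_end) inside the first voice instruction."""
    if wake_start - instruction_start <= tolerance:
        return "beginning"   # corresponds to branch 404-3
    if instruction_end - wake_end <= tolerance:
        return "end"         # corresponds to branch 404-1
    return "middle"          # corresponds to branch 404-2
```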
And 407, ignoring the awakening word according to the time information corresponding to the awakening word, and identifying the first voice instruction.
408, respond normally. Optionally, the normal response may include feedback given by the mobile phone according to the user's question and a continued conversation with the user, or may further include a short voice response such as "OK" or "Good", which is not limited in the embodiment of the present application.
409, when the time interval between the wake-up word and the closest voice instruction is greater than or equal to the first preset value, suspend the current conversation process and start a next new conversation process.
410, start the ASR, recognize the user's second voice instruction in the new conversation process, and respond normally again according to the user's second voice instruction; or, return to step 401 to detect again whether the second voice instruction includes the wake-up word, and repeat the above process. For brevity, details are not described again.
Illustratively, when the first voice instruction issued by the user is "imitate the moo of a cow, Xiaoyi", the wake-up word is located at the end of the first voice instruction. When the time interval between the end of "imitate the moo of a cow" and the first syllable of "Xiaoyi" is greater than or equal to the first preset value, it can be judged that the user may wish to interrupt the current conversation process and enter a next new conversation process. In other words, the mobile phone can treat the wake-up word "Xiaoyi" included again in the first voice instruction as the wake-up word of the next conversation process; the mobile phone is woken up again and the previous conversation process of "imitate the moo of a cow" is interrupted. Optionally, the mobile phone may then reply "Hi, I'm listening…", which is not limited by the embodiments of the present application.
Optionally, the first preset value may be 1 second, 2 seconds, and the like, which is not limited in this embodiment of the application.
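Illustratively, the decision of steps 405, 406 and 409 for a wake-up word located at the end of the first voice instruction can be sketched as follows. The sketch is only an exemplary illustration: the first preset value of 1.5 seconds is chosen within the 1-2 second range mentioned above, and the function name and return values are assumptions introduced for illustration.

```python
FIRST_PRESET = 1.5  # seconds; e.g. a value within the 1-2 second range mentioned above

def trailing_wake_word_action(previous_speech_end, wake_word_start):
    """Decide how to handle a wake-up word at the end of the first voice instruction.
    Both arguments are timestamps in seconds from the initial wake-up of the device.
    Returns 'ignore_and_continue' (steps 406-408) or 'start_new_dialog' (steps 409-410)."""
    gap = wake_word_start - previous_speech_end
    if gap < FIRST_PRESET:
        # The wake-up word closely follows the preceding speech: treat it as part of
        # the same utterance, ignore it and recognize the rest of the instruction.
        return "ignore_and_continue"
    # Otherwise the wake-up word interrupts the current conversation: suspend it,
    # restart voice recognition and open a new conversation for a second instruction.
    return "start_new_dialog"

# Example: "imitate the moo of a cow" ends at 2.0 s and "Xiaoyi" starts at 2.4 s,
# so the wake-up word is ignored and the current conversation continues.
print(trailing_wake_word_action(2.0, 2.4))  # -> 'ignore_and_continue'
```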
411, when the mobile phone determines that the ASR module is not in the on state, the ASR module is turned on, and the monitoring function is started. After the ASR monitoring function is turned on, the process of obtaining the voice instruction of the user in step 401 is continuously executed, which is not described herein again.
For step 403, when it is determined that the wake-up word is located at the beginning of the first voice instruction or in the middle of the first voice instruction, that is, 404-3 (the wake-up word is at the beginning of the first voice instruction) or 404-2 (the wake-up word is in the middle of the first voice instruction), steps 406-408 are performed: the time information corresponding to the wake-up word is recorded, the wake-up word is ignored according to that time information, the first voice instruction is recognized, and a normal response is made. For brevity, details are not described herein again.
In one possible scenario, if speech recognition is restarted within a short time immediately after the end of speech recognition, the mobile phone can determine whether the user continues to speak, and if the user does not continue to speak, the mobile phone can continue to talk with the user using the previous speech recognition result.
By the method, after the user wakes up the electronic equipment through the wake-up word in the voice interaction process of the user and the electronic equipment, if the voice instruction sent by the user comprises the wake-up word again, the method can prevent the wake-up word in the voice instruction from interrupting the current interaction flow, thereby preventing the task currently executed by the electronic equipment from being interrupted directly, restarting the process of collecting the voice instruction of the user, ensuring the continuity of man-machine conversation and improving the user experience.
Furthermore, in another possible scenario, some electronic devices, such as robots, may have a sound source localization capability or a camera with an image acquisition function. After the robot is woken up by the wake-up word, it can determine the direction of the user according to the sound source localization function, rotate the camera having the image acquisition function, and turn directly toward the direction or position of the user located by sound source localization. In this process, the judgment of the user's direction may have a large error due to problems such as sound being reflected by walls; when a large error occurs, the device may not face the person after it rotates.
It should be understood that the robot may have a part or all of the structure shown in fig. 1 or a software architecture shown in fig. 2, which is not limited by the embodiment of the present application.
Exemplarily, fig. 5 is a schematic diagram of a human-computer interaction scenario provided in the embodiment of the present application. As shown in fig. 5, assuming that the robot has a sound source localization capability and an image capturing function, the robot may determine the sound source direction according to the user's voice instruction, and may determine its own gaze estimation direction according to the image captured by the camera. The angle between the line-of-sight direction and the sound source direction is denoted as θ.
Optionally, in the process in which the robot determines its own gaze estimation direction according to the image captured by the camera, a camera coordinate system may be established and, based on the parameters of the camera, the gaze target and the coordinates of the user's eye positions may be transformed into camera coordinates through algorithms such as six three-dimensional key points.
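Illustratively, once the gaze (line-of-sight) direction and the sound source direction are both expressed as direction vectors in the camera coordinate system, the included angle θ between them can be computed as sketched below. The sketch is only an exemplary illustration and does not limit how the robot obtains the two direction vectors.

```python
import math

def angle_between(gaze_dir, source_dir):
    """Included angle, in degrees, between two 3D direction vectors, for example the
    robot's line-of-sight direction and the sound source direction of a voice
    instruction, both expressed in the same (e.g. camera) coordinate system."""
    dot = sum(g * s for g, s in zip(gaze_dir, source_dir))
    norm_g = math.sqrt(sum(g * g for g in gaze_dir))
    norm_s = math.sqrt(sum(s * s for s in source_dir))
    cos_theta = max(-1.0, min(1.0, dot / (norm_g * norm_s)))
    return math.degrees(math.acos(cos_theta))

# Example: a sound source 45 degrees to the right of the current line of sight.
print(round(angle_between((0.0, 0.0, 1.0), (1.0, 0.0, 1.0))))  # -> 45
```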
The embodiment of the application also provides a man-machine interaction method aiming at the electronic equipment such as the robot with the sound source positioning capability, so that the man-machine interaction process can be prevented from being interrupted by the awakening words in the voice instruction, and better man-machine interaction experience is brought to a user.
Fig. 6 is a schematic flowchart of an example of a method for human-computer interaction provided in the embodiment of the present application, and it should be understood that the method 600 may be applied to an electronic device such as a robot with a sound source localization capability. As shown in fig. 6, method 600 includes:
601, the robot acquires a first voice instruction of the user.
The robot detects a sound source direction of the first voice command according to the first voice command 602.
603, the robot judges whether an included angle theta between the sound source direction of the first voice command and the current sight line direction of the robot is larger than or equal to a first preset angle.
604, when the included angle θ between the sound source direction and the sight line direction of the first voice command is greater than or equal to a first preset angle, the robot determines whether the interaction will of the user is less than a preset value.
It should be understood that, when the included angle θ between the sound source direction of the first voice command and the sight line direction is greater than or equal to the first preset angle, it may be considered that the user issuing the voice command and the robot are not in a face-to-face positional relationship, or that the user issuing the voice command is not in the central area range of the image captured by the robot, and the range corresponding to the central area is not limited in the embodiment of the present application.
Optionally, in step 604, the robot may capture an image through a camera, and detect a direction in which eyes of a user gaze in the captured image to estimate the willingness of the user to interact. For example, table 1 lists an example of possible ranges of willingness to interact with the user.
TABLE 1
Included angle between the user's gaze direction and the robot's line-of-sight direction        Estimation range of interaction willingness
0°-30°        0.8-1.0
30°-60°       0.5-0.8
60°-90°       0.1-0.5
As shown in Table 1, when the estimated interaction-willingness range determined from the angle between the user's gaze direction and the robot's line-of-sight direction is 0.8-1.0, the robot can judge that the user's current willingness to interact is strong; when the estimated range is 0.5-0.8, the robot can judge that the user's current willingness to interact is moderate; and when the estimated range is 0.1-0.5, the robot can judge that the user's current willingness to interact is low. This is not limited in the embodiments of the present application.
Alternatively, the preset value may be set to 0.5, and when the estimated current willingness to interact with the user is greater than or equal to the preset value, the following step 605 is executed continuously.
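Illustratively, the mapping of Table 1 and the comparison with the preset value 0.5 can be sketched as follows. Linear interpolation within each angle band is an assumption introduced for illustration, since the embodiment only specifies ranges of the estimate; the function names are likewise assumptions.

```python
WILLINGNESS_PRESET = 0.5  # preset value from the example above

def estimate_willingness(gaze_angle_deg):
    """Map the angle between the user's gaze direction and the robot's line-of-sight
    direction to an interaction-willingness estimate according to Table 1."""
    bands = [(0.0, 30.0, 1.0, 0.8),   # 0-30 degrees  -> 0.8-1.0
             (30.0, 60.0, 0.8, 0.5),  # 30-60 degrees -> 0.5-0.8
             (60.0, 90.0, 0.5, 0.1)]  # 60-90 degrees -> 0.1-0.5
    for lo, hi, w_near, w_far in bands:
        if lo <= gaze_angle_deg <= hi:
            frac = (gaze_angle_deg - lo) / (hi - lo)
            return w_near - frac * (w_near - w_far)
    return 0.1  # outside 0-90 degrees: assume a low willingness

def willing_to_interact(gaze_angle_deg):
    """Step 604: continue to step 605 only if the estimate reaches the preset value."""
    return estimate_willingness(gaze_angle_deg) >= WILLINGNESS_PRESET
```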
605, the robot determines whether an included angle between the sound source direction of the first voice command and the sound source direction of the previous voice command is smaller than a second preset angle, and whether a time interval between two voice commands is smaller than a second preset value.
606, when the included angle between the sound source direction of the first voice instruction and the sound source direction of the previous voice instruction is smaller than a second preset angle, and the time interval between the two voice instructions is smaller than a second preset value, the robot performs normal response.
It should be understood that the "previous voice instruction" here is the voice instruction closest to and preceding the first voice instruction. Optionally, the "previous voice instruction" may be the user's wake-word instruction, such as "Xiaoyi"; or the "previous voice instruction" may be another voice instruction issued after the wake-up word, such as "Please imitate the moo of a cow". The embodiments of the present application do not limit this.
It should also be understood that the normal response here may be understood as that the robot recognizes the first voice command of the user and makes corresponding feedback according to the first voice command, which is not described herein again.
607, when the included angle between the sound source direction of the first voice command and the sound source direction of the previous voice command is greater than or equal to a second preset angle and the time interval between the two voice commands is greater than or equal to a second preset value, the robot calls a steering execution function to convert the robot direction.
Optionally, the steering execution function may be called and the robot's orientation changed when either one of the two conditions is satisfied (the included angle between the sound source direction of the first voice instruction and that of the previous voice instruction being greater than or equal to the second preset angle, or the time interval between the two voice instructions being greater than or equal to the second preset value), or only when both are satisfied. The embodiments of the present application do not limit this.
608, in response to the steering execution function, after the robot has changed its orientation, determine the user's interaction willingness in the new orientation. Optionally, the process of determining the user's interaction willingness may be performed by capturing an image and determining the direction of the user's gaze in the image; reference may be made to the related description of step 604, which is not repeated here.
In one possible implementation manner of step 608, if, after the robot has turned in response to the steering execution function, the user's interaction willingness estimated in the new orientation is low, the robot may turn back to its previous line-of-sight direction and, at the same time, perform step 606 to give corresponding feedback to the user's voice instruction, that is, respond normally.
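Illustratively, the decisions of steps 603 to 608 can be summarized in the following sketch, which reuses the angle_between and estimate_willingness helpers sketched above. The angle and time thresholds, the handling of the low-willingness case before turning, and the handling of the case where only one of the two conditions of step 607 holds are assumptions introduced for illustration and are not limited by the embodiments of the present application.

```python
FIRST_PRESET_ANGLE = 60.0     # degrees, assumed first preset angle
SECOND_PRESET_ANGLE = 30.0    # degrees, assumed second preset angle
SECOND_PRESET_INTERVAL = 5.0  # seconds, assumed second preset value

def steering_decision(theta, second_angle, interval, willingness):
    """theta: angle between the sound source of the first voice instruction and the
    robot's current line of sight (step 603); second_angle and interval compare the
    first voice instruction with the previous one (step 605); willingness is the
    estimate of step 604, e.g. estimate_willingness(...)."""
    if theta < FIRST_PRESET_ANGLE:
        return "respond_normally"                 # user roughly in front: no turn needed
    if willingness < WILLINGNESS_PRESET:
        return "respond_normally"                 # assumed handling of a low willingness
    if second_angle < SECOND_PRESET_ANGLE and interval < SECOND_PRESET_INTERVAL:
        return "respond_normally"                 # step 606: same direction, recent speech
    return "turn_towards_sound_source"            # step 607: call the steering function

def after_turn_decision(willingness_after_turn):
    """Step 608: re-estimate the willingness after turning; turn back if it is low."""
    if willingness_after_turn < WILLINGNESS_PRESET:
        return "turn_back_and_respond"
    return "respond_normally"
```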
In another possible scenario, if the first voice command of the user may include a wake-up word, the method shown in fig. 4 may be combined, and an image is collected and the user's willingness to interact is estimated according to the direction of the user's gaze in the image. When the robot does not detect a person in the image, it may be determined that the user has a low willingness to interact with the robot, or the current human-computer interaction process is interrupted.
In another possible scenario, if the first voice instruction is a wakeup word, and an included angle θ between a sound source direction of the wakeup word and a current facing sight direction of the robot is greater than or equal to a first preset angle, the robot may determine whether to respond to the wakeup according to whether there is a user in a currently acquired image and whether to estimate an interaction intention of the user.
For example, if an included angle θ between the sound source direction of the wake-up word and the current direction of the sight line facing the robot is greater than or equal to a first preset angle, and the interaction will of the current user is strong, it may be set that the robot needs to wake up the robot only by two consecutive wake-up words in the same sound source direction, that is, the robot responds to the wake-up word of the user.
Or if the included angle theta between the sound source direction of the awakening word and the current sight line direction of the robot is larger than or equal to the first preset angle, the robot can turn back to the angle before awakening after turning to the sound source direction of the awakening word without detecting the user, and voice interaction with the person before awakening is continued.
By the method, the robot awakening process is more consistent with the expectation of a human, and when the included angle theta between the sound source direction of the voice instruction of the user and the current facing sight line direction of the robot is larger than or equal to a first preset angle and the interaction willingness of the user is strong, the robot can determine to automatically turn to the user; when an included angle theta between a sound source direction of a voice instruction of a user and a sight line direction which is currently faced by the robot is larger than or equal to a first preset angle and the interaction will of the user is low, the robot can turn back, the interaction flow of the user and the robot cannot be interrupted in the process, and better human-computer interaction experience is brought to the user.
In summary, in the voice interaction process between the user and the electronic device, after the user wakes up the electronic device through the wake-up word, if the voice instruction sent by the user or the answer replied to the electronic device includes the wake-up word again, the method can prevent the wake-up word in the voice instruction from interrupting the current interaction flow, thereby preventing the task currently being executed by the electronic device from being interrupted directly, restarting the process of collecting the voice instruction of the user, ensuring the continuity of man-machine conversation, and improving the user experience.
In addition, for electronic equipment such as a robot with sound source positioning capability and the like, the method provided by the embodiment of the application can determine whether deflection needs to occur according to the sound source direction of the voice instruction, and estimate the interaction will of the user according to the collected image and the like, so that voice interaction can be performed with the user more accurately. Specifically, when an included angle theta between a sound source direction of a voice instruction of a user and a sight line direction currently faced by the robot is larger than or equal to a first preset angle and the interaction will of the user is strong, the robot can determine to automatically turn to the user; when the included angle theta between the sound source direction of the voice instruction of the user and the current sight line direction facing the robot is larger than or equal to a first preset angle and the interaction willingness of the user is low, the robot can turn back, the interaction flow of the user and the robot cannot be interrupted in the process, and better human-computer interaction experience is brought to the user.
It will be appreciated that the electronic device, in order to implement the above-described functions, comprises corresponding hardware and/or software modules for performing the respective functions. The present application is capable of being implemented in hardware or a combination of hardware and computer software in conjunction with the exemplary algorithm steps described in connection with the embodiments disclosed herein. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, with the embodiment described in connection with the particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In this embodiment, the electronic device may be divided into functional modules according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware. It should be noted that, the division of the modules in this embodiment is schematic, and is only one logic function division, and another division manner may be available in actual implementation.
In the case of dividing each functional module according to each function, the electronic device such as a robot or a mobile phone according to the above embodiments may include: the device comprises a collecting unit, a detecting unit and a processing unit.
The acquisition unit, the detection unit, and the processing unit cooperate with each other to support an electronic device such as a robot or a mobile phone to perform the above steps, and/or to perform other processes of the techniques described herein.
It should be noted that all relevant contents of each step related to the above method embodiment may be referred to the functional description of the corresponding functional module, and are not described herein again.
The electronic device provided by this embodiment is used to execute the above human-computer interaction method, and therefore can achieve the same effects as the above implementation methods.
In case an integrated unit is employed, the electronic device may comprise a processing module, a storage module and a communication module. The processing module may be configured to control and manage an action of the electronic device, and for example, may be configured to support the electronic device to perform the steps performed by the acquisition unit, the detection unit, and the processing unit. The memory module may be used to support the electronic device in executing stored program codes and data, etc. The communication module can be used for supporting the communication between the electronic equipment and other equipment.
The processing module may be a processor or a controller. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor, or the like. The storage module may be a memory. The communication module may specifically be a radio frequency circuit, a Bluetooth chip, a Wi-Fi chip, or another device that interacts with other electronic devices.
In an embodiment, when the processing module is a processor and the storage module is a memory, the electronic device according to this embodiment may be a device having the structure shown in fig. 1.
The present embodiment also provides a computer-readable storage medium, in which computer instructions are stored, and when the computer instructions are executed on an electronic device, the electronic device is caused to execute the relevant method steps to implement the method for human-computer interaction in the foregoing embodiments.
The present embodiment also provides a computer program product, which when running on a computer, causes the computer to execute the relevant steps described above, so as to implement the human-computer interaction method in the foregoing embodiments.
In addition, an apparatus, which may be specifically a chip, a component or a module, may include a processor and a memory connected to each other; the memory is used for storing computer execution instructions, and when the device runs, the processor can execute the computer execution instructions stored by the memory, so that the chip can execute the human-computer interaction method in the method embodiments.
The electronic device, the computer-readable storage medium, the computer program product, or the chip provided in this embodiment are all configured to execute the corresponding method provided above, and therefore, the beneficial effects that can be achieved by the electronic device, the computer-readable storage medium, the computer program product, or the chip may refer to the beneficial effects in the corresponding method provided above, and are not described herein again.
Through the description of the foregoing embodiments, those skilled in the art will understand that, for convenience and simplicity of description, only the division of the functional modules is used for illustration, and in practical applications, the above function distribution may be completed by different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules, so as to complete all or part of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of modules or units is only one kind of logical function division, and there may be other division manners in actual implementation; for example, a plurality of units or components may be combined or integrated into another apparatus, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed to a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented as a software functional unit and sold or used as a separate product, may be stored in a readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially or partially contributed to by the prior art, or all or part of the technical solutions may be embodied in the form of a software product, where the software product is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of human-computer interaction, the method comprising:
receiving a wake-up word sent by a user, and starting a voice recognition function of the electronic equipment in response to the wake-up word;
acquiring a first voice instruction of the user, and determining a first time interval occupied by the awakening word in a time interval corresponding to the first voice instruction when the first voice instruction is detected to comprise the awakening word;
removing the awakening words in the first time period, and identifying target voice instructions except the awakening words in the first voice instruction;
responding to the target voice instruction and answering.
2. The method of claim 1, wherein the first time period is an end time period, a middle time period or a start time period of the time period corresponding to the first voice instruction.
3. The method according to claim 1 or 2, wherein when the first time period is an end time period of the time period corresponding to the first voice instruction, the method further comprises:
detecting a voice instruction which is closest to the awakening word in the first voice instruction and a time interval from the voice instruction to the awakening word;
and when the time interval is greater than or equal to a first preset value, suspending the current conversation process and responding to the awakening word, and restarting the voice recognition function of the electronic equipment so that the electronic equipment acquires a second voice instruction.
4. A method of human-computer interaction, the method comprising:
acquiring a first voice instruction of a user, and detecting the sound source direction of the first voice instruction according to the first voice instruction;
determining a first angle between the sound source direction of the first voice instruction and a first sight line direction currently faced by the electronic equipment;
when the first angle is larger than or equal to a first preset angle, determining a second angle between the sound source direction of the first voice instruction and the sound source direction of a second voice instruction, wherein the second voice instruction is a voice instruction which is sent by a user before the first voice instruction and is closest to the first voice instruction;
and when the second angle is smaller than or equal to a second preset angle, responding to the first voice instruction by the electronic equipment, and answering.
5. The method of claim 4, further comprising:
detecting a time interval of the first voice instruction and the second voice instruction;
and when the time interval is smaller than or equal to a second preset value, calling a steering execution function, and rotating the electronic equipment to face or infinitely approach the sound source direction of the first voice instruction.
6. The method according to claim 4 or 5, characterized in that the method further comprises:
acquiring a first image of the electronic equipment in the first sight line direction;
when the first image comprises the user and the third angle between the sight line direction of the user and the sound source direction of the first voice instruction is smaller than or equal to the third preset angle, a steering execution function is called, and the electronic equipment is rotated towards or infinitely close to the sound source direction of the first voice instruction.
7. The method of claim 6, further comprising:
the electronic equipment acquires a second image facing to or approaching to the sound source direction of the first voice instruction infinitely;
when the second image does not include the user or a fourth angle between the sight line direction of the user and a current second sight line direction of the electronic equipment is larger than a fourth preset angle, the electronic equipment is rotated to be restored to the first sight line direction.
8. An electronic device, comprising:
one or more processors;
one or more memories;
a module in which a plurality of applications are installed;
the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the method of any of claims 1-7.
9. A computer-readable storage medium having stored thereon computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1-7.
10. A computer program product, characterized in that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 7.
CN202110381295.1A 2021-04-08 2021-04-08 Man-machine interaction method and electronic equipment Pending CN115206308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110381295.1A CN115206308A (en) 2021-04-08 2021-04-08 Man-machine interaction method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110381295.1A CN115206308A (en) 2021-04-08 2021-04-08 Man-machine interaction method and electronic equipment

Publications (1)

Publication Number Publication Date
CN115206308A true CN115206308A (en) 2022-10-18

Family

ID=83571346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110381295.1A Pending CN115206308A (en) 2021-04-08 2021-04-08 Man-machine interaction method and electronic equipment

Country Status (1)

Country Link
CN (1) CN115206308A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024159882A1 (en) * 2023-02-02 2024-08-08 华为技术有限公司 Sound pickup method and electronic device
CN117725187A (en) * 2024-02-08 2024-03-19 人和数智科技有限公司 Question-answering system suitable for social assistance
CN117725187B (en) * 2024-02-08 2024-04-30 人和数智科技有限公司 Question-answering system suitable for social assistance

Similar Documents

Publication Publication Date Title
CN109584879B (en) Voice control method and electronic equipment
WO2021052263A1 (en) Voice assistant display method and device
CN110910872B (en) Voice interaction method and device
CN113645351B (en) Application interface interaction method, electronic device and computer-readable storage medium
CN111316199B (en) Information processing method and electronic equipment
CN112671976B (en) Control method and device of electronic equipment, electronic equipment and storage medium
CN111819533B (en) Method for triggering electronic equipment to execute function and electronic equipment
CN112740152B (en) Handwriting pen detection method, handwriting pen detection system and related device
WO2021052139A1 (en) Gesture input method and electronic device
CN115589051B (en) Charging method and terminal equipment
CN113641271A (en) Application window management method, terminal device and computer readable storage medium
CN115206308A (en) Man-machine interaction method and electronic equipment
CN110058729B (en) Method and electronic device for adjusting sensitivity of touch detection
CN114500732B (en) Interface display method, electronic equipment and storage medium
CN116048831B (en) Target signal processing method and electronic equipment
CN113380240B (en) Voice interaction method and electronic equipment
WO2021129453A1 (en) Screenshot method and related device
CN111475363B (en) Card death recognition method and electronic equipment
CN114740986A (en) Handwriting input display method and related equipment
CN114006976B (en) Interface display method and terminal equipment
CN117119102A (en) Awakening method of voice interaction function and electronic equipment
CN118131891A (en) Man-machine interaction method and device
CN117708009A (en) Signal transmission method and electronic equipment
CN118072723A (en) Collaborative wake-up method and device and electronic equipment
CN117687814A (en) Exception handling method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination