WO2023245316A1 - Human-computer interaction method and device, computer device and storage medium - Google Patents

Human-computer interaction method and device, computer device and storage medium

Info

Publication number
WO2023245316A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
coordinate system
head
head movement
human
Application number
PCT/CN2022/099701
Other languages
French (fr)
Chinese (zh)
Inventor
杜琳
Original Assignee
北京小米移动软件有限公司
Application filed by 北京小米移动软件有限公司
Priority to CN202280004336.8A (CN117616368A)
Priority to PCT/CN2022/099701
Publication of WO2023245316A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02Input arrangements using manually operated switches, e.g. using keyboards or dials

Definitions

  • the present disclosure relates to the field of computers, and in particular, to a human-computer interaction method, device, computer device and storage medium.
  • Eye-movement interaction technology can make up for some shortcomings of existing human-computer interaction methods: pointing, moving, and selecting can be performed conveniently through the gaze direction. However, the parameters that eye-movement interaction can provide are limited, making it hard to adapt to complex operation scenarios, and various movements of the user's body may cause deviations in the eye-movement data, leading to operation errors.
  • the present disclosure provides a human-computer interaction method, device, computer device and storage medium. By combining the two dimensions of gaze and head movement to control human-computer interaction, it suits complex, fine-grained application scenarios and achieves efficient, highly accurate human-computer interaction.
  • a human-computer interaction method including:
  • the step of detecting the user's gaze direction includes:
  • the step of initiating detection of head motion includes:
  • the IMU includes at least one accelerometer sensor that measures acceleration signals and at least one gyro sensor that measures angular signals.
  • the step of initiating detection of head motion includes:
  • the head image data is acquired through the image acquisition device, and the head movement is determined relative to a fixed coordinate system that determines the vector direction of the head movement using data on the X, Y, and Z axes.
  • the step of initiating detection of head motion includes:
  • the head movement is analyzed.
  • gaze direction and/or head movement is detected by the head mounted device.
  • the operation results are fed back through the head-mounted device and/or a second device connected to the head-mounted device, and the second device is connected to the head-mounted device through a wired or wireless connection.
  • the present invention discloses a human-computer interaction device, including:
  • a gaze detection module, used to detect the direction of the user's gaze;
  • a motion detection module, used to start detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
  • the instruction generation module is used to generate corresponding operation instructions based on the head movement information obtained by detecting the head movement;
  • An execution module is used to execute the operation instructions and feed back the operation results to the user.
  • the gaze detection module is used to track the user's eyeballs and determine the gaze direction.
  • the gaze direction indicates the position of the user's gaze in the gaze coordinate system.
  • the gaze coordinate system uses X-, Y-, and Z-axis data to determine the vector direction of the position the user is looking at.
  • the motion detection module includes:
  • the IMU includes at least one accelerometer sensor that measures acceleration signals and at least one gyro sensor that measures angular signals; the detection of head movement is performed relative to a fixed coordinate system that determines the vector direction of the head movement from X-, Y-, and Z-axis data, and the fixed coordinate system is any one of the following coordinate systems: the user head coordinate system, the user body coordinate system, or the earth coordinate system;
  • the image detection submodule is used to obtain head image data through an image collection device, and determine head movement based on the head image data.
  • the sound detection submodule is used to detect the vibration characteristics of the sound wave of the user's voice, and analyze the head movement based on the vibration characteristics.
  • a computer device including:
  • Memory used to store instructions executable by the processor
  • the processor is configured as:
  • a non-transitory computer-readable storage medium which, when instructions in the storage medium are executed by a processor of a mobile terminal, enables the mobile terminal to execute a human-computer interaction method, the method including:
  • the technical solution provided by the embodiments of the present disclosure may include the following beneficial effects: the user's gaze direction is first detected; when the gaze direction has not changed for longer than a preset time threshold, detection of head movement is started; corresponding operation instructions are generated from the head movement information obtained by the detection; and the operation instructions are then executed and the operation results fed back to the user.
  • Figure 1 is a flow chart of a human-computer interaction method according to an exemplary embodiment.
  • Figure 2 is a flow chart of yet another human-computer interaction method according to an exemplary embodiment.
  • FIG. 3 is a schematic diagram of the coordinate system of the user's head according to an exemplary embodiment.
  • Figure 4 is a schematic diagram of a user's body coordinate system according to an exemplary embodiment.
  • Figure 5 is a schematic diagram of a geodetic coordinate system according to an exemplary embodiment.
  • Figure 6 is a schematic diagram showing the relative relationship between the user's head coordinate system, the user's body coordinate system and the earth coordinate system according to an exemplary embodiment.
  • FIG. 7 is a schematic diagram of the glasses coordinate system according to an exemplary embodiment.
  • Figure 8 is a block diagram of a human-computer interaction device according to an exemplary embodiment.
  • FIG. 9 is a schematic structural diagram of the motion detection module 802 according to an exemplary embodiment.
  • Figure 10 is a block diagram of a device according to an exemplary embodiment.
  • the present disclosure provides a human-computer interaction method and device. By combining the two dimensions of gaze and head movement to control human-computer interaction, it suits complex, fine-grained application scenarios and achieves efficient, highly accurate human-computer interaction.
  • An exemplary embodiment of the present disclosure provides a human-computer interaction method, which performs human-computer interaction through joint control of gaze and head movement: the user's gaze direction and head movement information are detected together, and when the gaze direction has not changed within a preset time threshold, operation instructions are generated from the head movement, executed, and feedback obtained.
  • The specific flow is shown in Figure 1 and includes:
  • Step 101: Detect the user's line of sight direction.
  • the user's eyeballs can be tracked to determine the gaze direction.
  • the gaze direction indicates the position where the user is looking in the gaze coordinate system.
  • the gaze coordinate system determines the vector direction of the user's gaze position using data on the X, Y, and Z axes.
  • Step 102: When the line of sight direction has not changed for longer than the preset time threshold, start the detection of head movement.
  • For example, the time threshold can be preset to 0.8 seconds; when the gaze stays locked on the same target for more than 0.8 seconds without changing, the gaze direction is judged to be stable.
  • the gaze direction can be determined by tracking the user's eyes.
  • the eye image data can be obtained, and then the gaze direction can be determined based on changes in the eye image data.
  • when the gaze direction is stable, the target it points to is predicted to be the target of the user's intended operation.
  • the detection of head movement can be started to determine the user's operation intention.
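  • As an illustration of the dwell logic above, the following is a minimal sketch in Python, assuming a gaze tracker that reports a 3-D gaze vector each frame; the tracker object, its read_gaze() method, and the 2-degree stability tolerance are illustrative assumptions, not from the patent.

```python
import time
import numpy as np

DWELL_THRESHOLD_S = 0.8   # preset time threshold from the example above
STABILITY_DEG = 2.0       # assumed tolerance for "unchanged" gaze direction

def angular_difference(v1, v2):
    """Angle in degrees between two gaze direction vectors."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def wait_for_stable_gaze(tracker):
    """Block until the gaze has stayed stable for the dwell threshold, then return it."""
    anchor = np.asarray(tracker.read_gaze())   # hypothetical per-frame (x, y, z) gaze vector
    anchor_time = time.monotonic()
    while True:
        gaze = np.asarray(tracker.read_gaze())
        if angular_difference(gaze, anchor) > STABILITY_DEG:
            anchor, anchor_time = gaze, time.monotonic()   # gaze moved: restart the dwell timer
        elif time.monotonic() - anchor_time >= DWELL_THRESHOLD_S:
            return anchor   # stable long enough: head-motion detection can start
```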
  • Step 103: Generate corresponding operation instructions based on the head motion information obtained by detecting the head motion.
  • the operation instructions are generated based on the detected head movement information and the preset instruction rules.
  • For example, the preset instruction rule for the "confirm" operation instruction is "nod twice within two seconds", and the rule for the "cancel" operation instruction is "shake the head within one second". Thus, after detecting a "nod twice within two seconds" head movement, the "confirm" operation instruction can be generated, and after detecting a "shake the head within one second" head movement, the "cancel" operation instruction can be generated.
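  • As a sketch of how such instruction rules might be encoded, the following Python fragment maps timed gesture events to operation instructions; the event format and the rule encoding are assumptions for illustration, not the patent's own scheme.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstructionRule:
    gesture: str       # e.g. "nod" or "shake"
    count: int         # required number of repetitions
    window_s: float    # time window in which they must occur
    instruction: str   # operation instruction to emit

# Rules from the example above: nod twice within 2 s -> confirm; shake within 1 s -> cancel.
RULES = [
    InstructionRule("nod", 2, 2.0, "confirm"),
    InstructionRule("shake", 1, 1.0, "cancel"),
]

def match_instruction(events):
    """events: (timestamp, gesture) pairs from the head-motion detector, sorted by time."""
    for rule in RULES:
        times = [t for t, g in events if g == rule.gesture]
        for i in range(len(times) - rule.count + 1):
            if times[i + rule.count - 1] - times[i] <= rule.window_s:
                return rule.instruction
    return None   # no rule matched; keep listening
```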
  • Step 104: Execute the operation instruction and feed back the operation result to the user.
  • the operation instructions are executed, the operation results are obtained, and the operation results are fed back to the user.
  • The feedback can be a confirmation message, such as "operation canceled successfully", or a response interface to the operation, such as entering the page of the selected object when the operation instruction indicates "confirm viewing".
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method, in which the user's eyeballs are tracked to determine the line of sight direction.
  • the line of sight direction is used to represent the position indicated by the user's gaze in the line of sight coordinate system, where the line of sight coordinate system determines the vector direction of the position of the user's gaze with the numerical values of the X, Y, and Z axes.
  • As shown in Figure 2, it specifically includes the following steps:
  • Step 201: Perform eye tracking on the user to determine the user's line of sight direction.
  • the line of sight coordinate system can be the user's head coordinate system, with the head's center of gravity as the origin; the X-, Y-, and Z-axis values specify a point in space, and the vector from the origin to that point is the user's visual direction.
  • the line of sight coordinate system can also be the user's body coordinate system, with the body's center of gravity as the origin; the X-, Y-, and Z-axis values specify a point in space, and the vector from the origin to that point is the user's visual direction.
  • the line of sight coordinate system can also be a geodetic coordinate system, with a fixed position relative to the ground as the origin; the X-, Y-, and Z-axis values specify a point in space, and the vector from the origin to that point is the user's visual direction.
  • Head movement is detected relative to a fixed coordinate system, which is any one of the following coordinate systems: the user head coordinate system, the user body coordinate system, or the earth coordinate system.
  • Detecting the head's motion parameters can include detecting various movements relative to the fixed coordinate system, including nodding (reciprocating rotation about the X axis), shaking the head (reciprocating rotation about the Y and Z axes), and movements under different tilt postures.
  • Step 202: Obtain a preset time threshold, and when the line of sight direction has not changed for longer than the time threshold, start detection of head movement.
  • The user's gaze direction is detected in real time; when it has not changed for longer than a preset time threshold, such as 2 seconds, detection of head movement is started.
  • any of the following methods can be used for head motion detection:
  • the sensors include, but are not limited to, any one or more of the following sensors: accelerometer, gyroscope, and geomagnetometer.
  • head movement can be detected from the changes over time of the accelerometer, gyroscope, and geomagnetometer data in each dimension of the coordinate system.
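  • A minimal sketch of this idea, assuming gyroscope samples expressed as angular rates about the fixed frame's X, Y, and Z axes: per the description above, a nod is a reciprocating rotation about X and a shake about Y/Z, so the dominant axis plus a sign reversal is enough to classify; the 0.6 rad/s threshold is an illustrative assumption.

```python
import numpy as np

RATE_THRESHOLD = 0.6   # rad/s; assumed minimum angular rate for a deliberate gesture

def classify_head_gesture(gyro_samples):
    """gyro_samples: (N, 3) angular rates about the fixed frame's X, Y, and Z axes.

    A nod is a reciprocating rotation about X and a shake a reciprocating
    rotation about Y/Z, so we look for a sign reversal on the dominant axis.
    """
    gyro = np.asarray(gyro_samples, dtype=float)
    peak = np.abs(gyro).max(axis=0)
    axis = int(np.argmax(peak))
    if peak[axis] < RATE_THRESHOLD:
        return None                        # no deliberate head motion
    strong = gyro[np.abs(gyro[:, axis]) > RATE_THRESHOLD, axis]
    # Reciprocating motion shows at least one sign reversal above the threshold.
    if strong.size > 1 and np.any(np.sign(strong[:-1]) != np.sign(strong[1:])):
        return "nod" if axis == 0 else "shake"
    return None
```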
  • Head image data, such as photos of the user's face, can be captured with a camera or other imaging equipment, for example one or more cameras on a mobile phone or computer.
  • In this way, the parameters of the user's line of sight direction and head movement, that is, the eye image data and head image data, can be detected simultaneously by visual means.
  • When using head image data for detection, a three-dimensional model of the user's head can be established; the head images obtained by a collection device such as a camera are then matched against the model to estimate the current posture of the user's head, and the head-movement parameters are derived from its changes over time.
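  • A hedged sketch of this image-based approach, using OpenCV's solvePnP to match detected 2-D facial landmarks against a generic 3-D head model; the model points and the camera approximation are illustrative assumptions, and landmark detection is assumed to come from an external detector.

```python
import numpy as np
import cv2

# Generic 3-D head-model points in mm (nose tip, chin, eye corners, mouth corners).
# The values are an illustrative assumption, not taken from the patent.
MODEL_POINTS = np.array([
    [0.0, 0.0, 0.0],         # nose tip
    [0.0, -63.6, -12.5],     # chin
    [-43.3, 32.7, -26.0],    # left eye outer corner
    [43.3, 32.7, -26.0],     # right eye outer corner
    [-28.9, -28.9, -24.1],   # left mouth corner
    [28.9, -28.9, -24.1],    # right mouth corner
])

def estimate_head_pose(image_points, frame_size):
    """image_points: (6, 2) detected landmark pixels corresponding to MODEL_POINTS."""
    h, w = frame_size
    camera_matrix = np.array([[w, 0, w / 2],    # crude pinhole approximation: focal = width
                              [0, w, h / 2],
                              [0, 0, 1]], dtype=float)
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS,
                                  np.asarray(image_points, dtype=float),
                                  camera_matrix, None)
    # Tracking rvec over time gives the head's motion parameters.
    return (rvec, tvec) if ok else None
```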
  • the user's voice characteristics can be detected to determine head movement.
  • here, the head movement detected is the movement of the user's vocal cords.
  • A voice recognition system can also be used to recognize voice commands issued by the user and operate on the target locked by gaze.
  • Step 203: Generate corresponding operation instructions based on the head motion information obtained by detecting the head motion.
  • When sensors are used, the obtained head movement information includes the changes over time of the accelerometer, gyroscope, and geomagnetometer data in each dimension of the fixed coordinate system, from which the head movement is detected.
  • When images are used, the obtained head movement information includes the changes over time of the head's posture and position in each dimension of the fixed coordinate system; these changes describe the head's specific movement trajectory, from which the head movement can be determined.
  • When sound is used, the obtained head movement information includes information indicating whether the vocal cords vibrate. Further, a vibration trigger amplitude can be preset, with the head movement information including the amplitude of the user's voice; once the amplitude of the sound emitted by the user reaches the vibration trigger amplitude, it is determined that head movement has occurred.
  • This can also be combined with a speech recognition system, in which case the obtained head movement information includes information indicating whether the vocal cords vibrate together with the user's voice command information, and the user's intention can be determined from the voice commands.
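  • A minimal sketch of the vibration-trigger idea, assuming normalized audio frames from a microphone; the RMS measure and the 0.2 threshold are illustrative assumptions.

```python
import numpy as np

VIBRATION_TRIGGER = 0.2   # assumed normalized RMS amplitude threshold

def vocal_activity(frames):
    """frames: iterable of 1-D arrays of normalized microphone samples.

    Yields True for each frame whose RMS amplitude reaches the preset
    vibration-trigger amplitude, i.e. the user is judged to be vocalizing;
    a speech recognizer can then interpret the accompanying voice command.
    """
    for frame in frames:
        rms = float(np.sqrt(np.mean(np.square(np.asarray(frame, dtype=float)))))
        yield rms >= VIBRATION_TRIGGER
```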
  • Step 204: Execute the operation instruction and feed back the operation result to the user.
  • the operation results can be fed back in different ways according to the software and hardware configuration in the application scenario.
  • an operation result containing text and/or images can be formed and displayed on the display screen.
  • the operation results can also be fed back through device vibration, for example one short vibration on success or a continuous two-second vibration on failure.
  • the operation results can also be played back through voice.
  • The above feedback methods can be applied singly or in combination. Those skilled in the art will appreciate that the ways of outputting operation result information are not limited to those listed above.
  • The feedback of the above operation result information can be performed through a head-mounted device and/or a second device connected to the head-mounted device, where the second device is connected to the head-mounted device through a wired or wireless connection. For example, the result is displayed on the display screen of the head-mounted device, on at least one display screen external to the head-mounted device, or on both simultaneously.
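  • A sketch of dispatching an operation result over whichever feedback channels are configured (display, vibration, voice), per the modes listed above; the channel objects and their show/vibrate/speak methods are hypothetical.

```python
def feed_back(result, channels):
    """result: dict with an 'ok' flag and a 'message'; channels: configured outputs."""
    if "display" in channels:
        channels["display"].show(result["message"])   # text/image on a screen
    if "haptic" in channels:
        if result["ok"]:
            channels["haptic"].vibrate(ms=200)        # one short vibration on success
        else:
            channels["haptic"].vibrate(ms=2000)       # continuous vibration on failure
    if "voice" in channels:
        channels["voice"].speak(result["message"])    # spoken playback
```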
  • the above-mentioned line of sight coordinate system can be any one of the following coordinate systems: user head coordinate system, user body coordinate system, and earth coordinate system.
  • A coordinate system can also be established based on the head-mounted device. Figure 6 shows an example of the relative relationship between the user head coordinate system 601, the user body coordinate system 602, and the earth coordinate system 603.
  • the coordinate system constructed by the wearable device is used as the coordinate system of the user's head.
  • the user's gaze direction is detected through the eye tracking module integrated in the smart glasses device, and the user's head movement parameters are detected through the inertial detection unit (IMU) motion sensing module integrated in the glasses device.
  • the smart glasses device can be used as a reference object to construct the glasses coordinate system as the user head coordinate system.
  • the specific position of the user's head can be used as the origin, for example, the center of gravity of the head can be used as the origin; the X, Y, and Z axes are used to calibrate the vector directions therein.
  • the specific position of the user's body can be used as the origin, for example, the body center of gravity or the projection of the body center of gravity on the ground as the origin; the X, Y, and Z axes are used to calibrate the vector directions.
  • the head coordinate system and the user body coordinate system are local coordinate systems relative to the user and are moving coordinate systems.
  • a global coordinate system such as a geodetic coordinate system, which is a static coordinate system, can also be used.
  • the line of sight coordinate system or fixed coordinate system is mainly used as a reference system for motion detection.
  • the coordinate systems that can be used are not limited to the head, body and earth coordinate systems listed above.
  • the fixed coordinate system is a stationary coordinate system
  • the line of sight coordinate system is a moving coordinate system relative to the fixed coordinate system.
  • the fixed coordinate system is the earth coordinate system
  • the sight coordinate system is the head coordinate system.
  • both the fixed coordinate system and the line-of-sight coordinate system are moving coordinate systems.
  • the fixed coordinate system is the user body coordinate system
  • the sight coordinate system is the user body coordinate system or head coordinate system.
  • both the fixed coordinate system and the line-of-sight coordinate system are static coordinate systems.
  • both are geodetic coordinate systems.
  • the line of sight coordinate system and the fixed coordinate system are the same coordinate system.
  • the line of sight coordinate system can also adopt a different coordinate system from the fixed coordinate system.
  • the settings of the sight coordinate system and the fixed coordinate system can be customized according to the application environment and user needs, and can be set flexibly to adapt to the hardware configuration, saving costs and improving efficiency.
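  • To make these frame relationships concrete, the following sketch re-expresses a gaze vector from a moving head coordinate system in a fixed (for example, earth) frame, given the head frame's orientation; obtaining that orientation as a quaternion from an IMU orientation filter is an assumption for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def gaze_in_fixed_frame(gaze_head, head_orientation_quat):
    """Re-express a gaze vector from the head coordinate system in the fixed frame.

    gaze_head: (3,) gaze direction in the head frame.
    head_orientation_quat: (x, y, z, w) orientation of the head frame relative to
    the fixed frame, e.g. from an IMU orientation filter (an assumption here).
    """
    rotation = Rotation.from_quat(head_orientation_quat)
    return rotation.apply(np.asarray(gaze_head, dtype=float))

# Example: with the head pitched 30 degrees down, a straight-ahead gaze in the
# head frame points below the horizon in the earth frame.
quat = Rotation.from_euler("x", -30, degrees=True).as_quat()
print(gaze_in_fixed_frame([0.0, 0.0, 1.0], quat))
```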
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that selects different coordinate systems as fixed coordinate systems according to different scenarios when detecting the movement of the user's head.
  • the fixed coordinate system is the user's body coordinate system.
  • the center of the user's feet as the origin
  • the sides, top of the head, and front are the X-axis, Y-axis, and Z-axis directions of the coordinate system respectively.
  • the scene where the direction of sight is basically stable relative to a fixed coordinate system can include the user looking at a stationary object in front of him, looking at a specified position on the screen, etc.
  • the line of sight coordinate system can be a head coordinate system, with the specific position of the user's head as the origin, for example, the center of gravity of the head as the origin; and the vector directions therein are calibrated with the X, Y, and Z axes.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that selects different coordinate systems as fixed coordinate systems according to different scenarios when detecting the movement of the user's head.
  • the fixed coordinate system is the user's body coordinate system.
  • the center of the user's feet as the origin
  • the sides, top of the head, and front are the X-axis, Y-axis, and Z-axis directions of the coordinate system respectively.
  • the scene where the direction of sight is basically stable relative to a fixed coordinate system can include the user looking at a stationary object in front of him, looking at a specified position on the screen, etc.
  • the line of sight coordinate system can be the user's body coordinate system, and the specific position of the user's body can be the origin, for example, the center of gravity of the body or the projection of the body's center of gravity on the ground as the origin; the X, Y, and Z axes are used to calibrate the vector directions therein.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that selects different coordinate systems as fixed coordinate systems according to different scenarios when detecting the movement of the user's head.
  • the fixed coordinate system is the user's body coordinate system.
  • the center of the user's feet as the origin
  • the sides, top of the head, and front are the X-axis, Y-axis, and Z-axis directions of the coordinate system respectively.
  • the scene where the direction of sight is basically stable relative to a fixed coordinate system can include the user looking at a stationary object in front of him, looking at a specified position on the screen, etc.
  • the line of sight coordinate system can be a geodetic coordinate system, with a fixed position relative to the ground as the origin; the X-, Y-, and Z-axis values specify a point in space, and the vector from the origin to that point is the user's visual direction.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that selects different coordinate systems as fixed coordinate systems according to different scenarios when detecting the movement of the user's head.
  • the fixed coordinate system may be a geodetic coordinate system.
  • the scene where the direction of sight is basically stable relative to a fixed coordinate system can include the user looking ahead on the road, looking at the instrument panel in the car, etc.
  • the line of sight coordinate system can be a head coordinate system, with the specific position of the user's head as the origin, for example, the center of gravity of the head as the origin; and the vector directions therein are calibrated with the X, Y, and Z axes.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that selects different coordinate systems as fixed coordinate systems according to different scenarios when detecting the movement of the user's head.
  • the fixed coordinate system may be a geodetic coordinate system.
  • the scene where the direction of sight is basically stable relative to a fixed coordinate system can include the user looking ahead on the road, looking at the instrument panel in the car, etc.
  • the line of sight coordinate system can be the user's body coordinate system, and the specific position of the user's body can be the origin, for example, the center of gravity of the body or the projection of the body's center of gravity on the ground as the origin; the X, Y, and Z axes are used to calibrate the vector directions therein.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that selects different coordinate systems as fixed coordinate systems according to different scenarios when detecting the movement of the user's head.
  • the fixed coordinate system may be a geodetic coordinate system.
  • the scene where the direction of sight is basically stable relative to a fixed coordinate system can include the user looking ahead on the road, looking at the instrument panel in the car, etc.
  • the line of sight coordinate system can be a geodetic coordinate system, with a fixed position relative to the ground as the origin; the X-, Y-, and Z-axis values specify a point in space, and the vector from the origin to that point is the user's visual direction.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that selects different coordinate systems as fixed coordinate systems according to different scenarios when detecting the movement of the user's head.
  • the fixed coordinate system may be the coordinate system of the user's head. Specifically, it may be a coordinate system established with the center of the smart glasses device as the origin.
  • the line of sight coordinate system can be a head coordinate system, with the specific position of the user's head as the origin, for example, the center of gravity of the head as the origin; and the vector directions therein are calibrated with the X, Y, and Z axes.
  • the head coordinate system may be a coordinate system established with the center of the smart glasses device as the origin.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that selects different coordinate systems as fixed coordinate systems according to different scenarios when detecting the movement of the user's head.
  • the fixed coordinate system may be the coordinate system of the user's head. Specifically, it may be a coordinate system established with the center of the smart glasses device as the origin.
  • the line of sight coordinate system can be the user's body coordinate system, and the specific position of the user's body can be the origin, for example, the center of gravity of the body or the projection of the body's center of gravity on the ground as the origin; the X, Y, and Z axes are used to calibrate the vector directions therein.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that selects different coordinate systems as fixed coordinate systems according to different scenarios when detecting the movement of the user's head.
  • the fixed coordinate system may be the coordinate system of the user's head. Specifically, it may be a coordinate system established with the center of the smart glasses device as the origin.
  • the line of sight coordinate system can be a geodetic coordinate system, with a fixed position relative to the ground as the origin; the X-, Y-, and Z-axis values specify a point in space, and the vector from the origin to that point is the user's visual direction.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method that detects line of sight and head movement in different ways according to hardware configuration.
  • For line of sight detection, the gaze can be detected visually: a camera or other imaging equipment captures image data such as eye photos, which are analyzed to determine the line of sight.
  • an infrared lighting module can also be added to obtain a clearer image of the user's eye area under different ambient light conditions.
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction method configured for a wearable device such as a head-mounted device.
  • The wearable device detects the parameters of the user's gaze direction and head movement simultaneously. Because in actual use the user keeps the gaze direction fixed on the target while performing head movements such as nodding, the change of gaze direction relative to the wearable device is exactly opposite to the head movement in the head coordinate system; joint detection can therefore effectively avoid false detections while keeping the missed-detection rate low, and is more accurate than traditional single-modality detection.
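  • A sketch of the consistency check this implies: with the eyes locked on a world-fixed target during a nod, the gaze direction measured in the glasses frame rotates opposite to the head, so requiring strong anti-correlation between the two angular-rate traces filters false detections; the correlation threshold is an illustrative assumption.

```python
import numpy as np

def is_genuine_head_gesture(head_rates, gaze_rates, min_anticorrelation=0.6):
    """head_rates, gaze_rates: (N,) angular velocities about the same axis,
    the head motion from the IMU and the gaze motion measured relative to the
    glasses frame by the eye tracker.

    With the eyes locked on a world-fixed target, the gaze rotates opposite to
    the head, so a strongly negative correlation supports a genuine gesture.
    """
    head = np.asarray(head_rates, dtype=float)
    gaze = np.asarray(gaze_rates, dtype=float)
    if head.std() == 0.0 or gaze.std() == 0.0:
        return False   # no motion on one trace: nothing to corroborate
    corr = float(np.corrcoef(head, gaze)[0, 1])
    return corr <= -min_anticorrelation
```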
  • An exemplary embodiment of the present disclosure also provides a human-computer interaction device, the structure of which is shown in Figure 8, including:
  • Gaze detection module 801, used to detect the direction of the user's gaze;
  • the motion detection module 802 is used to start the detection of head movement when the direction of sight does not change beyond the preset time threshold;
  • the instruction generation module 803 is used to generate corresponding operation instructions based on the head movement information obtained by detecting the head movement;
  • Execution module 804 is used to execute the operation instructions and feed back the operation results to the user.
  • the line of sight detection module 801 is used to track the user's eyeballs and determine the direction of the line of sight.
  • the line of sight direction indicates the position of the user's gaze in the line of sight coordinate system.
  • the line of sight coordinate system determines the vector direction of the user's gaze position from the X-, Y-, and Z-axis data.
  • the structure of the motion detection module 802 is shown in Figure 9, including:
  • Sensor detection sub-module 901 is used to obtain the sensor signal of at least one sensor in the IMU, and determine the movement of the head relative to the fixed coordinate system based on the sensor signal.
  • the IMU includes at least one accelerometer sensor that measures acceleration signals and at least one gyro sensor that measures angular signals; the detection of head movement is performed relative to a fixed coordinate system that determines the vector direction of the head movement from X-, Y-, and Z-axis data, and the fixed coordinate system is any one of the following coordinate systems: the user head coordinate system, the user body coordinate system, or the earth coordinate system;
  • Image detection sub-module 902 is used to obtain head image data through an image acquisition device, and determine head movement based on the head image data relative to the fixed coordinate system;
  • the sound detection sub-module 903 is used to detect the vibration characteristics of the sound wave of the user's voice, and analyze the head movement based on the vibration characteristics.
  • FIG. 10 is a block diagram of a device 1000 for human-computer interaction according to an exemplary embodiment.
  • the device 1000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • the device 1000 may include one or more of the following components: a processing component 1002, a memory 1004, a power component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1014, and communications component 1016.
  • Processing component 1002 generally controls the overall operations of device 1000, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 1002 may include one or more processors 1020 to execute instructions to complete all or part of the steps of the above method.
  • processing component 1002 may include one or more modules that facilitate interaction between processing component 1002 and other components.
  • processing component 1002 may include a multimedia module to facilitate interaction between multimedia component 1008 and processing component 1002.
  • Memory 1004 is configured to store various types of data to support operations at device 1000. Examples of such data include instructions for any application or method operating on device 1000, contact data, phonebook data, messages, pictures, videos, etc.
  • Memory 1004 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disk.
  • Power supply component 1006 provides power to various components of device 1000.
  • Power supply components 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 1000.
  • Multimedia component 1008 includes a screen that provides an output interface between the device 1000 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • multimedia component 1008 includes a front-facing camera and/or a rear-facing camera.
  • the front camera and/or the rear camera may receive external multimedia data.
  • Each front-facing camera and rear-facing camera can be a fixed optical lens system or have focusing and optical zoom capability.
  • Audio component 1010 is configured to output and/or input audio signals.
  • audio component 1010 includes a microphone (MIC) configured to receive external audio signals when device 1000 is in operating modes such as call mode, recording mode, and speech recognition mode. The received audio signals may be further stored in memory 1004 or sent via communications component 1016.
  • audio component 1010 also includes a speaker for outputting audio signals.
  • the I/O interface 1012 provides an interface between the processing component 1002 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • Sensor component 1014 includes one or more sensors for providing various aspects of status assessment for device 1000.
  • For example, the sensor component 1014 can detect the open/closed state of the device 1000 and the relative positioning of components, such as the display and keypad of the device 1000; it can also detect a change in position of the device 1000 or one of its components, the presence or absence of user contact with the device 1000, the orientation or acceleration/deceleration of the device 1000, and temperature changes of the device 1000.
  • Sensor assembly 1014 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 1016 is configured to facilitate wired or wireless communication between apparatus 1000 and other devices.
  • Device 1000 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 1016 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communications component 1016 also includes a near field communications (NFC) module to facilitate short-range communications.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • apparatus 1000 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
  • In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 1004 including instructions, which can be executed by the processor 1020 of the device 1000 to complete the above method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • a computer device including:
  • Memory used to store instructions executable by the processor
  • the processor is configured as:
  • a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a human-computer interaction method, the method including:
  • the user's gaze direction is first detected; when the gaze direction has not changed for longer than the preset time threshold, head movement detection is started; corresponding operation instructions are generated based on the head movement information obtained from the detection; and the operation instructions are then executed and the operation results fed back to the user.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Position Input By Displaying (AREA)

Abstract

The present disclosure relates to the field of computers, and relates to a human-computer interaction method and device for solving problems of human-computer interaction efficiency and accuracy. The method comprises: detecting a line-of-sight direction of a user; when the line-of-sight direction has not changed within a preset time threshold, starting detection of head motion; generating a corresponding operation instruction according to head motion information obtained by detecting the head motion; and executing the operation instruction and feeding back an operation result to the user. The technical solution provided by the present disclosure is suitable for non-manual interaction scenarios and realizes efficient, high-accuracy human-computer interaction.

Description

A human-computer interaction method, device, computer device and storage medium
Technical Field
The present disclosure relates to the field of computers, and in particular to a human-computer interaction method, device, computer device and storage medium.
Background
In the related art, human-computer interaction usually relies on buttons, mouse, keyboard, touch screen, voice, and similar means. In many complex scenarios, these common interaction methods perform poorly in terms of convenience and accuracy. For example, virtual reality (VR)/augmented reality (AR) applications and remote control of other devices place high demands on both the efficiency and the precision of human-computer interaction.
Although voice interaction requires no manual operation, the speaking and recognition process is time-consuming, resulting in low efficiency. Eye-movement interaction technology can make up for some shortcomings of existing human-computer interaction methods: pointing, moving, and selecting can be performed conveniently through the gaze direction. However, the parameters that eye-movement interaction can provide are limited, making it hard to adapt to complex operation scenarios, and various movements of the user's body may cause deviations in the eye-movement data, leading to operation errors.
In summary, there is a lack of an efficient, highly accurate human-computer interaction method that can meet the needs of complex scenarios.
Summary of the Invention
To overcome the problems in the related art, the present disclosure provides a human-computer interaction method, device, computer device and storage medium. By combining the two dimensions of gaze and head movement to control human-computer interaction, it suits complex, fine-grained application scenarios and achieves efficient, highly accurate human-computer interaction.
According to a first aspect of the embodiments of the present disclosure, a human-computer interaction method is provided, including:
detecting the user's gaze direction;
when the gaze direction has not changed within a preset time threshold, starting detection of head movement;
generating corresponding operation instructions based on the head movement information obtained from the head movement detection;
executing the operation instructions and feeding back the operation results to the user.
In some embodiments, the step of detecting the user's gaze direction includes:
performing eye tracking on the user to determine the gaze direction, where the gaze direction indicates the position the user is looking at in a line-of-sight coordinate system, and the line-of-sight coordinate system determines the vector direction of that position from data on the X, Y, and Z axes.
In some embodiments, the step of starting detection of head movement includes:
acquiring the sensor signal of at least one sensor in an inertial detection unit (IMU), and determining the head movement relative to a fixed coordinate system based on the sensor signal, where the fixed coordinate system determines the vector direction of the head movement from data on the X, Y, and Z axes.
In some embodiments, the IMU includes at least one accelerometer sensor that measures acceleration signals and at least one gyro sensor that measures angular signals.
In some embodiments, the step of starting detection of head movement includes:
acquiring head image data through an image acquisition device and determining the head movement relative to a fixed coordinate system, where the fixed coordinate system determines the vector direction of the head movement from data on the X, Y, and Z axes.
In some embodiments, the step of starting detection of head movement includes:
detecting the vibration characteristics of the sound waves of the user's voice;
analyzing the head movement based on the vibration characteristics.
In some embodiments, the gaze direction and/or head movement is detected by a head-mounted device.
In some embodiments, the operation results are fed back through a head-mounted device and/or a second device connected to the head-mounted device, where the second device is connected to the head-mounted device through a wired or wireless connection.
According to a second aspect of the embodiments of the present disclosure, a human-computer interaction device is provided, including:
a gaze detection module, used to detect the direction of the user's gaze;
a motion detection module, used to start detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
an instruction generation module, used to generate corresponding operation instructions based on the head movement information obtained from the head movement detection;
an execution module, used to execute the operation instructions and feed back the operation results to the user.
In some embodiments, the gaze detection module is used to perform eye tracking on the user and determine the gaze direction, where the gaze direction indicates the position the user is looking at in a line-of-sight coordinate system, and the line-of-sight coordinate system determines the vector direction of that position from data on the X, Y, and Z axes.
In some embodiments, the motion detection module includes:
a sensor detection submodule, used to acquire the sensor signal of at least one sensor in the IMU and determine the movement of the head relative to the fixed coordinate system based on the sensor signal, where the IMU includes at least one accelerometer sensor that measures acceleration signals and at least one gyro sensor that measures angular signals, the detection of head movement is performed relative to a fixed coordinate system that determines the vector direction of the head movement from data on the X, Y, and Z axes, and the fixed coordinate system is any one of the following coordinate systems: the user head coordinate system, the user body coordinate system, or the earth coordinate system;
an image detection submodule, used to acquire head image data through an image acquisition device and determine the head movement based on the head image data;
a sound detection submodule, used to detect the vibration characteristics of the sound waves of the user's voice and analyze the head movement based on the vibration characteristics.
According to a third aspect of the embodiments of the present disclosure, a computer device is provided, including:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
detect the user's gaze direction;
start detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
generate corresponding operation instructions based on the head movement information obtained from the head movement detection;
execute the operation instructions and feed back the operation results to the user.
According to a fourth aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided; when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a human-computer interaction method, the method including:
detecting the user's gaze direction;
starting detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
generating corresponding operation instructions based on the head movement information obtained from the head movement detection;
executing the operation instructions and feeding back the operation results to the user.
The technical solution provided by the embodiments of the present disclosure may include the following beneficial effects: the user's gaze direction is first detected; when the gaze direction has not changed for longer than a preset time threshold, detection of head movement is started; corresponding operation instructions are generated from the head movement information obtained by the detection; and the operation instructions are then executed and the operation results fed back to the user. By combining the two dimensions of gaze and head movement to control human-computer interaction, the solution suits complex, fine-grained application scenarios and achieves efficient, highly accurate human-computer interaction.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
Figure 1 is a flow chart of a human-computer interaction method according to an exemplary embodiment.
Figure 2 is a flow chart of another human-computer interaction method according to an exemplary embodiment.
Figure 3 is a schematic diagram of the user head coordinate system according to an exemplary embodiment.
Figure 4 is a schematic diagram of the user body coordinate system according to an exemplary embodiment.
Figure 5 is a schematic diagram of the earth coordinate system according to an exemplary embodiment.
Figure 6 is a schematic diagram of the relative relationship between the user head coordinate system, the user body coordinate system, and the earth coordinate system according to an exemplary embodiment.
Figure 7 is a schematic diagram of the glasses coordinate system according to an exemplary embodiment.
Figure 8 is a block diagram of a human-computer interaction device according to an exemplary embodiment.
Figure 9 is a schematic structural diagram of the motion detection module 802 according to an exemplary embodiment.
Figure 10 is a block diagram of a device according to an exemplary embodiment.
具体实施方式Detailed ways
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本发明相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本发明的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the appended claims.
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本发明相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本发明的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the appended claims.
常用的人机交互方式中,语音交互虽然可以无需动手操作,但是由于说话和识别过程耗 时较长,导致效率较低。眼动交互技术则可以弥补现有人机交互方式的一些不足,通过视线方向可以很便捷的实现指向、移动、选择等人机交互操作,但眼动交互技术能够提供的参数有限,很难适应复杂操作场景,且用户身体的各种活动均可能造成眼动数据的偏差,导致操作失误。Among the commonly used human-computer interaction methods, although voice interaction can require no hands-on operation, the speaking and recognition process takes a long time, resulting in low efficiency. Eye movement interaction technology can make up for some shortcomings of existing human-computer interaction methods. Human-computer interaction operations such as pointing, movement, and selection can be easily realized through the direction of gaze. However, the parameters that eye movement interaction technology can provide are limited, and it is difficult to adapt to complex Operation scenarios and various activities of the user's body may cause deviations in eye movement data, leading to operational errors.
为了解决上述问题,本公开提供了一种人机交互方法和装置。通过视线和头部运动两个维度,结合控制人机交互,适用于复杂、精细的应用场景,实现了高效、高准确性的人机交互。In order to solve the above problems, the present disclosure provides a human-computer interaction method and device. Through the two dimensions of sight and head movement, combined with the control of human-computer interaction, it is suitable for complex and delicate application scenarios, achieving efficient and high-accuracy human-computer interaction.
An exemplary embodiment of the present disclosure provides a human-computer interaction method that performs human-computer interaction through joint control of gaze and head movement: the user's gaze direction and head movement information are detected, and if the gaze direction has not changed within a preset time threshold, an operation instruction is generated from the head movement, executed, and its result fed back. The specific flow, shown in Figure 1, includes:
Step 101: detect the user's gaze direction.
In this step, for example, the user's eyes can be tracked to determine the gaze direction. The gaze direction indicates the position the user is looking at in a sight coordinate system, which uses data on the X, Y, and Z axes to determine the vector direction of the gazed position.
Step 102: if the gaze direction has not changed for longer than a preset time threshold, start detecting head movement.
In this step, when the gaze direction has not changed for longer than the preset time threshold, the gaze direction is judged to be stable. For example, the time threshold may be preset to 0.8 seconds: once the gaze has been locked on the same target for more than 0.8 seconds without changing, the gaze direction is judged stable.
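By way of illustration only, this dwell check could be sketched as follows in Python; the 0.8 second threshold and the small angular tolerance used to decide that the gaze "has not changed" are assumed tuning values, not values fixed by the present disclosure:

```python
import math
import time

DWELL_THRESHOLD_S = 0.8      # preset time threshold (example value from the text)
ANGLE_TOLERANCE_DEG = 2.0    # assumed tolerance for "gaze direction unchanged"

def angle_between(v1, v2):
    """Angle in degrees between two gaze direction vectors (X, Y, Z)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))

class DwellDetector:
    """Reports gaze stability once the gaze has stayed within tolerance long enough."""

    def __init__(self):
        self.anchor = None        # gaze vector at the start of the current dwell
        self.anchor_time = None   # timestamp of that vector

    def update(self, gaze_vector, now=None):
        """Feed one gaze sample; returns True once the dwell threshold is met."""
        now = time.monotonic() if now is None else now
        if self.anchor is None or angle_between(self.anchor, gaze_vector) > ANGLE_TOLERANCE_DEG:
            self.anchor, self.anchor_time = gaze_vector, now  # gaze moved: restart timing
            return False
        return now - self.anchor_time >= DWELL_THRESHOLD_S
```

Once update() returns True, the head movement detection of this step would be armed.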
In this step, the gaze direction can be determined by eye tracking on the user; for example, eye image data can be acquired and the gaze direction determined from changes in that data.
When the gaze direction is stable, the target pointed to by the gaze direction is predicted to be the target the user intends to operate on. At this point, detection of head movement can be started to determine the user's operation intention.
Step 103: generate a corresponding operation instruction based on the head movement information obtained from the head movement detection.
In this step, the operation instruction is generated from the detected head movement information combined with preset instruction rules. For example, the rule for the "confirm" instruction may be preset as "nod twice within two seconds", and the rule for the "cancel" instruction as "shake the head within one second". Accordingly, a "confirm" instruction is generated after detecting the head movement "nod twice within two seconds", and a "cancel" instruction after detecting the head movement "shake the head within one second".
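A minimal sketch of this rule matching, assuming head movement events arrive as (kind, timestamp) pairs (a format introduced here purely for illustration):

```python
def classify_head_gesture(events, now):
    """events: list of ("nod" | "shake", timestamp_s) pairs; returns an instruction or None."""
    nods = [t for kind, t in events if kind == "nod" and now - t <= 2.0]
    shakes = [t for kind, t in events if kind == "shake" and now - t <= 1.0]
    if len(nods) >= 2:        # "nod twice within two seconds"
        return "CONFIRM"
    if len(shakes) >= 1:      # "shake the head within one second"
        return "CANCEL"
    return None
```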
Step 104: execute the operation instruction and feed back the operation result to the user.
In this step, the operation instruction is executed, the operation result is obtained, and the result is fed back to the user. The feedback may be a confirmation message, such as "cancel operation succeeded", or a response interface to the operation, such as entering the page of the selected object when the operation instruction indicates "confirm viewing".
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which the user's eyes are tracked to determine the gaze direction. The gaze direction represents the position the user is looking at in a sight coordinate system, which uses values on the X, Y, and Z axes to determine the vector direction of the gazed position. As shown in Figure 2, the method specifically includes the following steps:
Step 201: perform eye tracking on the user to determine the user's gaze direction.
As shown in Figure 3, the sight coordinate system may be the user head coordinate system, with the center of gravity of the head as the origin; values on the X, Y, and Z axes characterize a space point 1, and the vector from the origin to space point 1 is the user's visual direction.
As shown in Figure 4, the sight coordinate system may also be the user body coordinate system, with the body's center of gravity as the origin; values on the X, Y, and Z axes characterize a space point 1, and the vector from the origin to space point 1 is the user's visual direction.
As shown in Figure 5, the sight coordinate system may also be the earth coordinate system, with a position fixed relative to the ground as the origin; values on the X, Y, and Z axes characterize a space point 1, and the vector from the origin to space point 1 is the user's visual direction.
Head movement is detected relative to a fixed coordinate system, which is any of the following: the user head coordinate system, the user body coordinate system, or the earth coordinate system. Detecting the motion parameters of the head can include detecting various movements of the head relative to the fixed coordinate system, including nodding (reciprocating rotation about the X axis), head shaking (reciprocating rotation about the Y and Z axes), and movements under different tilt postures.
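As a rough illustration of how these two rotation patterns could be separated from a single gyroscope sample in the fixed coordinate system (the 30 deg/s rate threshold is an assumption; the reciprocating character over time is checked separately, see the sketch after the IMU discussion below):

```python
RATE_THRESHOLD_DEG_S = 30.0  # assumed trigger rate

def dominant_rotation_axis(gx, gy, gz):
    """gx, gy, gz: angular rates (deg/s) about the X, Y, Z axes of the fixed frame."""
    if abs(gx) >= RATE_THRESHOLD_DEG_S and abs(gx) >= max(abs(gy), abs(gz)):
        return "x"      # rotation dominated by the X axis: candidate nod component
    if max(abs(gy), abs(gz)) >= RATE_THRESHOLD_DEG_S:
        return "y/z"    # rotation dominated by the Y/Z axes: candidate shake component
    return None
```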
Step 202: obtain the preset time threshold, and when the gaze direction has not changed for longer than that threshold, start detecting head movement.
The user's gaze direction is detected in real time; when it has remained fixed and unchanged for longer than the preset time threshold, for example 2 seconds, the gaze direction can be judged stable.
Depending on the hardware configuration, any of the following approaches can be used for head movement detection:
1) When a device such as a wearable equipped with an IMU is used to detect head movement, a sensor signal of at least one sensor in the IMU is acquired, and the movement of the head is determined from that signal. The sensors include, but are not limited to, any one or more of: an accelerometer, a gyroscope, and a geomagnetometer.
When detecting with an IMU, the movement of the head can be detected from the changes over time of the accelerometer, gyroscope, and geomagnetometer data in each dimension of the coordinate system.
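Building on that, one possible way to recognize the reciprocating character of a nod or shake from a time series of gyroscope samples is to count sign reversals of the angular rate within a short trailing window; the window length, reversal count, and rate threshold below are illustrative assumptions:

```python
def is_reciprocating(samples, window_s=1.0, min_reversals=2, min_rate=20.0):
    """samples: chronologically ordered (timestamp_s, angular_rate_deg_s) pairs for one axis.

    Returns True if the rate reversed sign often enough within the trailing
    window, i.e. a back-and-forth rotation such as a nod or a head shake.
    """
    if not samples:
        return False
    t_end = samples[-1][0]
    signs = []
    for t, rate in samples:
        if t_end - t > window_s:
            continue                # outside the trailing window
        if rate > min_rate:
            signs.append(1)
        elif rate < -min_rate:
            signs.append(-1)        # rates near zero are ignored
    reversals = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return reversals >= min_reversals
```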
2) A camera or other capture device may also be used: head image data is acquired through an image acquisition device, and head movement is judged from that data. For example, one or more cameras of a mobile phone or computer capture head image data such as photographs of the user's face. According to one implementation, the user's gaze direction and head movement parameters, that is, the eye image data and head image data, can be detected visually at the same time. When detecting from head image data, a three-dimensional model of the user's head can be built and then matched against head images acquired by the camera or other acquisition device, so as to estimate the current pose of the user's head and, combined with its change over time, derive the head movement parameters. Multiple cameras, or sensors such as ToF or LiDAR, can also be used to acquire three-dimensional head information, from which the current pose and head movement parameters are estimated together with the three-dimensional head model.
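A hedged sketch of the pose-from-image step using a generic 3D head model and one camera frame; the landmark detector is assumed to exist separately, and the model points and camera intrinsics below are commonly used approximations rather than values from this disclosure:

```python
import numpy as np
import cv2

MODEL_POINTS = np.array([            # rough 3D landmark positions on a generic head (mm)
    (0.0, 0.0, 0.0),                 # nose tip
    (0.0, -63.6, -12.5),             # chin
    (-43.3, 32.7, -26.0),            # left eye outer corner
    (43.3, 32.7, -26.0),             # right eye outer corner
    (-28.9, -28.9, -24.1),           # left mouth corner
    (28.9, -28.9, -24.1),            # right mouth corner
], dtype=np.float64)

def estimate_head_pose(image_points, frame_w, frame_h):
    """image_points: 6x2 pixel coordinates of the landmarks matching MODEL_POINTS."""
    image_points = np.asarray(image_points, dtype=np.float64)
    focal = frame_w                  # common approximation when intrinsics are unknown
    camera_matrix = np.array([[focal, 0, frame_w / 2],
                              [0, focal, frame_h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))   # assume negligible lens distortion
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points, camera_matrix, dist_coeffs)
    return rvec, tvec                # head rotation/translation in the camera frame
```

Tracking the returned rotation over successive frames then gives the change of head pose over time from which the movement parameters are derived.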
3) The user's voice characteristics can be detected to determine head movement; here, the head movement is the movement of the user's vocal cords. The vibration characteristics of the sound waves are detected, and the head movement is analyzed from those characteristics. A speech recognition system can also be used to recognize voice commands issued by the user and operate on the target locked by the gaze.
Step 203: generate a corresponding operation instruction based on the head movement information obtained from the head movement detection.
When a device such as a wearable equipped with an IMU detects head movement, the resulting head movement information contains the changes over time of the accelerometer, gyroscope, and geomagnetometer data in each dimension of the fixed coordinate system, from which the head movement is detected.
When a camera or other capture device acquires head image data through an image acquisition device, the resulting head movement information contains the changes over time of the head pose and position in each dimension of the fixed coordinate system; this change indicates the specific movement trajectory of the head, from which the head movement can be determined.
When the user's voice characteristics are detected to determine head movement, the resulting head movement information contains an indication of whether the vocal cords have vibrated. Further, a vibration trigger amplitude can be preset and the amplitude of the user's voice included in the head movement information; once the amplitude of the sound emitted by the user reaches the trigger amplitude, it is determined that head movement has occurred. A speech recognition system can also be combined, in which case the head movement information contains both the vocal-cord vibration indication and the user's voice command information, from which the user's intention can be determined.
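The vibration-trigger check described here could be sketched as a simple amplitude threshold; the RMS computation and the trigger value are assumptions for illustration:

```python
TRIGGER_AMPLITUDE = 0.02  # assumed trigger amplitude, normalized audio scale [-1, 1]

def vocal_cord_movement(samples):
    """samples: audio samples in [-1, 1]; True once the amplitude reaches the trigger."""
    if not samples:
        return False
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms >= TRIGGER_AMPLITUDE
```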
Step 204: execute the operation instruction and feed back the operation result to the user.
In this step, the operation result can be fed back in different ways depending on the software and hardware configuration of the application scenario. According to one implementation, an operation result containing text and/or images can be composed and shown on a display screen. The result can also be fed back through device vibration, for example one short vibration on success and a continuous two-second vibration on failure, or played back by voice. These feedback methods can be combined or used alone; those skilled in the art will appreciate that the ways of outputting operation result information are not limited to those listed above.
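One way to combine the feedback channels listed above is a small dispatcher; the three backend callables are placeholders for whatever the host platform actually provides:

```python
def give_feedback(result, display=None, vibrate=None, speak=None):
    """result: dict with "ok" (bool) and "text" (str); each channel is optional."""
    text = result.get("text", "")
    if display is not None:
        display(text)                              # e.g. "cancel operation succeeded"
    if vibrate is not None:
        vibrate(0.1 if result.get("ok") else 2.0)  # short pulse on success, long buzz on failure
    if speak is not None:
        speak(text)                                # voice playback of the result
```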
According to one implementation, the above operation result information can be fed back through a head-mounted device and/or a second device connected to the head-mounted device, the second device being connected to the head-mounted device by a wired or wireless connection. For example, the result is shown on the display screen of the head-mounted device, on at least one external display connected to it, or on both simultaneously.
The sight coordinate system above can be any of the following: the user head coordinate system, the user body coordinate system, or the earth coordinate system. When a head-mounted device such as smart glasses or a helmet is used to detect the gaze, a coordinate system can also be established based on the head-mounted device. Figure 6 shows an example of the relative relationship between the user head coordinate system 601, the user body coordinate system 602, and the earth coordinate system 603.
When a wearable device such as smart glasses detects the gaze direction and/or head movement, a coordinate system built on the wearable device serves as the user head coordinate system. The user's gaze direction is detected by an eye tracking module integrated in the smart glasses, and the user's head movement parameters by an inertial measurement unit (IMU) motion sensing module integrated in the glasses. As shown in Figure 7, with the smart glasses as the reference object, a glasses coordinate system can be constructed as the user head coordinate system.
For the user head coordinate system, a specific position of the user's head can serve as the origin, for example the center of gravity of the head, with the X, Y, and Z axes marking the vector directions within it.
For the user body coordinate system, a specific position of the user's body can serve as the origin, for example the body's center of gravity or its projection on the ground, with the X, Y, and Z axes marking the vector directions within it.
The head coordinate system and the user body coordinate system are local coordinate systems relative to the user and are moving coordinate systems. A global coordinate system, such as the earth coordinate system, can also be used; it is a static coordinate system.
It should be noted that the sight coordinate system and the fixed coordinate system mainly serve as reference frames for motion detection; the usable coordinate systems are not limited to the head, body, and earth coordinate systems listed above.
According to one implementation, the fixed coordinate system is a static coordinate system and the sight coordinate system is a moving coordinate system relative to it; for example, the fixed coordinate system is the earth coordinate system and the sight coordinate system is the head coordinate system.
According to one implementation, both the fixed coordinate system and the sight coordinate system are moving coordinate systems; for example, the fixed coordinate system is the user body coordinate system and the sight coordinate system is the user body coordinate system or the head coordinate system.
According to one implementation, both the fixed coordinate system and the sight coordinate system are static coordinate systems, for example both are the earth coordinate system.
According to one implementation, the sight coordinate system and the fixed coordinate system are the same coordinate system; of course, the sight coordinate system may also differ from the fixed coordinate system. The choice of sight and fixed coordinate systems can be customized to the application environment and user needs, set flexibly, and adapted to the hardware configuration, saving cost and improving efficiency.
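When the sight coordinate system moves relative to the fixed coordinate system, measurements can be related between the two using the current orientation of the moving frame; a minimal sketch, assuming a rotation matrix supplied by sensor fusion (e.g. from the IMU):

```python
import numpy as np

def to_fixed_frame(v_sight, r_fixed_from_sight):
    """Map a direction from the (moving) sight frame into the fixed frame.

    v_sight: length-3 direction in the sight frame;
    r_fixed_from_sight: 3x3 rotation matrix of the sight frame in the fixed frame.
    """
    return np.asarray(r_fixed_from_sight) @ np.asarray(v_sight)

# Example: with the identity rotation the two frames coincide.
gaze_fixed = to_fixed_frame([0.0, 0.0, 1.0], np.eye(3))
```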
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which, when detecting the user's head movement, different coordinate systems are chosen as the fixed coordinate system according to the scenario.
When the user's body is stationary, the fixed coordinate system is the user body coordinate system, for example with the center between the user's feet as the origin and the lateral, overhead, and forward directions as the X, Y, and Z axes respectively. Scenarios in which the gaze direction is essentially stable relative to a fixed coordinate system here include the user gazing at a stationary object in front of them, gazing at a specified position on a screen, and so on.
Correspondingly, the sight coordinate system may be the head coordinate system, with a specific position of the user's head, such as the center of gravity of the head, as the origin and the X, Y, and Z axes marking the vector directions within it.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which, when detecting the user's head movement, different coordinate systems are chosen as the fixed coordinate system according to the scenario.
When the user's body is stationary, the fixed coordinate system is the user body coordinate system, for example with the center between the user's feet as the origin and the lateral, overhead, and forward directions as the X, Y, and Z axes respectively. Scenarios in which the gaze direction is essentially stable relative to a fixed coordinate system here include the user gazing at a stationary object in front of them, gazing at a specified position on a screen, and so on.
Correspondingly, the sight coordinate system may be the user body coordinate system, with a specific position of the user's body, such as the body's center of gravity or its projection on the ground, as the origin and the X, Y, and Z axes marking the vector directions within it.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which, when detecting the user's head movement, different coordinate systems are chosen as the fixed coordinate system according to the scenario.
When the user's body is stationary, the fixed coordinate system is the user body coordinate system, for example with the center between the user's feet as the origin and the lateral, overhead, and forward directions as the X, Y, and Z axes respectively. Scenarios in which the gaze direction is essentially stable relative to a fixed coordinate system here include the user gazing at a stationary object in front of them, gazing at a specified position on a screen, and so on.
Correspondingly, the sight coordinate system may be the earth coordinate system, with a position fixed relative to the ground as the origin; values on the X, Y, and Z axes characterize a space point 1, and the vector from the origin to space point 1 is the user's visual direction.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which, when detecting the user's head movement, different coordinate systems are chosen as the fixed coordinate system according to the scenario.
When the user's body is moving, for example while driving a moving vehicle, the fixed coordinate system can be the earth coordinate system. Scenarios in which the gaze direction is essentially stable relative to a fixed coordinate system here include the user gazing ahead at the road, gazing at the in-vehicle instrument panel, and so on.
Correspondingly, the sight coordinate system may be the head coordinate system, with a specific position of the user's head, such as the center of gravity of the head, as the origin and the X, Y, and Z axes marking the vector directions within it.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which, when detecting the user's head movement, different coordinate systems are chosen as the fixed coordinate system according to the scenario.
When the user's body is moving, for example while driving a moving vehicle, the fixed coordinate system can be the earth coordinate system. Scenarios in which the gaze direction is essentially stable relative to a fixed coordinate system here include the user gazing ahead at the road, gazing at the in-vehicle instrument panel, and so on.
Correspondingly, the sight coordinate system may be the user body coordinate system, with a specific position of the user's body, such as the body's center of gravity or its projection on the ground, as the origin and the X, Y, and Z axes marking the vector directions within it.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which, when detecting the user's head movement, different coordinate systems are chosen as the fixed coordinate system according to the scenario.
When the user's body is moving, for example while driving a moving vehicle, the fixed coordinate system can be the earth coordinate system. Scenarios in which the gaze direction is essentially stable relative to a fixed coordinate system here include the user gazing ahead at the road, gazing at the in-vehicle instrument panel, and so on.
Correspondingly, the sight coordinate system may be the earth coordinate system, with a position fixed relative to the ground as the origin; values on the X, Y, and Z axes characterize a space point 1, and the vector from the origin to space point 1 is the user's visual direction.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which, when detecting the user's head movement, different coordinate systems are chosen as the fixed coordinate system according to the scenario.
When the user uses a wearable device such as smart glasses, the fixed coordinate system can be the user head coordinate system, specifically a coordinate system established with the center of the smart glasses as the origin.
Correspondingly, the sight coordinate system may be the head coordinate system, with a specific position of the user's head, such as the center of gravity of the head, as the origin and the X, Y, and Z axes marking the vector directions within it.
Preferably, the head coordinate system may be the coordinate system established with the center of the smart glasses as the origin.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which, when detecting the user's head movement, different coordinate systems are chosen as the fixed coordinate system according to the scenario.
When the user uses a wearable device such as smart glasses, the fixed coordinate system can be the user head coordinate system, specifically a coordinate system established with the center of the smart glasses as the origin.
Correspondingly, the sight coordinate system may be the user body coordinate system, with a specific position of the user's body, such as the body's center of gravity or its projection on the ground, as the origin and the X, Y, and Z axes marking the vector directions within it.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which, when detecting the user's head movement, different coordinate systems are chosen as the fixed coordinate system according to the scenario.
When the user uses a wearable device such as smart glasses, the fixed coordinate system can be the user head coordinate system, specifically a coordinate system established with the center of the smart glasses as the origin.
Correspondingly, the sight coordinate system may be the earth coordinate system, with a position fixed relative to the ground as the origin; values on the X, Y, and Z axes characterize a space point 1, and the vector from the origin to space point 1 is the user's visual direction.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method in which gaze and head movement are detected in different ways depending on the hardware configuration.
Gaze can be detected visually: a camera or other capture device takes image data such as eye photographs, which are analyzed to determine the gaze. Preferably, an infrared illumination module can be added when detecting the gaze direction, so that images of the user's eye region can be acquired more clearly under different ambient lighting conditions.
An exemplary embodiment of the present disclosure further provides a human-computer interaction method configured for a wearable device such as a head-mounted device. The wearable device detects the user's gaze direction and head movement parameters simultaneously. Because in actual use the user performs head movements such as nodding while keeping the gaze direction fixed, the change in gaze direction relative to the wearable device is exactly opposite to the head movement direction relative to the head coordinate system. Joint detection can therefore effectively avoid false detections while keeping the missed-detection rate low, and is more accurate than the traditional single-signal detection method.
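A minimal sketch of that joint check for a nod, assuming both angular rates are measured in the wearable's (head) frame and expressed in deg/s; the rate threshold is an assumed value:

```python
def joint_nod_check(head_pitch_rate, gaze_pitch_rate, min_rate=10.0):
    """True only when the head rotates clearly and the gaze, measured relative to
    the wearable, rotates the opposite way, i.e. the user keeps looking at the
    same target while nodding."""
    strong_head_motion = abs(head_pitch_rate) > min_rate
    opposite_directions = head_pitch_rate * gaze_pitch_rate < 0
    return strong_head_motion and opposite_directions
```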
An exemplary embodiment of the present disclosure further provides a human-computer interaction device, whose structure, shown in Figure 8, includes:
a gaze detection module 801, configured to detect the user's gaze direction;
a motion detection module 802, configured to start detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
an instruction generation module 803, configured to generate a corresponding operation instruction based on the head movement information obtained from the head movement detection; and
an execution module 804, configured to execute the operation instruction and feed back the operation result to the user.
The gaze detection module 801 is configured to perform eye tracking on the user and determine the gaze direction, the gaze direction indicating the position the user is looking at in a sight coordinate system, which uses data on the X, Y, and Z axes to determine the vector direction of the gazed position.
The structure of the motion detection module 802, shown in Figure 9, includes:
a sensor detection submodule 901, configured to acquire a sensor signal of at least one sensor in the IMU and determine from that signal the movement of the head relative to the fixed coordinate system, the IMU including at least one accelerometer sensor measuring acceleration signals and at least one gyro sensor measuring angular signals; head movement is detected relative to a fixed coordinate system that uses data on the X, Y, and Z axes to determine the vector direction of the head movement, the fixed coordinate system being any of the following: the user head coordinate system, the user body coordinate system, or the earth coordinate system;
an image detection submodule 902, configured to acquire head image data through an image acquisition device and judge head movement from that data relative to the fixed coordinate system; and
a sound detection submodule 903, configured to detect the vibration characteristics of the sound waves of the user's voice and analyze head movement from those characteristics.
Regarding the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
Figure 10 is a block diagram of a device 1000 for human-computer interaction according to an exemplary embodiment. For example, the device 1000 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Figure 10, the device 1000 can include one or more of the following components: a processing component 1002, a memory 1004, a power component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1014, and a communication component 1016.
The processing component 1002 generally controls the overall operations of the device 1000, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 1002 can include one or more processors 1020 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 1002 can include one or more modules that facilitate interaction between it and the other components, for example a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operation of the device 1000. Examples of such data include instructions for any application or method operated on the device 1000, contact data, phonebook data, messages, pictures, video, and so on. The memory 1004 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The power component 1006 provides power for the various components of the device 1000. It can include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1000.
The multimedia component 1008 includes a screen providing an output interface between the device 1000 and the user. In some embodiments the screen can include a liquid crystal display (LCD) and a touch panel (TP); if it includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel; the touch sensors can sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments the multimedia component 1008 includes a front camera and/or a rear camera, which can receive external multimedia data when the device 1000 is in an operating mode such as shooting mode or video mode. Each front and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, it includes a microphone (MIC) configured to receive external audio signals when the device 1000 is in an operating mode such as call mode, recording mode, or speech recognition mode. The received audio signals can be further stored in the memory 1004 or sent via the communication component 1016. In some embodiments the audio component 1010 also includes a loudspeaker for outputting audio signals.
The I/O interface 1012 provides an interface between the processing component 1002 and peripheral interface modules, which can be a keyboard, a click wheel, buttons, and the like. The buttons can include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 1014 includes one or more sensors for providing the device 1000 with status assessments of various aspects. For example, it can detect the on/off state of the device 1000 and the relative positioning of components, such as the display and keypad of the device 1000; it can also detect a change in the position of the device 1000 or of one of its components, the presence or absence of user contact with the device 1000, the orientation or acceleration/deceleration of the device 1000, and changes in its temperature. The sensor component 1014 can include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments it can also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1016 is configured to facilitate wired or wireless communication between the device 1000 and other devices. The device 1000 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1016 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, it also includes a near-field communication (NFC) module to facilitate short-range communication; the NFC module can be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 1000 can be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 1004 including instructions executable by the processor 1020 of the device 1000 to complete the above method. For example, the non-transitory computer-readable storage medium can be a ROM, a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A computer device includes:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
detect the user's gaze direction;
start detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
generate a corresponding operation instruction based on the head movement information obtained from the head movement detection; and
execute the operation instruction and feed back the operation result to the user.
A non-transitory computer-readable storage medium: when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a human-computer interaction method including:
detecting the user's gaze direction;
starting detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
generating a corresponding operation instruction based on the head movement information obtained from the head movement detection; and
executing the operation instruction and feeding back the operation result to the user.
Other embodiments of the invention will readily occur to those skilled in the art from consideration of the specification and practice of the invention disclosed here. The present application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed in the present disclosure. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.
Industrial applicability
Here, the user's gaze direction is first detected, and when it has not changed for longer than a preset time threshold, detection of head movement is started; a corresponding operation instruction is then generated from the head movement information obtained by that detection, executed, and its result fed back to the user. By combining the two dimensions of gaze and head movement to control human-computer interaction, the approach suits complex, fine-grained application scenarios and achieves efficient, highly accurate human-computer interaction.

Claims (13)

  1. A human-computer interaction method, characterized by comprising:
    detecting a user's gaze direction;
    starting detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
    generating a corresponding operation instruction based on head movement information obtained from the head movement detection; and
    executing the operation instruction and feeding back an operation result to the user.
  2. The human-computer interaction method according to claim 1, characterized in that the step of detecting the user's gaze direction comprises:
    performing eye tracking on the user to determine the gaze direction, the gaze direction indicating, in a sight coordinate system, the position at which the user is gazing, the sight coordinate system using data on the X, Y, and Z axes to determine the vector direction of the gazed position.
  3. The human-computer interaction method according to claim 1, characterized in that the step of starting detection of head movement comprises:
    acquiring a sensor signal of at least one sensor in an inertial measurement unit (IMU), and determining from the sensor signal the head movement relative to a fixed coordinate system, the fixed coordinate system using data on the X, Y, and Z axes to determine the vector direction of the head movement.
  4. The human-computer interaction method according to claim 3, characterized in that the IMU comprises at least one accelerometer sensor measuring acceleration signals and at least one gyro sensor measuring angular signals.
  5. The human-computer interaction method according to claim 1, characterized in that the step of starting detection of head movement comprises:
    acquiring head image data through an image acquisition device, and judging the head movement from the head image data relative to a fixed coordinate system, the fixed coordinate system using data on the X, Y, and Z axes to determine the vector direction of the head movement.
  6. The human-computer interaction method according to claim 1, characterized in that the step of starting detection of head movement comprises:
    detecting vibration characteristics of sound waves of the user's voice; and
    analyzing the head movement according to the vibration characteristics.
  7. The human-computer interaction method according to claim 1, characterized in that the gaze direction and/or the head movement is detected by a head-mounted device.
  8. The human-computer interaction method according to claim 1, characterized in that the operation result is fed back through a head-mounted device and/or a second device connected to the head-mounted device, the second device being connected to the head-mounted device by a wired or wireless connection.
  9. A human-computer interaction device, characterized by comprising:
    a gaze detection module, configured to detect a user's gaze direction;
    a motion detection module, configured to start detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
    an instruction generation module, configured to generate a corresponding operation instruction based on head movement information obtained from the head movement detection; and
    an execution module, configured to execute the operation instruction and feed back an operation result to the user.
  10. The human-computer interaction device according to claim 9, characterized in that the gaze detection module is configured to perform eye tracking on the user and determine the gaze direction, the gaze direction indicating, in a sight coordinate system, the position at which the user is gazing, the sight coordinate system using data on the X, Y, and Z axes to determine the vector direction of the gazed position.
  11. The human-computer interaction device according to claim 9, characterized in that the motion detection module comprises:
    a sensor detection submodule, configured to acquire a sensor signal of at least one sensor in an inertial measurement unit (IMU) and determine from the sensor signal the movement of the head relative to a fixed coordinate system, the IMU comprising at least one accelerometer sensor measuring acceleration signals and at least one gyro sensor measuring angular signals, the head movement being detected relative to the fixed coordinate system, which uses data on the X, Y, and Z axes to determine the vector direction of the head movement and is any of the following coordinate systems: a user head coordinate system, a user body coordinate system, or an earth coordinate system;
    an image detection submodule, configured to acquire head image data through an image acquisition device and judge the head movement from the head image data relative to the fixed coordinate system; and
    a sound detection submodule, configured to detect vibration characteristics of sound waves of the user's voice and analyze the head movement according to the vibration characteristics.
  12. A computer device, characterized by comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to:
    detect a user's gaze direction;
    start detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
    generate a corresponding operation instruction based on head movement information obtained from the head movement detection; and
    execute the operation instruction and feed back an operation result to the user.
  13. A non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a human-computer interaction method comprising:
    detecting a user's gaze direction;
    starting detection of head movement when the gaze direction has not changed for longer than a preset time threshold;
    generating a corresponding operation instruction based on head movement information obtained from the head movement detection; and
    executing the operation instruction and feeding back an operation result to the user.
PCT/CN2022/099701 2022-06-20 2022-06-20 Human-computer interaction method and device, computer device and storage medium WO2023245316A1 (en)

Priority Applications (2)

CN202280004336.8A (priority 2022-06-20, filed 2022-06-20): Man-machine interaction method and device, computer device and storage medium
PCT/CN2022/099701 (priority 2022-06-20, filed 2022-06-20): Human-computer interaction method and device, computer device and storage medium


Publications (1)

Publication Number: WO2023245316A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101308400A (en) * 2007-05-18 2008-11-19 肖斌 Novel human-machine interaction device based on eye-motion and head motion detection
CN103294180A (en) * 2012-03-01 2013-09-11 联想(北京)有限公司 Man-machine interaction control method and electronic terminal
CN106325517A (en) * 2016-08-29 2017-01-11 袁超 Target object trigger method and system and wearable equipment based on virtual reality
US10921882B1 (en) * 2019-12-26 2021-02-16 Jie Li Human-machine interaction method, system and apparatus for controlling an electronic device
CN113160260A (en) * 2021-05-08 2021-07-23 哈尔滨理工大学 Head-eye double-channel intelligent man-machine interaction system and operation method

Also Published As

Publication number Publication date
CN117616368A (en) 2024-02-27

