
Human-computer interaction method and device

Info

Publication number
WO2022228056A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion sensing
sensing data
initial
processor
data
Prior art date
Application number
PCT/CN2022/085282
Other languages
French (fr)
Chinese (zh)
Inventor
解文博
赵安
陈维
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2022228056A1 publication Critical patent/WO2022228056A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition

Definitions

  • the present application relates to the field of terminal applications, and in particular, to a human-computer interaction method and related equipment.
  • Embodiments of the present application provide a human-computer interaction method and related equipment for controlling a cursor in a display device through asynchronous calibration, which can improve the continuity of cursor movement in the display device, thereby improving user experience.
  • a first aspect of the embodiments of the present application provides a human-computer interaction method, applicable to a human-computer interaction system including a motion sensor, a camera, a processor, and a display screen. The method includes: the motion sensor acquires initial motion sensing data at a first sampling frequency within a first time period, the initial motion sensing data being triggered by the user's limb movements; the camera acquires first image data at a second sampling frequency within the first time period, the second sampling frequency being less than the first sampling frequency, and the first image data including the user's limb movement information; after that, the processor obtains a first constraint condition, the first constraint condition being obtained by performing computer vision (CV) processing on the first image data; the processor calibrates the initial motion sensing data according to the first constraint condition to obtain target motion sensing data; further, the processor obtains control information according to the target motion sensing data, and the control information is used to control the display screen.
  • in this embodiment, the camera obtains the first image data at the second sampling frequency in the first time period, the first constraint condition is obtained through CV key point recognition, and the processor calibrates, based on the first constraint condition, the initial motion sensing data obtained at the first sampling frequency within the same time period to obtain the target motion sensing data; thereafter, the processor further obtains control information for controlling the display screen according to the target motion sensing data.
  • the second sampling frequency is lower than the first sampling frequency, that is, the processor performs asynchronous calibration on the initial motion sensing data to obtain target motion sensing data.
  • since the calculation time of the CV recognition process is generally much longer than the processing time of the IMU data, the asynchronous calibration implementation does not need to wait for the long CV processing; this effectively avoids problems such as display freezes and display delays, so controlling the cursor in the display device through asynchronous calibration can improve the continuity of cursor movement in the display device, thereby improving user experience (a rough sketch of this two-rate structure follows below).
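  • To make the two-rate structure above concrete, here is a minimal Python sketch, an illustration only and not from the application; sample_imu, calibrate, solve_attitude, and emit_control are hypothetical placeholders. It runs the IMU loop at the higher sampling frequency and applies the most recent CV-derived constraint whenever one becomes available, without ever blocking on the slower CV processing:

```python
import time

IMU_RATE_HZ = 100   # first sampling frequency (illustrative value)
CV_RATE_HZ = 5      # second sampling frequency, lower than the IMU rate (context only)

latest_constraint = None  # most recent first constraint produced by CV processing

def on_cv_result(constraint):
    """Callback invoked whenever the slow CV pipeline finishes a frame (~5 Hz)."""
    global latest_constraint
    latest_constraint = constraint

def imu_loop(sample_imu, calibrate, solve_attitude, emit_control):
    """High-rate loop: never waits for CV; uses whichever constraint is newest."""
    period = 1.0 / IMU_RATE_HZ
    while True:
        raw = sample_imu()                           # initial motion sensing data
        if latest_constraint is not None:
            raw = calibrate(raw, latest_constraint)  # asynchronous calibration
        target = solve_attitude(raw)                 # attitude calculation step
        emit_control(target)                         # drives the cursor display
        time.sleep(period)
```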
  • the processor and the motion sensor may be installed in the same electronic device, or the processor and the camera may be installed in the same electronic device, or the processor and the display screen may be installed in the same electronic device, which is not limited here.
  • the first constraint condition is obtained by performing human skeleton key point recognition in the computer vision (CV) processing on the first image data, and the first constraint condition includes three-dimensional spatial orientation angle information.
  • the CV processing may be implemented based on a three-dimensional human skeleton recognition technology, or may be implemented based on a two-dimensional human skeleton recognition technology, which is not limited here.
  • the first constraint condition for asynchronous calibration of the initial motion sensing data may be three-dimensional space orientation angle information obtained by CV identification processing.
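  • To make the notion of a CV-derived three-dimensional orientation-angle constraint concrete, one plausible way to turn two skeleton key points (say, elbow and wrist) into yaw/pitch angles is sketched below; the key-point choice and camera-frame convention are assumptions for illustration, not fixed by the application:

```python
import math

def orientation_angles(elbow, wrist):
    """Derive yaw/pitch of the forearm from two 3D skeleton key points.

    elbow, wrist: (x, y, z) tuples in the camera coordinate frame
    (a convention assumed here; the application does not fix one).
    """
    dx = wrist[0] - elbow[0]
    dy = wrist[1] - elbow[1]
    dz = wrist[2] - elbow[2]
    yaw = math.degrees(math.atan2(dx, dz))                    # left/right angle
    pitch = math.degrees(math.atan2(dy, math.hypot(dx, dz)))  # up/down angle
    return yaw, pitch

# Example: forearm pointing slightly up and to the right of the camera axis
print(orientation_angles((0.0, 0.0, 2.0), (0.2, 0.1, 2.3)))
```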
  • the processor calibrates the initial motion sensing data according to the first constraint condition to obtain the target motion sensing data, which specifically includes: the processor first obtains calibration data by mapping according to the first constraint condition, and fits a first curve based on the calibration data; then, the processor fits a second curve according to the initial motion sensing data, and performs weighted average processing on the first curve and the second curve to obtain a third curve; the processor determines the calibrated motion sensing data in the third curve; after that, the processor processes the calibrated motion sensing data according to an attitude calculation algorithm to obtain the target motion sensing data.
  • that is, the target motion sensing data is data obtained after processing by the attitude calculation algorithm: the initial motion sensing data is first calibrated, and the calibrated data is then processed by the attitude calculation algorithm to obtain the target motion sensing data (see the numpy sketch below).
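  • A minimal numpy sketch of this calibrate-then-solve variant is given below, assuming both signals are polynomial-fitted over a common timebase; the blend weight and polynomial degree are illustrative assumptions, not values from the application:

```python
import numpy as np

def blend_curves(t_cv, calib_data, t_imu, imu_data, w_cv=0.6, deg=3):
    """Fit a curve to each signal, then weighted-average them on the IMU timebase.

    t_cv/calib_data: sparse CV-derived calibration samples (first curve)
    t_imu/imu_data:  dense raw IMU samples (second curve)
    Returns the blended third curve evaluated at the IMU timestamps.
    """
    first = np.polyval(np.polyfit(t_cv, calib_data, deg), t_imu)
    second = np.polyval(np.polyfit(t_imu, imu_data, deg), t_imu)
    return w_cv * first + (1.0 - w_cv) * second   # third curve

# Example: 5 CV samples per second versus 100 IMU samples per second
t_imu = np.linspace(0.0, 1.0, 100)
t_cv = np.linspace(0.0, 1.0, 5)
third = blend_curves(t_cv, np.sin(t_cv), t_imu, np.sin(t_imu) + 0.05)
```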
  • alternatively, the processor calibrates the initial motion sensing data according to the first constraint condition to obtain the target motion sensing data as follows: the processor first processes the initial motion sensing data according to an attitude calculation algorithm to obtain first attitude angle data; then, the processor fits a fourth curve according to the first attitude angle data and fits a fifth curve according to the first constraint condition; after that, the processor performs weighted average processing on the fourth curve and the fifth curve to obtain a sixth curve; further, the processor determines the target motion sensing data in the sixth curve.
  • that is, in this implementation, the initial motion sensing data is first processed by the attitude calculation algorithm, and after the processing result is obtained, the processing result is calibrated based on the first constraint condition to obtain the target motion sensing data.
  • the control information is coordinate data obtained by performing coordinate transformation on the target motion sensing data, the coordinate data being used to control the display position of the cursor on the display screen; or, the control information is a gesture identification result obtained by mapping the target motion sensing data, the gesture identification result being used to operate an interface element of the display screen.
  • that is, the control information for controlling the display screen, obtained by processing the target motion sensing data that results from asynchronous calibration, can perform various operations on the display screen, such as controlling the display position of the cursor on the display screen, or operating interface elements on the display screen through selection, zooming, dragging, clicking, and the like.
  • before the processor calibrates the initial motion sensing data according to the first constraint condition, the method further includes: the processor aligns the first constraint condition and the initial motion sensing data according to a time difference.
  • the first constraint condition and the initial motion sensing data may be aligned through the determined time difference, so as to eliminate the effect of the time difference.
  • the time difference is calculated through an initialization process before the first time period, and the method further includes: the display screen displays first prompt information for prompting the user to make a specified limb movement; thereafter, the motion sensor acquires motion sensing data in the initialization process, the motion sensing data in the initialization process being triggered by the specified limb movement made by the user; the camera acquires image data in the initialization process, the image data in the initialization process including the specified limb movement information made by the user; further, the processor determines the time difference based on the signal characteristics of the motion sensing data in the initialization process and the signal characteristics of the image data in the initialization process.
  • that is, the display screen prompts the user to perform a specified limb action; while the user performs the action, the motion sensing data in the initialization process is acquired through the motion sensor, the image data in the initialization process is acquired through the camera, and the processor further determines the time difference based on the motion sensing data and the image data.
  • the method further includes: the processor determining initial relative information between the user and the camera according to the image data in the initialization process.
  • the initial relative information may include distance, orientation, and the like.
  • the processor calibrating the initial motion sensing data according to the first constraint condition to obtain the target motion sensing data includes: the processor determines an initial ergonomic arm model according to the initial relative information, the initial ergonomic arm model including at least a first value range of limb movement angles; then, the processor updates the first constraint condition according to the initial ergonomic arm model to obtain an updated first constraint condition; after that, the processor calibrates the initial motion sensing data according to the updated first constraint condition to obtain the target motion sensing data.
  • that is, the ergonomic arm model constructed from the relative information between the user and the camera can be used to update the first constraint condition; in other words, the first constraint condition is further constrained based on the ergonomic arm model, so as to avoid problems such as inaccurate cursor positioning and cursor overflow (a clamping sketch follows below).
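  • The application describes the ergonomic arm model only as bounding the feasible range of limb movement angles. A minimal sketch of using such a bound to constrain (clamp) the CV-derived angles might look as follows, with the range values purely illustrative:

```python
def constrain_with_arm_model(yaw, pitch,
                             yaw_range=(-60.0, 60.0), pitch_range=(-40.0, 50.0)):
    """Clamp CV-derived orientation angles (degrees) to the feasible range given
    by the arm model, so implausible angles cannot push the cursor off screen.
    The range values are illustrative assumptions, not from the application."""
    clamp = lambda v, lo, hi: max(lo, min(hi, v))
    return clamp(yaw, *yaw_range), clamp(pitch, *pitch_range)

# Example: an implausible CV estimate gets pulled back into the feasible range
print(constrain_with_arm_model(85.0, -10.0))  # -> (60.0, -10.0)
```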
  • the process of obtaining the updated first constraint condition may specifically include: the processor determines first relative information between the user and the camera according to the first image data; then, when the first relative information is different from the initial relative information, the processor updates the initial ergonomic arm model according to the first relative information to obtain a first ergonomic arm model; after that, the processor updates the first constraint condition according to the first ergonomic arm model to obtain the updated first constraint condition.
  • that is, the ergonomic arm model constructed based on the relative information between the user and the camera can also be updated when the relative information changes, and the updated ergonomic arm model is used to further constrain the first constraint condition, so as to ensure the timeliness of the control information for controlling the display screen.
  • the processor obtaining the control information according to the target motion sensing data includes: first, the processor determines the user's initial mapping relationship in the display screen according to the initial relative information; then, the processor performs coordinate transformation processing on the target motion sensing data according to the initial mapping relationship to obtain the control information.
  • that is, the user's initial mapping relationship in the display screen can be further determined according to the initial relative information determined in the initialization process, and the initial mapping relationship can be used as the processing basis of the control information, so as to avoid problems such as inaccurate positioning and cursor overflow arising when the relative information changes.
  • the processor performing coordinate transformation processing on the target motion sensing data according to the initial mapping relationship to obtain the control information includes: the processor determines first relative information between the user and the camera according to the first image data; then, when the first relative information is different from the initial relative information, the processor updates the initial mapping relationship according to the first relative information to obtain a first mapping relationship; after that, the processor performs coordinate transformation processing on the target motion sensing data according to the first mapping relationship to obtain the control information.
  • that is, the initial mapping relationship determined based on the relative information between the user and the camera can also be updated when the relative information changes, and the updated mapping relationship is used to perform coordinate transformation processing on the target motion sensing data to obtain the control information, so as to ensure the timeliness of the control information for controlling the display screen (a mapping sketch follows below).
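  • As a rough picture of the coordinate transformation and mapping update described above, the sketch below maps orientation angles linearly onto screen pixels and clamps the result to the screen; all numeric choices are assumptions, and a mapping update after the user's relative distance changes could simply rescale the angular spans:

```python
def angles_to_cursor(yaw, pitch, screen_w=1920, screen_h=1080,
                     yaw_span=120.0, pitch_span=90.0):
    """Map orientation angles (degrees) to screen coordinates.

    yaw_span/pitch_span define the angular window mapped onto the full screen;
    updating the mapping when the user moves closer or farther could shrink or
    widen these spans. All constants here are illustrative assumptions."""
    x = (yaw / yaw_span + 0.5) * screen_w
    y = (0.5 - pitch / pitch_span) * screen_h
    # keep the cursor on screen (avoids "cursor overflow")
    return min(max(x, 0), screen_w - 1), min(max(y, 0), screen_h - 1)

# Example: arm pointing 12 degrees right and 5 degrees up of center
print(angles_to_cursor(12.0, 5.0))
```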
  • the motion sensor includes a sensing unit of one or more sensors among an accelerometer, a gyroscope, and a magnetometer.
  • the motion sensor may be an IMU data acquisition device, wherein the IMU data acquisition device may include a sensing unit of one or more sensors among an accelerometer, a gyroscope, and a magnetometer.
  • the camera includes one or more cameras selected from a depth camera and a non-depth camera.
  • the camera may include various implementations, such as a depth camera, a non-depth camera, etc., so that the solution is suitable for different application scenarios.
  • a second aspect of the embodiments of the present application provides a first electronic device, including a motion sensor and a processor. The motion sensor is used to acquire initial motion sensing data at a first sampling frequency in a first time period, the initial motion sensing data being triggered by the user's limb movements; the processor is configured to calibrate the initial motion sensing data according to an acquired first constraint condition to obtain target motion sensing data, where the first constraint condition is obtained by performing computer vision (CV) processing on first image data acquired by a camera at a second sampling frequency in the first time period, the second sampling frequency is less than the first sampling frequency, and the first image data includes the user's limb movement information; further, the processor is further configured to obtain control information according to the target motion sensing data, where the control information is used to control the display content of a display screen; the camera and the display screen are contained in a second electronic device different from the first electronic device.
  • in this embodiment, the first electronic device obtains, through the motion sensor, the initial motion sensing data at the first sampling frequency in the first time period, and also obtains the first constraint condition derived from the CV processing of the first image data at the second sampling frequency in the same time period; after that, the processor in the first electronic device calibrates the initial motion sensing data based on the first constraint condition to obtain the target motion sensing data, and further obtains control information for controlling the display screen according to the target motion sensing data.
  • the second sampling frequency is lower than the first sampling frequency, that is, the processor performs asynchronous calibration on the initial motion sensing data to obtain target motion sensing data.
  • since the calculation time of the CV recognition process is generally much longer than the processing time of the IMU data, the asynchronous calibration implementation does not need to wait for the long CV processing; this effectively avoids problems such as display freezes and display delays, so controlling the cursor in the display device through asynchronous calibration can improve the continuity of cursor movement in the display device, thereby improving user experience.
  • the processor is specifically configured to: obtain calibration data by mapping according to the first constraint condition, and fit a first curve based on the calibration data; then, fit a second curve according to the initial motion sensing data, and perform weighted average processing on the first curve and the second curve to obtain a third curve; and determine the calibrated motion sensing data in the third curve, further processing the calibrated motion sensing data according to an attitude calculation algorithm to obtain the target motion sensing data.
  • that is, the target motion sensing data is data obtained after processing by the attitude calculation algorithm: the initial motion sensing data is first calibrated, and the calibrated data is then processed by the attitude calculation algorithm to obtain the target motion sensing data.
  • the processor is specifically configured to: process the initial motion sensing data according to an attitude calculation algorithm to obtain first attitude angle data; then, fit a fourth curve according to the first attitude angle data and fit a fifth curve according to the first constraint condition, and perform weighted average processing on the fourth curve and the fifth curve to obtain a sixth curve; and determine the target motion sensing data in the sixth curve.
  • that is, the initial motion sensing data is first processed by the attitude calculation algorithm, and after the processing result is obtained, the processing result is calibrated based on the first constraint condition to obtain the target motion sensing data.
  • the processor is further configured to: align the first constraint condition and the initial motion sensing data according to a time difference.
  • the first constraint condition and the initial motion sensing data may be aligned through the determined time difference, so as to eliminate the effect of the time difference.
  • the time difference is calculated through an initialization process before the first time period; the motion sensor is further configured to acquire motion sensing data in the initialization process, the motion sensing data in the initialization process being triggered by the specified limb movements made by the user; in addition, the processor is further configured to determine the time difference based on the signal characteristics of the motion sensing data in the initialization process and the signal characteristics of the image data in the initialization process, where the image data in the initialization process is acquired by the camera in the initialization process and includes the specified limb movement information made by the user.
  • that is, the display screen prompts the user to perform a specified limb action; while the user performs the action, the motion sensing data in the initialization process is acquired through the motion sensor, the image data in the initialization process is acquired through the camera, and the processor further determines the time difference based on the motion sensing data and the image data.
  • the processor is further configured to: determine initial relative information between the user and the camera according to the image data in the initialization process.
  • the initial relative information may include distance, orientation, and the like.
  • the processor is specifically configured to: determine an initial ergonomic arm model according to the initial relative information, where the initial ergonomic arm model includes at least a first value range of limb movement angles; then, update the first constraint condition according to the initial ergonomic arm model to obtain an updated first constraint condition; thereafter, calibrate the initial motion sensing data according to the updated first constraint condition to obtain the target motion sensing data.
  • that is, the ergonomic arm model constructed from the relative information between the user and the camera can be used to update the first constraint condition; in other words, the first constraint condition is further constrained based on the ergonomic arm model, so as to avoid problems such as inaccurate cursor positioning and cursor overflow.
  • the processor is specifically configured to: determine first relative information between the user and the camera according to the first image data; then, when the first relative information is different from the initial relative information, update the initial ergonomic arm model according to the first relative information to obtain a first ergonomic arm model; further, update the first constraint condition according to the first ergonomic arm model to obtain the updated first constraint condition.
  • that is, the ergonomic arm model constructed based on the relative information between the user and the camera can also be updated when the relative information changes, and the updated ergonomic arm model is used to further constrain the first constraint condition, so as to ensure the timeliness of the control information for controlling the display screen.
  • the processor is further configured to: first determine the user's initial mapping relationship in the display screen according to the initial relative information; then, perform coordinate transformation processing on the target motion sensing data according to the initial mapping relationship to obtain the control information.
  • that is, the user's initial mapping relationship in the display screen can be further determined according to the initial relative information determined in the initialization process, and the initial mapping relationship can be used as the processing basis of the control information, so as to avoid problems such as inaccurate positioning and cursor overflow arising when the relative information changes.
  • the processor is specifically configured to: determine first relative information between the user and the camera according to the first image data; then, when the first relative information is different from the initial relative information, update the initial mapping relationship according to the first relative information to obtain a first mapping relationship; further, perform coordinate transformation processing on the target motion sensing data according to the first mapping relationship to obtain the control information.
  • that is, the initial mapping relationship determined based on the relative information between the user and the camera can also be updated when the relative information changes, and the updated mapping relationship is used to perform coordinate transformation processing on the target motion sensing data to obtain the control information, so as to ensure the timeliness of the control information for controlling the display screen.
  • the motion sensor includes one or more sensing units of an accelerometer, a gyroscope, and a magnetometer.
  • the motion sensor may be an IMU data acquisition device, wherein the IMU data acquisition device may include a sensing unit of one or more sensors among an accelerometer, a gyroscope, and a magnetometer.
  • the motion sensor and the processor included in the first electronic device in the second aspect can also perform the implementation processes in the first aspect and any possible implementation manner thereof, and achieve corresponding beneficial effects, which are not repeated here one by one.
  • a third aspect of the embodiments of the present application provides a second electronic device, including a camera and a display screen. The camera is used to acquire first image data at a second sampling frequency in a first time period, the first image data including the user's limb movement information; the first image data is used to determine a first constraint condition, and the first constraint condition is used to calibrate initial motion sensing data to obtain target motion sensing data; the initial motion sensing data is obtained by a motion sensor in a first electronic device sampling at a first sampling frequency within the first time period and is triggered by the user's limb movement, and the second sampling frequency is less than the first sampling frequency; thereafter, the display screen is used to display the control information, where the control information is obtained based on the target motion sensing data.
  • in this embodiment, the first image data acquired by the camera at the second sampling frequency in the first time period is used to determine the first constraint condition; the first constraint condition can be used to calibrate the initial motion sensing data obtained at the first sampling frequency within the first time period to obtain the target motion sensing data, from which control information for controlling the display screen is further obtained, so that the display screen displays the control information.
  • the second sampling frequency is lower than the first sampling frequency, that is, the target motion sensing data is obtained by asynchronously calibrating the initial motion sensing data.
  • since the calculation time of the CV recognition process is generally much longer than the processing time of the IMU data, the asynchronous calibration implementation does not need to wait for the long CV processing; this effectively avoids problems such as display freezes and display delays, so controlling the cursor in the display device through asynchronous calibration can improve the continuity of cursor movement in the display device, thereby improving user experience.
  • the first constraint condition is obtained by performing human skeleton key point recognition in the computer vision (CV) processing on the first image data, and the first constraint condition includes three-dimensional spatial orientation angle information.
  • the CV processing may be implemented based on a three-dimensional human skeleton recognition technology, or may be implemented based on a two-dimensional human skeleton recognition technology, which is not limited here.
  • the first constraint condition for asynchronous calibration of the initial motion sensing data may be three-dimensional space orientation angle information obtained by CV identification processing.
  • the control information is coordinate data obtained by performing coordinate transformation on the target motion sensing data, the coordinate data being used to control a cursor on the display screen; or, the control information is a gesture identification result obtained by mapping the target motion sensing data, the gesture identification result being used to operate an interface element of the display screen.
  • that is, the control information for controlling the display screen, obtained by processing the target motion sensing data that results from asynchronous calibration, can perform various operations on the display screen, such as controlling the display position of the cursor on the display screen, or operating interface elements on the display screen through selection, zooming, dragging, clicking, and the like.
  • the display screen is further configured to display first prompt information, which is used to prompt the user to make a specified limb action; in addition, the camera is further configured to acquire image data in an initialization process before the first time period, the image data in the initialization process including the specified limb action information made by the user; the signal characteristics of the image data in the initialization process and the signal characteristics of the motion sensing data in the initialization process determine the time difference, the time difference being used to align the first constraint condition and the initial motion sensing data, where the motion sensing data in the initialization process is acquired by the first electronic device during the initialization process.
  • the first constraint condition and the initial motion sensing data may be aligned through the determined time difference, so as to eliminate the effect of the time difference.
  • the camera includes one or more cameras selected from a depth camera and a non-depth camera.
  • the camera may include various implementations, such as a depth camera, a non-depth camera, etc., so that the solution is suitable for different application scenarios.
  • the camera and the display screen included in the second electronic device in the third aspect can also perform the implementation processes in the first aspect and any possible implementation manner thereof, and achieve corresponding beneficial effects, which are not repeated here one by one.
  • a fourth aspect of the embodiments of the present application provides an electronic device, including a processor coupled to a memory; the memory is used to store a program, and the processor is used to execute the program in the memory, causing the electronic device to execute the human-computer interaction method described in the above aspects.
  • the motion sensor mentioned in the above human-computer interaction method can be integrated in the electronic device, or can be independently provided outside the electronic device and connected to the electronic device in a wired/wireless manner, which is not limited here.
  • the camera mentioned in the above-mentioned human-computer interaction method may be integrated in the electronic device, or independently provided outside the electronic device and connected to the electronic device in a wired/wireless manner, which is not limited here.
  • the display screen mentioned in the above human-computer interaction method can be integrated in the electronic device, or can be independently provided outside the electronic device and connected to the electronic device in a wired/wireless manner, which is not limited here.
  • a fifth aspect of the embodiments of the present application provides a computer program, which, when running on a computer, enables the computer to execute the human-computer interaction method described in the first aspect and any implementation manner thereof.
  • a sixth aspect of the embodiments of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when it runs on a computer, it causes the computer to execute the human-computer interaction method described in the first aspect and any implementation manner thereof.
  • a seventh aspect of an embodiment of the present application provides a circuit system, where the circuit system includes a processing circuit, and the processing circuit is configured to execute the human-computer interaction method described in the first aspect and any implementation manner thereof.
  • An eighth aspect of an embodiment of the present application provides a chip system, where the chip system includes a processor, configured to support implementing the functions involved in the first aspect and any implementation manner thereof, for example, sending or processing the data and/or information involved in the foregoing method.
  • the chip system further includes a memory for storing necessary program instructions and data of the server or the communication device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of a human-computer interaction implementation.
  • FIG. 2 is a schematic diagram of an application scenario in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 4 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 5 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 6 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 7 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 8 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 9 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 10 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 11 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a first electronic device according to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a second electronic device according to an embodiment of the present application.
  • in the process of users performing human-computer interaction using terminal devices, the full-scene immersive experience has become a development trend of terminal devices.
  • for example, there are various application scenarios for the full-scene immersive experience, such as extended reality (XR) implemented through virtual reality (VR), augmented reality (AR), or mixed reality (MR).
  • there is also, for example, the application scenario of human-computer interaction with devices having display screens, such as computers, TVs, and smart screens (or large screens).
  • in the above application scenarios, the user's limb movement is a direct and convenient input method: a wearable device with sensors such as an inertial measurement unit (IMU) can be used as a medium to collect motion information of the user's limbs (for example, the hand, wrist, or arm), and the user's limb movement information carried in images or videos captured by a camera can also be used as feedback to identify the user's operation intention, so that the user can realize human-computer interaction in a clearer and smoother way.
  • take the process of human-computer interaction between a user and a device with a display screen through an air mouse mode (also referred to as an in-air mouse mode, an air operation mode, etc.) as an example.
  • the air mouse mode may refer to a human-computer interaction mode in which, by adding sensors (such as a gyroscope or a 3-dimension gravity sensor, 3D-Gsensor) to a wireless mouse or wireless control device, the cursor on the display screen can follow the movement of the user's limbs in the air, without the device being placed on a fixed desktop.
  • the air mouse mode can also be extended to a human-computer interaction mode in which a terminal device (such as a wearable device like a watch or wristband) is used to control the cursor on the display screen (for operations such as moving, dragging, zooming in, and clicking).
  • the air mouse mode mainly includes the following mainstream implementation methods:
  • Method 1: use a wearable device with an IMU to recognize the user's body movements, and control the cursor on the display screen according to the recognition result, so as to realize human-computer interaction.
  • the main components of the IMU include a gyroscope, an accelerometer, and a magnetometer.
  • the gyroscope can detect the angular velocity signal of the wearable device relative to the navigation coordinate system (such as an earth-fixed coordinate system or a geographic coordinate system); the accelerometer can detect the three-axis acceleration signals of the wearable device in the carrier coordinate system (the coordinate origin being the center of the carrier, and the three axes being the left-right, front-rear, and up-down axes of the carrier); and the magnetometer can obtain information about the magnetic field around the wearable device (for example, a smart watch).
  • the main function of the IMU is to fuse the data of the three sensors (the gyroscope, the accelerometer, and the magnetometer), obtain more accurate attitude information through the processing of an attitude calculation algorithm, and recognize the user's limb movements based on the attitude information.
  • the attitude calculation algorithm may include a Mahony algorithm, a Kalman filter algorithm, and the like. This implementation requires little computing power and has good real-time performance, so the cursor position on the display screen refreshes quickly and tracks smoothly (a simplified fusion example follows below).
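  • As a much-reduced stand-in for the Mahony or Kalman fusion named above, the one-axis complementary filter below shows the basic idea of fusing gyroscope and accelerometer information into an attitude angle; this is a simplification for illustration, not the algorithm used in the embodiments:

```python
def complementary_pitch(prev_pitch, gyro_rate, accel_pitch, dt, alpha=0.98):
    """One-axis attitude fusion: integrate the gyro for short-term smoothness
    and pull toward the accelerometer-derived angle to cancel gyro drift.
    alpha close to 1 trusts the gyro short-term; the constant is illustrative."""
    gyro_estimate = prev_pitch + gyro_rate * dt   # degrees, from deg/s rate
    return alpha * gyro_estimate + (1.0 - alpha) * accel_pitch

# Example: one update of a 100 Hz loop (dt = 0.01 s)
pitch = 0.0
pitch = complementary_pitch(pitch, gyro_rate=5.0, accel_pitch=0.4, dt=0.01)
print(pitch)
```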
  • Method 2: recognize specific gestures in the user's body movements through computer vision (CV) recognition technology, and control the cursor on the display screen according to the recognition result, so as to realize human-computer interaction.
  • in Method 2, various devices may include cameras (e.g., depth cameras or non-depth cameras) and/or other sensors (e.g., the photosensors used in photoplethysmography (PPG), infrared sensors, radar, etc.).
  • in the following, the camera is taken as a depth camera as an example.
  • the depth camera can collect image information of a specific gesture performed by the user, and the processor uses a pre-established image recognition model to perform CV recognition based on the image information to obtain a first judgment result; other sensors collect sensor signals of the specific gesture performed by the user, and the processor uses a pre-established sensor recognition model to recognize based on the sensor signals to obtain a second judgment result; after that, the processor performs fusion processing based on the first judgment result and the second judgment result, and after determining the specific gesture performed by the user, controls the cursor on the display screen according to the control operation corresponding to the specific gesture.
  • the image recognition model and the sensor recognition model both include the correspondence between specific gestures and control operations on the display screen.
  • however, in Method 2 the control operation depends on gesture recognition, whose CV processing of the depth-sensor data takes a long time; this easily causes problems such as cursor display freezes and display delays for the control operation corresponding to the recognized gesture, resulting in poor user experience.
  • Method 3: integrate the IMU positioning solution of Method 1 with the CV recognition solution of Method 2 to realize a human-computer interaction mode with real-time calibration.
  • the device including the IMU identifies the gesture information of the user's limb movements to obtain initial positioning information; at the same time, the image information of the user's limb movements is identified and located through a camera (as in Method 2, the camera may be a depth camera or a non-depth camera; here a depth camera is taken as an example) to obtain calibration information; after that, the initial positioning information is calibrated according to the calibration information to obtain a calibration result, and based on the calibration result, the cursor on the display screen is operated to realize human-computer interaction.
  • this method can reduce, through the calibration process, the spatial translation tracking error existing in the IMU calculation process, and realize real-time tracking of the user's limb movements.
  • exemplarily, the implementation process shown in FIG. 1 is used as an example to illustrate the real-time calibration human-computer interaction mode.
  • in steps S1 and S2, the user operates the device containing the IMU to form the movement trajectory of the air mouse, so that the device containing the IMU can collect the IMU data;
  • in step S3, image data is collected through the depth camera and CV recognition processing is performed based on the image data, the obtained CV recognition result serving as the basis for the real-time calibration in step S4;
  • in steps S4 and S5, the offset angle data tracked and determined in real time from the IMU data obtained by the device containing the IMU in step S2 is calibrated according to the CV recognition result and converted into coordinate data (X, Y);
  • in step S6, the coordinate data (X, Y) obtained in step S5 is displayed on the screen in real time, or, instead of displaying the coordinate data (X, Y) itself, a cursor (which can be an icon/graphic/image of any size, shape, or transparency) is displayed at the screen position corresponding to (X, Y).
  • although Method 3 can, to a certain extent, solve the inaccurate positioning caused by using the IMU alone for cursor positioning, the CV recognition performed by the device containing the depth camera differs greatly from the IMU data processing performed by the device containing the IMU: the calculation amount of the former is much larger than that of the latter, so there is a large gap between their calculation times. That is to say, each frame of the cursor displayed on the display screen depends completely on the real-time CV calibration and is limited by the hardware computing capability, whereas each calculation of the air mouse movement trajectory formed from the data collected by the IMU usually takes only a few milliseconds to more than ten milliseconds; the real-time calibration method (that is, Method 3) therefore has to wait a long time for the CV recognition process.
  • for example, suppose each processing of the IMU data takes 10 milliseconds and each CV recognition process takes 200 milliseconds, and denote the duration interval as (0, 1000] (here and in the subsequent description, duration intervals are in milliseconds). The CV recognition result in step S3 then consists of CV key point data at 5 moments in total, namely (200, 400, 600, 800, 1000); in order to realize synchronous calibration of the IMU data based on the CV recognition result, the moments at which IMU data is collected in step S2 must be limited to those same 5 moments. Consequently, the cursor data displayed in real time on the screen in step S6 amounts to only 5 updates, that is, the refresh frequency of the cursor on the screen can only equal the CV recognition frequency, at most 5 hertz (Hz), which easily causes problems such as cursor display freezes and display delays, leading to poor user experience (checked numerically below).
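  • The arithmetic behind these numbers can be checked in two lines: with synchronous calibration the refresh rate is bounded by the CV processing time, while asynchronous calibration lets the IMU processing time set the bound:

```python
imu_ms, cv_ms = 10, 200   # per-step processing times from the example above
print(1000 // cv_ms, "Hz refresh with synchronous calibration")    # 5 Hz, CV-bound
print(1000 // imu_ms, "Hz refresh with asynchronous calibration")  # 100 Hz, IMU-bound
```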
  • in view of this, the embodiments of the present application provide a human-computer interaction method and related equipment, which control the cursor in the display device through asynchronous calibration; this can improve the continuity of the cursor movement in the display device, thereby improving the user experience.
  • in the following embodiments, the motion sensor is taken as an IMU data acquisition device only as an example; in practical applications, the motion sensor may also be another device, such as an accelerometer data acquisition device, a gyroscope data acquisition device, a magnetometer data acquisition device, or other devices, which is not limited here.
  • the display screen can be included in a display device, and the display device can also include a base for carrying the display screen, physical buttons or touch-screen buttons for controlling the parameters of the display screen (such as brightness and contrast), a power supply providing electrical energy, a wired/wireless communication module for transmitting control instructions for the display screen, and the like, which are not limited here; the display device mentioned in the following embodiments is mainly used to realize the display function of the display screen.
  • the camera may be included in the image acquisition device, and the image acquisition device may also include a power supply for providing power to the camera, a wired/wireless communication module for transmitting control instructions for the camera, etc., which are not limited here.
  • the image acquisition device mentioned in the following embodiments is mainly used to realize the shooting (image or video) function of the camera.
  • FIG. 2 is a schematic diagram of an application scenario of an embodiment of the present application, where the application scenario at least includes an image acquisition device, a display device, and an IMU data acquisition device.
  • the IMU data acquisition device is used to collect IMU data, and can be included in terminal devices with an IMU, such as mobile phones, remote control devices (such as remote controls and handles), tablet computers, and wearable devices (such as smart watches and smart bracelets).
  • the display device is a picture output device, which can be included in a device with a display screen, such as a computer, a TV, a smart screen (or a large screen), etc.
  • the image acquisition device is used to collect image data (including one or more frames of image information, or a video stream containing multiple frames of images, etc.), and can be a camera, such as a depth camera, a non-depth camera, or another image acquisition device, which is not limited here.
  • the image capturing device and the display device may be integrated in the same device, such as a computer, a TV, a smart screen (or a large screen), and the like.
  • the electronic device for executing the asynchronous calibration process in the human-computer interaction method includes a processor, and the electronic device may have multiple implementations.
  • for example, the electronic device may be a device including an IMU data acquisition device, connected in a wired/wireless manner to one or more devices including an image acquisition device and/or a display device; or, the electronic device may be a device including a display screen, connected in a wired/wireless manner to one or more devices including an image acquisition device and/or an IMU data acquisition device; or, the electronic device may be a device including an image acquisition device, connected to one or more devices including a display device and/or an IMU data acquisition device; alternatively, the electronic device may be another device that includes none of the image acquisition device, the display device, and the IMU data acquisition device (such as a smart speaker, a robot, a server, or a computing center), connected in a wired/wireless manner to one or more devices including an image acquisition device, a display device, and/or an IMU data acquisition device.
  • a user interacts with a large screen by holding a mobile phone.
  • the mobile phone is a device that includes an IMU data acquisition device
  • the large screen is a device that includes a display device.
  • the mobile phone and the large screen respectively execute the relevant steps of the human-computer interaction method provided by the embodiments of the present application, so that when the user holds the mobile phone, makes physical actions, and drives the mobile phone to move, the large screen responds to the trajectory of the mobile phone's movement and displays the cursor at the corresponding position on the large screen.
  • in this scenario, the asynchronous calibration process in the human-computer interaction method may preferably be executed by the processor of the mobile phone; the asynchronous calibration process may also be performed by the processor of the large screen, or by the processor of another device (e.g., a server or a computing center), which is not limited in this application.
  • the application scenarios of the human-computer interaction method provided by the embodiments of the present application also include, but are not limited to: the user wears a watch/bracelet to interact with the large screen, the user wears a motion sensor to interact with the large screen, or the user holds a remote control to interact with a head-mounted display device. It should be understood that any device including an IMU data acquisition device, combined with any device including a display device, can use the human-computer interaction method provided by the embodiments of the present application to perform human-computer interaction.
  • in the following embodiments, the electronic device is a device equipped with an IMU data acquisition device, and the image acquisition device and the display device are integrated in another device (such as a computer, a TV, or a smart screen) as an example. That is, the electronic device includes the IMU data acquisition device and communicates with the other device through a wired/wireless connection to acquire the image data collected by the image acquisition device, or the processing result obtained by the image acquisition device from the collected image data, so that the electronic device can, through the implementation process of the human-computer interaction method provided by the present application, obtain the control information for controlling the cursor in the display device and send it to the display device.
  • the user's hand (including the fingers, wrist, palm, etc.) carries the device including the IMU data acquisition device in a handheld or wearable manner.
  • the user moves his hand in the shooting area of the image acquisition device, which drives the IMU data acquisition device to move.
  • during this process, the IMU data acquisition device collects IMU data, the image acquisition device collects image data, and the electronic device obtains the IMU data and the image data and performs fusion processing to obtain air mouse data; the display device can then display the cursor based on the air mouse data. Therefore, the user can interact with the display device in the air mouse mode by moving the hand; for example, the user can select, move, drag, zoom in, and click interface elements in the display interface of the display device in the air.
  • initialization may be performed first, and the time difference may be obtained by calculation; then, in the subsequent implementation of the human-computer interaction method provided by the embodiments of the present application, the data collected by the IMU data acquisition device and the data collected by the image acquisition device are aligned according to the time difference calculated in the initialization process, so that the finally displayed cursor is accurately positioned and its trajectory is accurate.
  • the above initialization process may be a process in which the user performs a specified physical action facing the display device.
  • for example, the display screen displays the text "please face the screen" to prompt the user to adjust the position relative to the display screen, and the display screen displays the text "please draw a W curve" to prompt the user to make the specified body motion; the user then moves the arm, drawing a W curve in the air. Therefore, the watch processor can calculate the inherent time difference between the IMU and the camera hardware according to the IMU data collected while the user draws the W curve and the image data collected by the camera (or the recognition result obtained based on the image data).
  • when the user carries an electronic device with an IMU data acquisition device by hand or wears it and is in the shooting area of the image acquisition device, for example when entering the range of (or disconnecting from and reconnecting to) the display device for the first time, the electronic device can realize the initialization process through the human-computer interaction method shown in FIG. 3 below, so as to align the time information (also called the time stamp or time axis) between the IMU data acquisition device and the image acquisition device.
  • that is, the implementation shown in FIG. 3 can be used to avoid problems such as inaccurate calibration caused by the time difference between data collected by different devices in the aforementioned Method 2 and Method 3, which results in misplaced cursor display.
  • FIG. 3 is a schematic flowchart of a human-computer interaction method according to an embodiment of the present application. The method includes the following steps.
  • S101: in the first duration corresponding to the initialization process, when the user carries the electronic device equipped with the IMU data acquisition device in a handheld or wearable manner and is in the shooting area of the image acquisition device, the electronic device acquires the initialization IMU data through the IMU data acquisition device by collecting the user's limb movements within the first duration, and the image acquisition device acquires the initialization image data by collecting the user's limb movements in the same duration.
  • the electronic device may receive the initialization image data through wired/wireless communication with the image acquisition device.
  • the sampling frequencies of the two devices, the IMU data acquisition device and the image acquisition device, may be the same; for example, both sampling frequencies are 100 hertz (Hz), that is, in step S101 the two devices can collect IMU data at 100 moments and image data at 100 moments within any second of the first duration.
• Alternatively, the sampling frequencies of the IMU data acquisition device and the image acquisition device may be different; for example, the sampling frequency of the IMU data acquisition device is 100 hertz (Hz) and the sampling frequency of the image acquisition device is 5 Hz. That is, in step S101, the two devices can collect IMU data of 100 moments and image data of 5 moments within any second of the first duration.
• S102: Determine the difference between the timestamp of the IMU data acquisition device and the timestamp of the image acquisition device according to the initialization IMU data and the initialization image data.
• Specifically, in step S102 the electronic device determines the difference between the timestamp of the IMU data acquisition device and the timestamp of the image acquisition device according to the initialization IMU data and the initialization image data obtained in step S101. After the initialization process, the timestamps of the IMU data acquisition device and the image acquisition device can be aligned according to the difference.
• The electronic device may analyze the signal characteristics of the initialization image data and the initialization IMU data and calculate the fluctuation frequency to determine the difference value; the difference may also be determined by regression linear fitting, which is not limited here.
• The following description takes, as an example, the case in which the electronic device determines the difference by calculating the fluctuation frequency in step S102.
• The electronic device may obtain the IMU data fluctuation curve within the first duration according to the initialization IMU data and determine the wave peak position (and/or the wave trough position) in the IMU data fluctuation curve; similarly, the electronic device may determine the CV data fluctuation curve within the first duration according to the CV key point detection result and determine the wave peak position (and/or the wave trough position) in the CV data fluctuation curve, where the CV key point detection result includes the three-dimensional orientation angle of the position where the user holds or wears the electronic device (e.g., left wrist, right wrist, etc.). After that, the electronic device compares the time information of the wave peak position (and/or the wave trough position) in the IMU data fluctuation curve with the time information of the wave peak position (and/or the wave trough position) in the CV data fluctuation curve; the resulting difference in time information is the difference between the timestamp of the IMU data acquisition device and the timestamp of the image acquisition device. Therefore, after step S102, the IMU data collected by the IMU data collection device and the image data of the image collection device can be aligned according to the difference.
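• As a minimal illustration of this peak-comparison approach (a sketch only: the sinusoidal signals, sampling rates, and the 30 ms offset below are assumed for demonstration and are not taken from the embodiment), the timestamp difference could be estimated as follows:

```python
import numpy as np

def peak_time(t, v):
    """Timestamp at which a fluctuation curve reaches its wave peak."""
    return t[int(np.argmax(v))]

def timestamp_difference(imu_t, imu_v, cv_t, cv_v):
    """Difference between the IMU timestamp and the camera timestamp,
    estimated from the wave-peak positions of the two curves."""
    return peak_time(imu_t, imu_v) - peak_time(cv_t, cv_v)

# Assumed example: a 100 Hz IMU and a 5 Hz camera observe the same motion,
# with the camera stream lagging the IMU stream by 30 ms.
imu_t = np.arange(0.01, 1.01, 0.01)        # 100 moments (seconds)
imu_v = np.sin(2 * np.pi * imu_t)          # IMU data fluctuation curve
cv_t = np.arange(0.2, 1.01, 0.2)           # 5 moments (seconds)
cv_v = np.sin(2 * np.pi * (cv_t - 0.03))   # CV data fluctuation curve

diff = timestamp_difference(imu_t, imu_v, cv_t, cv_v)
# Subtracting `diff` from the IMU timestamps aligns the two streams.
```

• With only 5 camera samples per second this single-peak estimate is coarse; as noted above, the difference may instead (or additionally) be refined by regression linear fitting over several peaks and troughs.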
• In a possible implementation, the image acquisition device may perform CV key point recognition on the initialization image to obtain the CV key point detection result and send the CV key point detection result to the electronic device in step S101; alternatively, the image acquisition apparatus sends the initialization image to the electronic device in step S101, so that the electronic device performs CV key point recognition on the initialization image to obtain the CV key point detection result, which is not limited here.
• In addition, the electronic device may determine that the user enters the initialization process based on a variety of different triggering methods, and then execute steps S101 and S102. For example, the initialization process may be triggered based on the user's specific limb movements (such as a left swipe or a right swipe) acquired by the IMU data acquisition device, or may be triggered when the user is in the shooting area of the image acquisition device, or may be triggered when communication is established between the image acquisition device and the IMU data acquisition device, or by other triggering methods, which are not limited here.
• In the initialization process, the electronic device may further determine initial relative information between the user and the image acquisition device according to the initialization IMU data and the initialization image data, where the initial relative information may include parameters such as distance and orientation.
• For example, the CV key point identification process can obtain the shoulder width through the two CV key points of the left shoulder and the right shoulder, which is used to determine the distance between the user and the image acquisition device and the relative orientation between the user and the image acquisition device. Alternatively, the distance and the relative orientation between the user and the image capture device may be determined using facial key points, such as eye spacing, ear spacing, and other parameters.
• In addition, the electronic device can obtain a certain proportional relationship from any frame of the images collected during the initialization process. For example, by default it is assumed that body rotation does not cause the head width to change: the ratio of the user's head width to the shoulder width is taken as a preset proportional coefficient, and subsequent changes of this proportional coefficient, caused by changes in the apparent shoulder width, indicate a change of orientation.
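• The following sketch illustrates one way such a proportional relationship could be used; the cosine relation between apparent shoulder width and body yaw is an assumption introduced here for illustration, not a formula given by the embodiment:

```python
import math

def estimate_yaw_deg(head_px, shoulder_px, ratio0):
    """Apparent shoulder width shrinks roughly with cos(yaw) as the torso
    rotates, while head width stays constant (assumption stated above)."""
    cos_yaw = max(-1.0, min(1.0, ratio0 * shoulder_px / head_px))
    return math.degrees(math.acos(cos_yaw))

# Initialization frame: user faces the camera.
ratio0 = 60 / 180   # preset coefficient: head width / shoulder width (pixels)

# Later frame: head width unchanged, apparent shoulder width reduced.
print(f"yaw = {estimate_yaw_deg(60, 127, ratio0):.0f} deg")   # about 45 deg
```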
• In a possible implementation, the determination of the initial relative information can be achieved by the electronic device executing the implementation process shown in step S101.
• Specifically, the user's hand (including fingers, wrist, palm, etc.) carries the device equipped with the IMU data acquisition device by holding or wearing it, and the user is in the shooting area of the image acquisition device; the image acquisition device collects a video stream, where the video stream includes multiple frames of user images, and thereafter CV key point detection is performed on the multiple frames of user images captured by the image acquisition device to obtain the initial relative information.
• The initial relative information can be used as a calibration reference value for relative information (such as distance and orientation). At any time after the initialization process, if the acquired relative information is different from the calibration reference value (for example, triggered by the user walking or turning around), it is determined that the relative information (such as distance and orientation) between the user and the image capture device has changed.
• The following takes as an example the case in which the electronic device provided with the IMU data acquisition device is a wearable device, such as a smart watch including an IMU.
  • the user wears the smart watch and stands directly in front of the display screen.
• During the initialization, the image acquisition device captures the gesture video stream data (including at least a first image and a second image), obtains the user's initial standing distance, body orientation (initialized as facing the front), shoulder width, head size, and other parameters through CV key point detection as the initial image data, and synchronously aligns these with the IMU data corresponding to the preset actions collected by the IMU data acquisition device.
  • the initialization process can be implemented by the example shown in FIG. 4 .
• In this case, the initialization process includes: the display screen displays "please face the screen" through the interface, and when the image capture device detects that the user is facing the screen, the user can be reminded to perform the initialization operation, that is, "please draw the W curve" is displayed on the interface; after the image acquisition device detects that the user has drawn the W curve in the air, the image acquisition device acquires the initial image data of the user in the process of drawing the W curve, the IMU data acquisition device collects the user's IMU data, and the initial image data and the IMU data are aligned to initialize the cursor position.
• Alternatively, the initialization process can be implemented by the example shown in FIG. 5, and includes: when the image acquisition device detects the user, the user is assumed by default to be facing the screen, and the user can then be reminded to perform the initialization operation, with the display screen displaying "please draw the W curve facing the screen"; after the image acquisition device detects that the user has drawn the W curve in the air, the image acquisition device collects the initial image data of the user in the process of drawing the W curve, the IMU data acquisition device collects the user's IMU data, and the initial image data and the IMU data are aligned to initialize the cursor position.
• In another possible implementation, the electronic device can realize the determination of the initial relative information without performing the implementation process shown in step S101.
• For example, the user stands in front of the screen of the display device and enters the air mouse mode at startup, and ranging technology based on a monocular image acquisition device (such as a monocular camera) is used, which calculates the actual coordinates of the corresponding pixels based on a similar-triangle ratio and initializes the model; parameters such as distance and body shape are aligned with the IMU measurement data of the watch and the initial image data of the display device.
  • the user can enter the air mouse mode without initializing gestures.
• The monocular image acquisition device ranging technology has relatively high requirements for camera calibration and requires that the distortion caused by the lens itself be relatively small; in general, however, this method offers strong portability and practicability, and can also achieve accurate estimation of the initialization parameters.
• Without the initialization process, the ranging accuracy may be sacrificed, but a depth image acquisition device can be used to make up for the accuracy.
• That is, the monocular image acquisition device is directly modeled in the initialization stage, and the user's position distance and body shape parameters (that is, orientation) are obtained through the ranging principle of the monocular camera.
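• A minimal sketch of the similar-triangle (pinhole) ranging relation mentioned above; the focal length and the assumed real shoulder width are illustrative values, not parameters from the embodiment:

```python
# Illustrative sketch (assumed calibration values): monocular ranging by the
# similar-triangle relation of the pinhole camera model.
def monocular_distance_m(focal_px, real_width_m, pixel_width_px):
    # distance = focal_length_in_pixels * real_size / apparent_size_in_pixels
    return focal_px * real_width_m / pixel_width_px

# Assumptions: focal length 1000 px; the user's shoulders are 0.40 m wide
# and span 180 px in the captured image.
print(monocular_distance_m(1000, 0.40, 180))   # about 2.22 (metres)
```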
• In an embodiment of the present application, the electronic device can execute the human-computer interaction method shown in FIG. 6 to control the cursor in the display device through asynchronous calibration, which can improve the continuity of the cursor movement in the display device, thereby improving user experience.
  • the electronic device can also directly execute the human-computer interaction method shown in FIG. 6 without going through the initialization process shown in FIG. 3 .
• For example, the time difference between the two devices in collecting/processing data may be at the microsecond level or even lower; such a time difference is too small for the user to perceive during human-computer interaction, so the user cannot notice the cursor display effect caused by it. Therefore, in the human-computer interaction method shown in FIG. 6, the cursor in the display device can be controlled through asynchronous calibration without the initialization process shown in FIG. 3, and the coherence of the cursor movement in the display device can still be improved to enhance the user experience.
  • FIG. 6 is a schematic flowchart of a human-computer interaction method according to an embodiment of the present application. The method includes the following steps.
• S201: The electronic device determines initial IMU data.
• Specifically, the electronic device collects the user's body movements through the IMU data acquisition device at the first set of moments to obtain the initial IMU data.
  • the first time set is included in the first time period.
• For example, the sampling frequency of the IMU data acquisition device is the first sampling frequency (for example, 100 Hz), and the multiple moments included in the first time set are the 100 moments in every second of the first time period.
  • the electronic device including the IMU data acquisition device is used as an example for description.
• For example, the electronic device may be a mobile phone, a remote control device (such as a remote control or a handle), a tablet computer, a wearable device (such as a smart watch or a smart bracelet), or the like.
• In step S201, when the user carries the electronic device equipped with the IMU data collection device by holding or wearing it and performs the air mouse operation, the IMU continuously tracks the change of the user's gesture: the main components included in the IMU, such as the gyroscope, accelerometer, and magnetometer, record the IMU data and collect the user's limb movements at the first time set to obtain the initial IMU data, so that the electronic device can determine the initial IMU data in step S201.
• In a possible implementation, the initial IMU data obtained in step S201 may be the data obtained after the IMU data recorded by the IMU data acquisition device is processed through waveform smoothing processing, de-noising calibration compensation processing, or other methods, which is not limited here.
• S202: The electronic device determines the first image data.
• Specifically, the image acquisition device collects the user's limb movements at the second set of moments to obtain the first image data, where the image data may include one or more frames of image information, a video stream containing multiple frames of images, and the like.
  • the electronic device may obtain the first image data through wired/wireless communication connection with the device including the image acquisition device.
  • the second time set is included in the first time period.
• For example, the sampling frequency of the image acquisition device is the second sampling frequency (for example, 5 Hz), and the multiple moments included in the second time set are the 5 moments in every second of the first time period.
  • the calculation time of the CV recognition process is generally much longer than the processing time of the IMU data.
• Each calculation process of CV recognition generally takes several hundred milliseconds, while the processing of each piece of IMU data generally takes several milliseconds to ten-odd milliseconds; the difference between the two is at least an order of magnitude. Therefore, the second time set corresponding to the image data collected by the image acquisition device in step S202 may be a subset of the first time set corresponding to the initial IMU data collected by the IMU data acquisition device in step S201.
• The following takes as an example the case in which the processing time of each piece of IMU data is 10 milliseconds and the processing time of each CV identification process is 200 milliseconds.
• In steps S201 and S202, the user carries the device equipped with the IMU data acquisition device by holding or wearing it, and the user performs the air mouse operation in the shooting area of the image acquisition device within a certain one-second time interval. The interval is denoted as (0, 1000], and the unit of the time intervals described here and in the subsequent description is milliseconds. In this case, the first time set for collecting the initial IMU data in step S201 is (10, 20, 30, ..., 200, 210, 220, ..., 400, ..., 600, ..., 800, ..., 1000), a total of 100 moments, and the second time set for collecting the first image data in step S202 is (200, 400, 600, 800, 1000), a total of 5 moments.
• In this way, the initial IMU data collected at 100 moments can be asynchronously calibrated based on the first image data collected at 5 moments, so as to avoid problems such as cursor drift that exist when only the IMU data is used for human-computer interaction (see the description of the aforementioned way 1).
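• The two sampling grids of this example can be written down directly; the subset relation between them is the premise of the asynchronous calibration (a sketch using only the example's numbers):

```python
# The sampling grids of the example above: IMU at 100 Hz, image/CV at 5 Hz,
# over the interval (0, 1000] in milliseconds.
imu_times = list(range(10, 1001, 10))    # 10, 20, ..., 1000 -> 100 moments
cv_times = list(range(200, 1001, 200))   # 200, 400, ..., 1000 -> 5 moments

# The second time set is a subset of the first: every CV moment has a
# matching IMU moment, which is the premise of asynchronous calibration.
assert set(cv_times) <= set(imu_times)
```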
• In a possible implementation, the image acquisition device may perform the process of acquiring image data when it detects that the user holds or wears the IMU data acquisition device and the user's position is in the shooting area of the image acquisition device, or in response to the user's voice wake-up, or in response to a user's operation on the electronic device, which is not limited here.
• In addition, the number of cameras included in the image capture device may be set to one, that is, the image capture device acquires image data through a single camera; or the number of cameras may be set to multiple, that is, the image capture device acquires image data through multiple cameras, which is not limited here.
• When the image acquisition device includes multiple cameras, different cameras can be used to cover scenes of different ranges, which solves the problem that a camera cannot switch the focal length back and forth, and ranging can be performed based on multiple sets of image data to improve ranging accuracy.
• When the image acquisition device includes one camera, compared with the arrangement of multiple cameras, camera hardware can be saved, and the calculation amount of image data can be reduced to improve the subsequent cursor response speed.
  • FIG. 7 is an implementation example of setting the positional relationship between the image capturing device and the display device in the electronic device.
  • the image capture device may be set outside the display area of the display device.
• For example, the image capture device is set at a position close to the upper frame of the display device.
• Alternatively, the image acquisition device is set at a position close to the lower border of the display device, or at other positions of the display device, such as a position close to the left border, a position close to the right border, or positions close to the upper left corner, upper right corner, lower left corner, or lower right corner of the display device, which is not limited here.
• In other implementations, the image capturing device may also be arranged within the display area of the display device. For example, as shown in (c) of FIG. 7, the image acquisition device is set at the middle position of the display area of the display device, or at other positions within the display area, such as positions close to the borders of the display area, which is not limited here.
• S203: The electronic device performs CV key point recognition according to the first image data to obtain a first constraint condition.
• Specifically, the electronic device performs CV key point recognition according to the first image data determined in step S202 to obtain the first constraint condition.
• In a possible implementation, the electronic device uses human body skeleton recognition technology to perform CV key point recognition on the human body included in the image data, and determines the obtained recognition result as the first constraint condition; for example, the CV identification result may include the three-dimensional space azimuth information of the CV key points.
• The CV key point identification and positioning can be implemented based on the number of CV key points being 9, 14, 16, 21, or another number of key points, which is not limited here. Illustratively, the following takes the implementation process in which the number of CV key points is 9 as an example.
• The CV key points at least include the position where the user holds or wears the electronic device containing the IMU data acquisition device, such as the user's left elbow, left wrist, right elbow, right wrist, left-hand fingers, or right-hand fingers; the CV key points can also be adjusted according to specific application scenarios, which is not limited here.
• In another possible implementation, the CV key point identification process may also be performed by the device including the image capture device to obtain the first constraint condition; that is, in step S202, the device including the image capture device may also send the first constraint condition to the electronic device. In this way, the electronic device does not need to perform the CV key point identification process, which can reduce the processing delay of the electronic device and improve its response speed.
• In a possible implementation, a preset neural network model may be used to process the input image data (for example, the aforementioned first image data) to obtain the first constraint condition. In this way, the processing efficiency can be greatly improved, and the response speed of the subsequent cursor in the display device can be further improved.
• The preset neural network model may be obtained by training with training samples, where a training sample may include image data and label data; the label data may be the CV key point coordinates corresponding to the image data, or a constraint condition corresponding to the image data (such as a three-dimensional space orientation angle), or both the CV key point coordinates and the constraint condition (such as a three-dimensional space orientation angle) corresponding to the image data.
• The training process can be performed locally by the electronic device, or locally by the device including the image capture device, or by a cloud server with the result then transmitted to the electronic device or the device including the image capture device by means of data transmission; there is no limitation here.
• The following still takes as an example the implementation process in which, in step S202, the user carries the device equipped with the IMU data acquisition device by holding or wearing it, and performs the air mouse operation in the shooting area of the image acquisition device within a certain second: the second time set at which the image acquisition device collects the first image data is (200, 400, 600, 800, 1000), a total of 5 moments, and in step S203 the electronic device performs CV key point identification on the image data corresponding to these five moments. For example, when the user wears a watch containing an IMU data acquisition device on the right wrist, the electronic device obtains the positioning coordinates of the right wrist corresponding to the operation, determines the movement direction of the right wrist according to the chronological order of the 5 moments so as to determine the three-dimensional space orientation angle of the right wrist (or the arm where the right wrist is located), and determines the three-dimensional space orientation angle as the first constraint condition.
• In a possible implementation, the human skeleton recognition technology used by the electronic device may perform CV key point recognition through a three-dimensional (3-dimensional, 3D) human skeleton recognition technology (for example, when the image acquisition device in step S202 acquires images through a monocular camera), or through a two-dimensional (2-dimensional, 2D) human skeleton recognition technology (for example, when the image acquisition device in step S202 acquires images through multiple cameras), which is not limited here.
  • the CV key points used by the electronic device to determine the first constraint condition in step S203 may be 2D human body CV key points or 3D human body CV key points, which are not limited here.
  • the electronic device can acquire the wearing position information of the device including the IMU data acquisition device, so that the electronic device can determine which CV key point among the multiple CV key points can be used as the first constraint condition.
• For example, the device including the IMU data acquisition device is a watch, and the electronic device is also the watch. The user wears the watch on the right wrist, and the watch can sense whether it is worn on the left wrist or the right wrist. When the CV key points include 6 CV key points, namely the left shoulder, right shoulder, left elbow, right elbow, left wrist, and right wrist, the watch can determine, according to the wearing position information, the three-dimensional space azimuth information of the CV key point of the right wrist as the first constraint condition.
• S204: The electronic device calibrates the initial IMU data based on the first constraint condition to obtain target IMU data.
• Specifically, the electronic device calibrates the initial IMU data determined in step S201 according to the first constraint condition determined in step S203 to obtain the target IMU data.
• In a possible implementation, the electronic device processes the IMU data recorded by the three sensors (gyroscope, accelerometer, and magnetometer) in the initial IMU data determined in step S201 through an attitude calculation algorithm to obtain attitude angle information, and calibrates the recorded IMU data based on the first constraint condition obtained in step S203 (or calibrates the attitude angle information based on the first constraint condition obtained in step S203, or calibrates both the recorded IMU data and the attitude angle information based on the first constraint condition obtained in step S203) to obtain the target IMU data.
• In a possible implementation, the attitude calculation algorithm may include a Mahony algorithm, a Kalman filter algorithm, and the like.
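• Since the full Mahony or Kalman formulations are lengthy, the following sketch uses a simplified complementary filter as a stand-in for the attitude calculation step: it fuses the integrated gyroscope rate with an accelerometer tilt estimate to produce an attitude (pitch) angle. All numeric values are assumed for demonstration:

```python
import math

def pitch_from_accel(ax, ay, az):
    """Tilt (radians) estimated from the measured gravity direction."""
    return math.atan2(-ax, math.hypot(ay, az))

def complementary_pitch(pitch_prev, gyro_rate_y, accel, dt, alpha=0.98):
    """Blend the integrated gyro rate (smooth but drifting) with the
    accelerometer tilt (noisy but drift-free)."""
    gyro_pitch = pitch_prev + gyro_rate_y * dt
    return alpha * gyro_pitch + (1.0 - alpha) * pitch_from_accel(*accel)

# Assumed 100 Hz IMU stream: (accelerometer m/s^2, gyro rad/s) per sample.
pitch = 0.0
samples = [((0.0, 0.0, 9.81), 0.1)] * 100
for accel, gyro_y in samples:
    pitch = complementary_pitch(pitch, gyro_y, accel, dt=0.01)
```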
• The following description still takes as an example the implementation process in which the user carries the device equipped with the IMU data acquisition device by holding or wearing it, and performs the air mouse operation in the shooting area of the image acquisition device within a certain second.
• The first time set of the initial IMU data collected by the IMU data collection device in step S201 is (10, 20, 30, ..., 200, 210, 220, ..., 400, ..., 600, ..., 800, ..., 980, 990, 1000), a total of 100 moments, and the second time set of the first image data collected by the image acquisition device in step S202 is (200, 400, 600, 800, 1000), a total of 5 moments. In step S203, the electronic device performs CV key point recognition on the image data corresponding to these five moments (for example, the user wears a watch containing an IMU data acquisition device on the right wrist) and obtains the first constraint condition containing the positioning coordinates at these five moments. Asynchronous calibration is then performed on the first time set according to the positioning coordinates at the five moments: the attitude angle information corresponding to the IMU data at the moments (200, 400, 600, 800, 1000) in the first time set is calibrated according to the positioning coordinates at the five moments, and the calibrated IMU data of the 100 moments is the target IMU data. In this example, the first constraint condition is the three-dimensional space direction angle of the arm indicated by the positioning coordinates at the 5 moments, and the initial IMU data is the IMU data at the 100 moments. That is, in step S204, the electronic device may use the three-dimensional space direction angle of the arm indicated by the positioning coordinates at 5 moments (in the first constraint condition) to calibrate the IMU data at 100 moments (in the initial IMU data) to obtain the target IMU data.
• In this way, asynchronous calibration is performed on the initial IMU data at many moments through the first constraint condition formed by CV identification of the image data at a small number of moments. Compared with a synchronous calibration method (for example, the aforementioned way 3), there is no need to wait a long time for CV identification: the control information obtained from the calibrated target IMU data can be used to control the cursor in the display device, so that the refresh frequency of the cursor in the display device can be the same as the frame rate of IMU data collection rather than being limited by the processing frequency of CV recognition. The refresh frequency of the cursor can thus be increased, problems such as cursor display freezes and display delays can be avoided, and the user experience can be improved.
• In another possible implementation, in step S204 the electronic device may first calibrate the IMU data recorded by the sensors in the initial IMU data according to the first constraint condition to obtain a calibration result, then process the calibration result through the attitude calculation algorithm, and use the attitude angle information obtained by the processing as the target IMU data.
• For example, the electronic device performs mapping processing according to the three-dimensional space orientation angles of the arm (at the five moments) in the first constraint condition to obtain the IMU calibration data corresponding to each arm three-dimensional space orientation angle (at the five moments), and fits the obtained IMU calibration data (of the 5 moments) to obtain an IMU calibration curve. An initial IMU curve is obtained by fitting the initial IMU data; further, weighted average processing is performed on the IMU calibration curve and the initial IMU curve to obtain an optimized curve, and the corresponding calibrated IMU data (of the 100 moments) is read from the optimized curve. After that, the calibrated IMU data is processed by the attitude calculation algorithm to obtain the attitude angle information (of the 100 moments), and the attitude angle information (of the 100 moments) is used as the target IMU data obtained in step S204.
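• A sketch of this curve-based implementation, with assumed data and low-order polynomial fitting standing in for the unspecified fitting method; the weight favouring the CV-derived calibration curve is likewise an assumption:

```python
import numpy as np

# Assumed data shapes for the example: 100 IMU moments and 5 CV moments.
imu_t = np.arange(10, 1001, 10) / 1000.0             # seconds
imu_v = 0.3 * imu_t + 0.02 * np.random.randn(100)    # raw, drifting IMU values
cv_t = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
cv_v = np.array([0.05, 0.12, 0.18, 0.22, 0.30])      # CV-derived calibration data

# Fit both curves; low-order polynomials stand in for the fitting step.
cal_curve = np.polyval(np.polyfit(cv_t, cv_v, 2), imu_t)    # IMU calibration curve
imu_curve = np.polyval(np.polyfit(imu_t, imu_v, 2), imu_t)  # initial IMU curve

w = 0.7   # assumed weight favouring the drift-free CV-derived curve
optimized = w * cal_curve + (1.0 - w) * imu_curve
# Calibrated IMU values at all 100 moments can now be read from `optimized`
# and passed to the attitude calculation algorithm.
```

• The alternative order described next differs only in when the attitude calculation algorithm is applied: there, the curves are fitted over attitude angle information rather than over the raw IMU data.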
• In still another possible implementation, the electronic device may first obtain the attitude angle information from the IMU data recorded by the sensors in the initial IMU data through the attitude calculation algorithm, then calibrate the attitude angle information according to the first constraint condition, and use the calibrated attitude angle information as the target IMU data. For example, the electronic device performs regression processing according to the three-dimensional space orientation angles of the arm (at the five moments) in the first constraint condition to obtain the attitude angle information corresponding to each arm three-dimensional space orientation angle (at the five moments), and fits the obtained attitude angle information (of the 5 moments) to obtain an attitude angle calibration curve. Meanwhile, attitude angle information is obtained from the initial IMU data through the attitude calculation algorithm and fitted to obtain an attitude angle change curve. Based on the attitude angle calibration curve and the attitude angle change curve, the attitude angle information (of the 100 moments) is obtained and used as the target IMU data obtained in step S204.
• The following describes a scenario in which the user wears a wearable device (including an IMU) on the hand and performs body movements in front of a device with a display screen and a camera: the wearable device obtains the IMU data generated by the user's body movements while the camera acquires the image data generated by the same movements, and the IMU data is calibrated according to the image data, so as to realize a mouse-movement operation on a coordinate position on the display screen.
• In the figure, the moving direction of the wearable device triggered by the user's physical action is indicated by a dashed arrow, and the coordinate displacement of the coordinate position on the display screen is indicated by a solid arrow.
• When the position of the camera on the display screen is fixed (for example, set above the central axis of the display screen): when the user is at position 0, the user's limb movement sweeps a certain angular range, so that the coordinates mapped on the display screen move from A to B; when the user is at position 1, the user's limb movement sweeps the same angular range, so that the coordinates mapped on the display screen move from C to D, but because the relative direction between the camera and the user has changed, the change of the coordinates mapped on the display screen is different even though the limb movement covers the same angular range (that is, the distance between points A and B is not equal to the distance between points C and D); similarly, when the user is at position 2, the user's limb movement sweeps the same angular range, so that the coordinates mapped on the display screen move from E to F. The relative direction between the camera and the user at position 2 is similar to that at position 0, so the resulting coordinate displacement mapped on the screen may be the same (that is, the distance between points A and B is approximately equal to the distance between points E and F); however, since position 2 is close to the right edge of the display screen, it is easy to cause the cursor overflow problem shown in the figure (that is, the coordinates of point F are beyond the range of coordinates covered by the display area).
• Similarly, when the user's orientation changes, the cursor movement path displayed on the display screen corresponding to the user's body movement will also deviate. For example, when the user stands with the front of the body facing the camera versus with the side of the body facing the camera, even if the user's limbs perform the same action, the resulting movement paths of the cursor on the display screen are different. Consider the user's arm moving, with the shoulder joint as the axis, from a position naturally perpendicular at one side of the body to a position parallel to the ground in the same plane as the torso: in one orientation the resulting cursor movement path on the screen may be an arc, while in the other it may be a straight line. That is, when the position and orientation of the human body relative to the display screen change, if the spatial displacement and the angle change of the human-torso coordinate system cannot be tracked, inaccurate positioning and cursor overflow result.
• To address this, the implementation of step S204 can be further optimized, as described in detail below.
• In a possible implementation, the process in which the electronic device calibrates the initial IMU data based on the first constraint condition to obtain the target IMU data may specifically include: the electronic device first determines a first human arm engineering model according to the first image data obtained in step S202, where the first human arm engineering model includes at least one first value range of a limb rotation direction; after that, the first constraint condition is updated based on the first human arm engineering model to obtain an updated first constraint condition, and the updated first constraint condition is further used to calibrate the initial IMU data to obtain the target IMU data.
• That is, the electronic device can also determine a first human arm engineering model according to the first image data obtained in step S202, where the first human arm engineering model includes at least one first value range of a limb rotation direction; in other words, the first human arm engineering model is constructed around the user's individual tendency to fatigue and the principles of minimum work and minimum torque change. In the process of human-computer interaction, the user generally does not make limb movements that violate ergonomics; for example, the movable range of the wrist joint's ulnar deviation is [-25°, 30°], and if wrist joint movement beyond this range is detected, the recognition result of the IMU data acquisition device or the image acquisition device can be considered incorrect. Therefore, the movable ranges of the user's different limbs can be determined to build the human arm ergonomic model.
• For example, the user's fatigue characteristics may be indicated by the gorilla arm effect.
• That is, the user's elbow joint tends to show larger angle changes than the shoulder joint; the movement angle of the shoulder joint may temporarily cover a larger range in the first few minutes, but soon the movement angle of the shoulder joint is reduced to within 30 degrees.
• Similarly, minimum work and minimum torque change refer to the phenomenon that a movement at the end of a rigid cylinder consumes less work and torque than a movement of the same angle at the root of the cylinder.
  • the target IMU data is determined based on the first ergonomic model of the human arm and the first constraint.
• That is, the first constraint condition determined in step S203 (for example, the aforementioned three-dimensional space direction angle) can be input into the human arm engineering model to form a ternary inequality about the attitude angle, and the initial IMU data is calibrated based on this ternary inequality about the attitude angle to obtain the target IMU data in step S204.
• In this way, the human arm engineering model is used as one of the constraint conditions, which can effectively avoid problems such as inaccurate positioning on the display screen and cursor overflow.
• An implementation example of the human arm ergonomic model is shown in FIG. 10.
  • the user's limb is simulated as a four-axis rigid cylinder human arm engineering model.
  • ⁇ 1 indicates the user's shoulder
  • ⁇ 2 indicates the user's shoulder joint
  • ⁇ 3 indicates the user
  • ⁇ 4 indicates the user's wrist joint.
• Each rigid cylinder corresponding to a user limb has different movable distances/angles in different degrees of freedom, and the first human arm ergonomic model includes at least one first value range of a limb rotation direction.
• The degrees of freedom of the user's different limbs can be exemplarily represented as pitch, roll, and yaw, and the physical movements of the user's different limbs can be expressed based on these three degrees of freedom.
  • the adduction or abduction action of the user's shoulder joint can be represented by pitch
  • the forward bending or backward extension action of the user's shoulder joint can be represented by roll
  • the internal rotation or external rotation action of the user's shoulder joint can be represented by yaw.
  • extension and flexion action of the user's elbow joint can be represented by pitch
  • rotation action of the user's elbow joint can be represented by roll
  • extension and flexion action of the user's wrist joint can be represented by pitch
• the ulnar deviation action of the user's wrist joint can be represented by roll.
  • the device including the IMU data collection device is taken as an example of a watch, and the watch is generally worn on the user's wrist joint.
  • the first range of limb movements of the user's wrist joint can be obtained, including extension and flexion [-35°, 50°] and ulnar deviation [-25°, 30°].
• When the first constraint condition obtained through the image acquisition device indicates that the posture angle of the user's wrist joint is an extension and flexion angle, the extension and flexion angle can be input into the first human arm engineering model to form a ternary inequality about the extension and flexion angle, expressed as: min ≤ pitch ≤ max, where min indicates the minimum value of the extension and flexion action in the first value range (that is, -35°), pitch is the value of the extension and flexion angle on this degree of freedom, and max indicates the maximum value of the extension and flexion action in the first value range (that is, 50°).
• When the first constraint condition indicates that the posture angle of the user's wrist joint is an extension and flexion angle that exceeds the first value range, the first constraint condition can be updated based on the first value range, and the exceeding value is updated to the minimum value or the maximum value of the first value range (that is, -35° or 50°).
• It should be noted that the pitch in the ternary inequality can be replaced by other degrees of freedom, which can be flexibly implemented according to specific application scenarios and is not limited here.
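• In code, applying the ternary inequality amounts to a clamp. The ranges below are the wrist-joint values quoted above, mapped to pitch/roll as in the degree-of-freedom list; everything else is an illustrative sketch:

```python
# The wrist-joint value ranges quoted above, keyed by degree of freedom.
WRIST_RANGES_DEG = {
    "pitch": (-35.0, 50.0),   # extension and flexion
    "roll": (-25.0, 30.0),    # ulnar deviation
}

def clamp_to_range(angle_deg, dof, ranges=WRIST_RANGES_DEG):
    """Apply min <= angle <= max: values beyond the first value range are
    updated to its minimum or maximum, as described above."""
    lo, hi = ranges[dof]
    return min(max(angle_deg, lo), hi)

assert clamp_to_range(62.0, "pitch") == 50.0     # exceeds max -> max
assert clamp_to_range(-40.0, "pitch") == -35.0   # below min -> min
assert clamp_to_range(12.5, "pitch") == 12.5     # within range -> unchanged
```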
• If the electronic device performs the initialization process shown in FIG. 3, it can perform calculation according to the initialization image data obtained in step S101 of FIG. 3 to establish an initial human arm engineering model, and when the relative information between the user and the image capture device changes, the initial human arm engineering model is updated to obtain the above-mentioned first human arm engineering model.
• Specifically, the first duration of the initialization process performed by the electronic device in FIG. 3 may include a third time set. At the second time set after the third time set, the electronic device performs CV key point identification on the first image data to obtain the relevant parameters for determining the first relative information; when the first relative information is different from the initial relative information, the electronic device updates the initial human arm engineering model based on the first relative information to obtain the first human arm engineering model.
• For example, the overall coordinate values in the initial human arm ergonomic model can be offset and corrected according to the direction indicated by the difference, to obtain the first human arm ergonomic model.
• The third time set is located before the first time period; that is, the electronic device can also determine the initial image data collected by the image acquisition device at the third time set before the second time set, and construct the initial human arm engineering model according to the initial image data.
• When the initial relative information is different from the first relative information due to the user walking or turning around, that is, when the relative information between the user (for example, the user's torso or body) and the image acquisition device collected at the second time set differs from that collected at the third time set, the first relative information is used to update the initial human arm ergonomic model to obtain the first human arm ergonomic model, thereby further optimizing the cursor control.
• The implementation process of performing calculation according to the initialization image data to establish the initial human arm engineering model is similar to the aforementioned process of determining the first human arm engineering model according to the first image data obtained in step S202, and is not repeated here.
• The following introduces the scenario in which the initial human arm ergonomic model is updated to obtain the above-mentioned first human arm ergonomic model.
• For example, the electronic device can determine the first relative information between the user and the image acquisition device according to the first image data, and when the first relative information is different from the initial relative information, trigger the above updating process to obtain the first human arm ergonomic model. Take as an example the case in which the shoulder joint includes at least two degrees of freedom (extension and flexion, abduction and adduction) and the initial relative information indicates that the user is facing the image acquisition device: an initial human arm engineering model can be established based on the initial relative information, and the first value range of the initial model includes the parameters on the two degrees of freedom of the shoulder joint; for example, the movement range on the abduction and adduction degree of freedom may be [0°, 0°]. Taking as an example the case in which the first relative information indicates that the user has turned 90 degrees sideways relative to the image acquisition device, the first relative information can be used to update the initial human arm engineering model, that is, translation/rotation operations are performed on the coordinates of the initial model according to the difference between the initial relative information and the first relative information to obtain the updated first human arm engineering model. Accordingly, the second value range of the first human arm ergonomic model also includes parameters on the two degrees of freedom of the shoulder joint; for example, the movement range on the abduction and adduction degree of freedom may change from [0°, 0°] to [0°, 90°].
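• Purely as an illustrative sketch of this update (the representation of the model as named value ranges and the update rule below are assumptions introduced here, matching only the [0°, 0°] to [0°, 90°] example above):

```python
# Illustrative sketch (assumed representation): the arm model kept as named
# (min_deg, max_deg) value ranges per degree of freedom, updated when the
# user's orientation relative to the camera changes.
def update_arm_model(model, yaw_change_deg):
    """Widen the shoulder abduction/adduction range by the change in body
    orientation; a stand-in for the translation/rotation of the model
    coordinates described above."""
    lo, hi = model["shoulder_abduction_adduction"]
    updated = dict(model)
    updated["shoulder_abduction_adduction"] = (lo, hi + yaw_change_deg)
    return updated

initial_model = {"shoulder_abduction_adduction": (0.0, 0.0)}  # facing camera
first_model = update_arm_model(initial_model, 90.0)   # user turned 90 degrees
print(first_model)   # {'shoulder_abduction_adduction': (0.0, 90.0)}
```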
• In another possible implementation, when the initial relative information is the same as the first relative information, that is, when the relative information between the user and the image acquisition device collected at the second time set has not changed compared with that collected at the third time set, the electronic device determines the initial human arm engineering model as the first human arm engineering model; there is no need to update the human arm engineering model, thereby improving processing efficiency.
• S205: The electronic device performs coordinate transformation processing on the target IMU data to obtain control information.
• Specifically, the electronic device performs coordinate conversion processing according to the target IMU data obtained in step S204 to obtain the control information, where the control information is used to control the cursor in the display device.
  • the control information is used to control the cursor in the display device to perform related operations, such as moving, dragging, zooming in, clicking, and the like.
• For example, when the control information is used to control the cursor in the display device to move, the control information may be the specific coordinate information (X, Y) of the cursor on the two-dimensional display plane of the display device.
• When the control information is used to control the cursor in the display device to perform dragging, zooming, or clicking, the control information may be an identifier corresponding to the corresponding gesture action (for example, dragging corresponds to identifier 1, zooming corresponds to identifier 2, and clicking corresponds to identifier 3). In step S205, the electronic device can determine, through a preset neural network classifier, whether the target IMU data corresponds to a gesture action of the corresponding category, and if so, use the identifier of the corresponding gesture action as the control information to implement the related operation on the cursor in the display device.
• In a possible implementation, in step S205, the process in which the electronic device performs coordinate transformation processing on the target IMU data to obtain the control information may include: the electronic device determines the first relative information between the user (for example, the user's torso or body) and the image acquisition device according to the first image data, determines the first mapping relationship of the user (for example, the user's torso or body) in the display device according to the first relative information, and then performs coordinate transformation processing on the target IMU data according to the first mapping relationship to obtain the control information.
• The first relative information includes parameters such as distance and standing orientation; for the implementation process, refer to the content in step S204, which is not repeated here.
• In the process of performing the air mouse operation, the relative information between the user and the image capturing device may change. Therefore, the first mapping relationship of the user (for example, the user's torso or body) in the display device can be further determined according to the first relative information determined from the first image data, and the first mapping relationship is used as a basis for processing the control information, thereby avoiding problems such as inaccurate positioning and cursor overflow caused when the relative information changes.
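• The following sketch shows one plausible form of such a mapping relationship (the linear angle-to-pixel mapping, the span values, and the screen size are all assumptions introduced for illustration); note the clamp that keeps the cursor from overflowing the display area:

```python
# Illustrative sketch: a linear mapping from target attitude angles to
# cursor coordinates. The span values (how many degrees of arm rotation
# cover the screen) would be derived from the first relative information
# (distance/orientation); all numbers here are assumed.
def angles_to_cursor(yaw_deg, pitch_deg, mapping, screen_w=1920, screen_h=1080):
    x = (yaw_deg / mapping["yaw_span_deg"] + 0.5) * screen_w
    y = (0.5 - pitch_deg / mapping["pitch_span_deg"]) * screen_h
    # Clamp so the control information never places the cursor outside
    # the display area (the cursor-overflow problem discussed earlier).
    return (min(max(x, 0.0), screen_w - 1.0), min(max(y, 0.0), screen_h - 1.0))

mapping = {"yaw_span_deg": 60.0, "pitch_span_deg": 40.0}   # assumed, at ~2 m
print(angles_to_cursor(15.0, 5.0, mapping))   # -> (1440.0, 405.0)
```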
• In addition, if the electronic device performs the initialization process shown in FIG. 3, it can perform calculation according to the initialization image data obtained in step S101 of FIG. 3 to determine the initial mapping relationship of the user in the display device, and the initial mapping relationship is updated to obtain the above-mentioned first mapping relationship only when the relative information between the user and the image capturing device changes.
• That is, the electronic device may further determine the initial mapping relationship of the user in the display device according to the initial image data. Thereafter, when the initial relative information is different from the first relative information, that is, when the relative information between the user and the image capture device collected at the second time set differs from that collected at the third time set, the first relative information is used to update the initial mapping relationship to obtain the first mapping relationship, so as to further optimize the cursor control.
• In another possible implementation, when the initial relative information is the same as the first relative information, that is, when the relative information between the user (for example, the user's torso or body) and the image acquisition device collected at the second time set has not changed compared with that collected at the third time set, the electronic device determines the initial mapping relationship as the first mapping relationship; there is no need to update the mapping relationship, thereby improving processing efficiency.
• In summary, the electronic device performs CV key point recognition on the first image data obtained by the image acquisition device collecting the user's limb movements at the second time set to obtain the first constraint condition; based on the first constraint condition, it calibrates the initial IMU data obtained by the IMU data acquisition device collecting the user's limb movements at the first time set to obtain the target IMU data, and then performs coordinate transformation processing based on the target IMU data to obtain the control information for controlling the cursor in the display device. The second time set is a subset of the first time set; that is, the process in which the electronic device calibrates the initial IMU data to obtain the target IMU data is asynchronous calibration.
• Since the calculation time of the CV identification process is generally much longer than the processing time of the IMU data, the asynchronous calibration implementation does not need to wait for the long CV processing process, which can effectively avoid problems such as display freezes and display delays. Controlling the cursor in the display device through asynchronous calibration can therefore improve the continuity of the cursor movement in the display device, thereby improving user experience.
• In addition, an embodiment of the present application further provides an electronic device that can perform the above human-computer interaction method; the human-computer interaction method can be performed by one or more electronic devices, and the electronic device may include the following modules.
• an image data determining module 1101, configured to determine and output image data, where the image data includes at least the first image data, corresponding to the implementation process of the aforementioned step S202;
• a CV key point recognition module 1102, configured to perform CV key point recognition on the image data output by the image data determining module 1101 and obtain and output the first constraint condition, corresponding to the implementation process of the aforementioned step S203;
  • An asynchronous calibration module 1104 configured to perform calibration processing at least according to the first constraint condition and the initial IMU data, and obtain and output the target IMU data, corresponding to the implementation process in the aforementioned step S204;
  • the coordinate conversion module 1105 is configured to perform coordinate conversion processing at least according to the target IMU data, and obtain and output control information, which corresponds to the implementation process in the foregoing step S205.
  • the electronic device shown in FIG. 11 may further include other modules as follows.
  • a display device 1106, configured to control the cursor according to the control information
• a preprocessing module 1107, configured to preprocess the initial IMU data and output the preprocessed result to the asynchronous calibration module 1104, where the preprocessing may include waveform smoothing processing, de-noising calibration compensation processing, and the like;
• a human arm engineering model building module 1108, configured to construct the first human arm engineering model according to the first image data output by the image data determining module 1101, and input the first human arm engineering model to the asynchronous calibration module 1104 so that the first constraint condition is updated, as one of the bases for determining the target IMU data; the human arm engineering model building module 1108 can also be configured to construct the initial human arm engineering model according to the initial image data output by the image data determining module 1101, and input the initial human arm engineering model to the asynchronous calibration module 1104 so that the first constraint condition is updated;
• a mapping relationship determination module 1109, configured to determine the first mapping relationship according to the first image data output by the image data determining module 1101 and output the first mapping relationship to the coordinate conversion module 1105 as one of the bases for the coordinate conversion processing; the mapping relationship determination module 1109 can also be configured to determine the initial mapping relationship according to the initial image data output by the image data determining module 1101 and output the initial mapping relationship to the coordinate conversion module 1105 as one of the bases for the coordinate conversion processing;
• a relative information change judging module 1110, configured to determine whether the first relative information indicated by the first image data has changed relative to the initial relative information indicated by the initial image data;
• If the relative information has changed, the relative information change judging module 1110 outputs the judgment result to the human arm engineering model building module 1108, so that the module 1108 outputs the first human arm engineering model to the asynchronous calibration module 1104, and outputs the judgment result to the mapping relationship determination module 1109, so that the module 1109 outputs the first mapping relationship to the coordinate conversion module 1105. If the relative information has not changed, the relative information change judging module 1110 outputs the judgment result to the human arm engineering model building module 1108, so that the module 1108 outputs the initial human arm engineering model to the asynchronous calibration module 1104, and outputs the judgment result to the mapping relationship determination module 1109, so that the module 1109 outputs the initial mapping relationship to the coordinate conversion module 1105.
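• The data flow among these modules could be sketched as follows (the callables passed in stand for modules 1102, 1104, 1105, and 1107; their internals are placeholders rather than the embodiment's implementation, and the wiring simply mirrors steps S203 to S205 as described above):

```python
# Illustrative sketch: chaining the FIG. 11 modules per the described data flow.
def human_computer_interaction(first_image_data, initial_imu_data,
                               cv_keypoint_recognition_1102,
                               asynchronous_calibration_1104,
                               coordinate_conversion_1105,
                               preprocessing_1107):
    first_constraint = cv_keypoint_recognition_1102(first_image_data)   # S203
    imu = preprocessing_1107(initial_imu_data)     # smoothing, de-noising
    target_imu = asynchronous_calibration_1104(first_constraint, imu)   # S204
    control_info = coordinate_conversion_1105(target_imu)               # S205
    return control_info                   # drives the display device 1106
```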
  • an embodiment of the present application further provides a first electronic device 1200 , where the first electronic device 1200 may at least include a motion sensor 1201 and a processor 1202 .
  • the first electronic device 1200 may further include other components, such as a memory, a casing, a communication module, etc., which are not limited here.
• The motion sensor 1201 can be used to implement the implementation process of the IMU data acquisition apparatus in any of the foregoing embodiments, and the processor 1202 can be used to perform the calculation, processing, and other implementation processes in any of the foregoing embodiments and achieve the corresponding beneficial effects, which are not repeated here.
  • an embodiment of the present application further provides a second electronic device 1300 , where the second electronic device 1300 may at least include a camera 1301 and a display screen 1302 .
  • the second electronic device 1300 may also include other components, such as a memory, a casing, a communication module, etc., which are not limited here.
  • the camera 1301 can be used to implement the implementation process of the image acquisition device in any of the foregoing embodiments, and the display screen 1302 can be used to implement the implementation process of the display device in any of the foregoing embodiments and achieve the corresponding beneficial effects, which will not be repeated here one by one.
  • the present application provides an electronic device, which is coupled to a memory and configured to read and execute instructions stored in the memory, so that the electronic device implements the method performed by the electronic device in any of the foregoing embodiments of FIG. 3 to FIG. 11.
  • the electronic device is a chip or a system on a chip.
  • the present application provides a chip system
  • the chip system includes a processor for supporting an electronic device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above method.
  • the chip system further includes a memory for storing necessary program instructions and data.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the present application also provides a processor, which is coupled to a memory and configured to execute the methods and functions related to the electronic device in any of the foregoing embodiments.
  • the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a computer, the method process related to the electronic device in any of the foregoing method embodiments is implemented.
  • the computer may be the above electronic device.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling, direct coupling, or communication connection may be implemented through some interfaces; the indirect coupling or communication connection between apparatuses or units may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
  • the word "if" as used herein may be interpreted as "at the time of", "when", "in response to determining", or "in response to detecting".
  • similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)
  • Position Input By Displaying (AREA)

Abstract

The embodiments of the present application provide a human-computer interaction method and a related device, which are used to control a cursor in a display apparatus by means of asynchronous calibration, which can improve the continuity of cursor movement in the display apparatus, thereby improving user experience. In the method, after a camera acquires first image data on the basis of a second sampling frequency in a first time period, a first constraint condition is obtained by performing CV key point recognition on the first image data; a processor can, on the basis of the first constraint condition, calibrate initial motion sensing data acquired on the basis of a first sampling frequency in the first time period, and obtain target motion sensing data; afterwards, the processor further obtains, according to the target motion sensing data, control information for controlling a display screen. The second sampling frequency is less than the first sampling frequency; that is, the processor performs asynchronous calibration on the initial motion sensing data to obtain the target motion sensing data.

Description

Human-computer interaction method and device
This application claims priority to the Chinese patent application No. 202110486465.2, filed with the China National Intellectual Property Administration on April 30, 2021 and entitled "Human-Computer Interaction Method and Device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of terminal applications, and in particular, to a human-computer interaction method and related equipment.
Background
With the development of science and technology, the full-scenario immersive experience has become a development trend for terminal devices in the process of human-computer interaction with terminal devices.
At present, there are various application scenarios for the full-scenario immersive experience, for example, the application scenario in which a user performs human-computer interaction with a device with a display screen, such as a computer, a TV, or a smart screen (also called a large screen).
However, the traditional way of achieving human-computer interaction through control devices such as remote controls or mice can no longer meet current needs.
Summary of the Invention
Embodiments of the present application provide a human-computer interaction method and related equipment for controlling a cursor in a display device through asynchronous calibration, which can improve the continuity of cursor movement in the display device, thereby improving user experience.
A first aspect of the embodiments of the present application provides a human-computer interaction method, where the method may be applied to a human-computer interaction system including a motion sensor, a camera, a processor, and a display screen, and the method includes: the motion sensor acquires initial motion sensing data at a first sampling frequency within a first time period, where the initial motion sensing data is triggered by the user's limb movements; the camera acquires first image data at a second sampling frequency within the first time period, where the second sampling frequency is less than the first sampling frequency and the first image data includes the user's limb movement information; thereafter, the processor obtains a first constraint condition, where the first constraint condition is obtained by performing computer vision (CV) processing on the first image data; the processor calibrates the initial motion sensing data according to the first constraint condition to obtain target motion sensing data; and further, the processor obtains control information according to the target motion sensing data, where the control information is used to control the display screen.
Based on the above technical solution, after the camera acquires the first image data at the second sampling frequency within the first time period, the first constraint condition is obtained through CV key point recognition; the processor calibrates, based on the first constraint condition, the initial motion sensing data acquired at the first sampling frequency within the first time period to obtain the target motion sensing data; thereafter, the processor further obtains, according to the target motion sensing data, the control information for controlling the display screen. The second sampling frequency is less than the first sampling frequency; that is, the processor performs asynchronous calibration on the initial motion sensing data to obtain the target motion sensing data. Limited by hardware computing capability, the computation time of the CV recognition process is generally much longer than the processing time of the IMU data. Compared with a human-computer interaction approach based on real-time synchronous calibration, this asynchronous calibration does not need to wait for the lengthy CV processing, which effectively avoids problems such as display stuttering and display delay, so that controlling the cursor in the display device through asynchronous calibration can improve the continuity of cursor movement in the display device, thereby improving user experience.
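For illustration only, the following is a minimal sketch of how such an asynchronous calibration loop could be organized, written in Python; all function names (read_imu, try_get_cv_constraint, calibrate, to_control_info, send_to_display) and the rate values are hypothetical placeholders, not a definitive implementation of the claimed method:

    import time

    IMU_RATE_HZ = 100  # first sampling frequency (assumed value)
    CV_RATE_HZ = 10    # second sampling frequency, lower than the IMU rate

    def asynchronous_calibration_loop(read_imu, try_get_cv_constraint,
                                      calibrate, to_control_info,
                                      send_to_display):
        """Consume high-rate IMU samples continuously; whenever a slower
        CV-derived constraint becomes available, adopt it, instead of
        blocking cursor updates on the CV processing."""
        constraint = None
        while True:
            sample = read_imu()               # high-rate motion sensing data
            latest = try_get_cv_constraint()  # non-blocking; None while CV is busy
            if latest is not None:
                constraint = latest           # asynchronously adopt newest constraint
            data = calibrate(sample, constraint) if constraint is not None else sample
            send_to_display(to_control_info(data))
            time.sleep(1.0 / IMU_RATE_HZ)

The point of the sketch is that the IMU path never waits for the CV path: cursor updates continue at the first sampling frequency, and each CV result is applied as soon as it arrives.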
It should be noted that the processor may be provided in the same electronic device as the motion sensor, or in the same electronic device as the camera, or in the same electronic device as the display screen, which is not limited here.
In a possible implementation manner of the first aspect, the first constraint condition is obtained through human skeleton key point recognition in the computer vision (CV) processing performed on the first image data, and the first constraint condition includes three-dimensional spatial orientation angle information.
Optionally, the CV processing may be implemented based on a three-dimensional human skeleton recognition technology, or based on a two-dimensional human skeleton recognition technology, which is not limited here.
Based on the above technical solution, the first constraint condition used for the asynchronous calibration of the initial motion sensing data may be the three-dimensional spatial orientation angle information obtained through the CV recognition processing.
In a possible implementation manner of the first aspect, the processor calibrating the initial motion sensing data according to the first constraint condition to obtain the target motion sensing data specifically includes: the processor first maps the first constraint condition to obtain calibration data, and fits the calibration data to obtain a first curve; the processor then fits the initial motion sensing data to obtain a second curve, and performs weighted average processing on the first curve and the second curve to obtain a third curve; the processor determines the calibrated motion sensing data from the third curve; thereafter, the processor processes the calibrated motion sensing data according to an attitude calculation algorithm to obtain the target motion sensing data.
Based on the above technical solution, the target motion sensing data may be data obtained after processing by the attitude calculation algorithm, where, in the process of asynchronously calibrating the initial motion sensing data with the first constraint condition, the initial motion sensing data may first be calibrated and then processed by the attitude calculation algorithm to obtain the target motion sensing data.
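As an illustration of this curve-fusion step, the following Python sketch uses polynomial fitting as one possible fitting method (the embodiments do not prescribe a particular fitting technique); the synthetic data, polynomial degree, and weight are assumed values:

    import numpy as np

    def fuse_by_weighted_curves(t, imu_series, t_cv, cv_calibration,
                                w_cv=0.4, deg=3):
        """Fit a first curve to the CV-derived calibration data and a second
        curve to the raw IMU series, then take their weighted average as the
        third curve, from which the calibrated data is read off."""
        first_curve = np.polyval(np.polyfit(t_cv, cv_calibration, deg), t)
        second_curve = np.polyval(np.polyfit(t, imu_series, deg), t)
        third_curve = w_cv * first_curve + (1.0 - w_cv) * second_curve
        return third_curve  # calibrated motion sensing data sampled on t

    # Synthetic example: a 100 Hz IMU series with noise and a 10 Hz CV reference.
    t = np.linspace(0.0, 1.0, 100)
    t_cv = np.linspace(0.0, 1.0, 10)
    imu = np.sin(2 * np.pi * t) + 0.05 * np.random.randn(t.size)
    cv = np.sin(2 * np.pi * t_cv)
    calibrated = fuse_by_weighted_curves(t, imu, t_cv, cv)

The calibrated series would then be passed through the attitude calculation algorithm to yield the target motion sensing data.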
In a possible implementation manner of the first aspect, the processor calibrating the initial motion sensing data according to the first constraint condition to obtain the target motion sensing data specifically includes: the processor first processes the initial motion sensing data according to the attitude calculation algorithm to obtain first attitude angle data; the processor then fits the first attitude angle data to obtain a fourth curve, and fits the first constraint condition to obtain a fifth curve; thereafter, the processor performs weighted average processing on the fourth curve and the fifth curve to obtain a sixth curve; further, the processor determines the target motion sensing data from the sixth curve.
Based on the above technical solution, the target motion sensing data may be data obtained after processing by the attitude calculation algorithm, where, in the process of asynchronously calibrating the initial motion sensing data with the first constraint condition, the initial motion sensing data may first be processed by the attitude calculation algorithm, and the processing result is then calibrated based on the first constraint condition to obtain the target motion sensing data.
In a possible implementation manner of the first aspect, the control information is coordinate data obtained by performing coordinate conversion on the target motion sensing data, where the coordinate data is used to control the display position of a cursor in the display screen; or, the control information is a gesture identification result obtained by mapping the target motion sensing data, where the gesture identification result is used to operate an interface element of the display screen.
Based on the above technical solution, the control information for controlling the display screen, obtained by processing the target motion sensing data obtained through asynchronous calibration, can be used to perform a variety of operations on the display screen, for example, controlling the display position of the cursor in the display screen, or operating interface elements in the display screen, such as selecting, zooming, dragging, and clicking.
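As a toy illustration of the gesture branch, the following sketch maps features of the target motion sensing data to a gesture identification result; the thresholds and gesture names are invented for illustration only:

    def identify_gesture(angular_speed_dps, linear_accel_g):
        """Map simple motion features to a gesture identification result;
        a real implementation would use richer features or a classifier."""
        if linear_accel_g > 1.5:
            return "click"   # sharp forward jab -> click an interface element
        if angular_speed_dps > 120.0:
            return "swipe"   # fast wrist rotation -> drag/swipe
        return None          # no gesture: fall back to cursor coordinate data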
In a possible implementation manner of the first aspect, before the processor calibrates the initial motion sensing data according to the first constraint condition, the method further includes: the processor aligns the first constraint condition and the initial motion sensing data according to a time difference.
Based on the above technical solution, due to inherent differences in hardware, a time difference inevitably exists between different devices (for example, a device containing the motion sensor and a device containing the camera), and this time difference may cause the cursor displayed on the display screen to be misplaced or its trajectory to be inaccurate. Therefore, to eliminate the adverse effect of this objectively existing time difference, the first constraint condition and the initial motion sensing data may be aligned according to the determined time difference, so as to eliminate the influence of the time difference.
In a possible implementation manner of the first aspect, the time difference is calculated through an initialization process before the first time period, and the method further includes: the display screen displays first prompt information for prompting the user to make a designated limb movement; thereafter, the motion sensor acquires motion sensing data in the initialization process, where the motion sensing data in the initialization process is triggered by the designated limb movement made by the user; the camera acquires image data in the initialization process, where the image data in the initialization process includes information about the designated limb movement made by the user; further, the processor determines the time difference according to the signal features of the motion sensing data in the initialization process and the signal features of the image data in the initialization process.
Based on the above technical solution, during the initialization process, the user is prompted on the display screen to perform a specific limb movement; while the user performs the specific limb movement, the motion sensor acquires the motion sensing data of the initialization process and the camera acquires the image data of the initialization process, and the processor then performs processing based on the motion sensing data and the image data to determine the time difference.
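The embodiments only state that the time difference is determined from the signal features of the two data streams; cross-correlation of the two signals is one common way this could be realized, sketched below under the assumption that the CV-derived signal has already been resampled to the IMU rate:

    import numpy as np

    def estimate_time_offset(imu_signal, cv_signal, imu_dt):
        """Estimate the device time difference as the lag that maximizes the
        cross-correlation between the IMU signal and the CV-derived signal
        captured while the user performs the prompted movement."""
        a = (imu_signal - imu_signal.mean()) / (imu_signal.std() + 1e-9)
        b = (cv_signal - cv_signal.mean()) / (cv_signal.std() + 1e-9)
        corr = np.correlate(a, b, mode="full")
        lag = int(corr.argmax()) - (len(b) - 1)  # lag in samples
        return lag * imu_dt                      # time difference in seconds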
In a possible implementation manner of the first aspect, the method further includes: the processor determines initial relative information between the user and the camera according to the image data in the initialization process.
Optionally, the initial relative information may include distance, orientation, and the like.
In a possible implementation manner of the first aspect, the processor calibrating the initial motion sensing data according to the first constraint condition to obtain the target motion sensing data includes: the processor determines an initial human arm engineering model according to the initial relative information, where the initial human arm engineering model includes a first value range of at least one limb movement angle; the processor then updates the first constraint condition according to the initial human arm engineering model to obtain an updated first constraint condition; thereafter, the processor calibrates the initial motion sensing data according to the updated first constraint condition to obtain the target motion sensing data.
Based on the above technical solution, in the process of human-computer interaction, the user generally does not make limb movements that violate ergonomics. Therefore, the first constraint condition can be updated through the human arm engineering model constructed from the relative information between the user and the display screen, that is, the first constraint condition is further constrained based on the human arm engineering model, so as to avoid the problems of cursor inaccuracy and cursor overflow.
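As a non-limiting sketch of how a human arm engineering model could further constrain the first constraint condition, the following clamps CV-derived angles to plausible joint ranges; the joint names and value ranges are hypothetical and would in practice be derived from the relative information:

    # Hypothetical value ranges (in degrees) of limb movement angles.
    ARM_MODEL = {
        "shoulder_pitch": (-60.0, 170.0),
        "elbow_flexion": (0.0, 145.0),
    }

    def constrain_with_arm_model(constraint_angles, arm_model=ARM_MODEL):
        """Clamp CV-derived orientation angles to the ergonomically plausible
        range, which helps avoid cursor inaccuracy and cursor overflow caused
        by implausible CV estimates."""
        return {joint: min(max(angle, arm_model[joint][0]), arm_model[joint][1])
                for joint, angle in constraint_angles.items()}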
In a possible implementation manner of the first aspect, the process in which the processor updates the first constraint condition according to the initial human arm engineering model to obtain the updated first constraint condition may specifically include: the processor determines first relative information between the user and the camera according to the first image data; then, when the first relative information is different from the initial relative information, the processor updates the initial human arm engineering model according to the first relative information to obtain a first human arm engineering model; thereafter, the processor updates the first constraint condition according to the first human arm engineering model to obtain the updated first constraint condition.
Based on the above technical solution, when the relative information (for example, distance or orientation) between the user and the camera changes, the human arm engineering model constructed based on that relative information can also be updated according to the different relative information, and the updated human arm engineering model is used to further constrain the first constraint condition, so as to ensure the timeliness of the control information in controlling the display screen.
In a possible implementation manner of the first aspect, the processor obtaining the control information according to the target motion sensing data includes: first, the processor determines an initial mapping relationship of the user in the display device according to the initial relative information; then, the processor performs coordinate conversion processing on the target motion sensing data according to the initial mapping relationship to obtain the control information.
Based on the above technical solution, since the user may move during cursor control, the relative information between the user and the camera may change. Therefore, the initial mapping relationship of the user in the display screen can be further determined according to the initial relative information determined in the initialization process, and this initial mapping relationship is used as the processing basis of the control information, so as to avoid problems such as inaccurate positioning and cursor overflow caused by changes in the relative information.
In a possible implementation manner of the first aspect, the processor performing coordinate conversion processing on the target motion sensing data according to the initial mapping relationship to obtain the control information includes: first, the processor determines first relative information between the user and the camera according to the first image data; then, when the first relative information is different from the initial relative information, the processor updates the initial mapping relationship according to the first relative information to obtain a first mapping relationship; thereafter, the processor performs coordinate conversion processing on the target motion sensing data according to the first mapping relationship to obtain the control information.
Based on the above technical solution, when the relative information (for example, distance or orientation) between the user and the camera changes, the initial mapping relationship determined based on that relative information can also be updated according to the different relative information, and the updated mapping relationship is used to perform coordinate conversion processing on the target motion sensing data to obtain the control information, so as to ensure the timeliness of the control information in controlling the display screen.
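As an illustration of the coordinate conversion step, the following sketch converts orientation angles from the target motion sensing data into cursor coordinates through a mapping relationship; all field names and gain values are hypothetical, and the gains would typically be updated when the user's distance or orientation relative to the camera changes:

    def to_screen_coords(yaw_deg, pitch_deg, mapping):
        """Convert orientation angles into cursor coordinates using a mapping
        relationship derived from the user's relative information."""
        x = mapping["cx"] + yaw_deg * mapping["px_per_deg_x"]
        y = mapping["cy"] - pitch_deg * mapping["px_per_deg_y"]
        # Clamp to the screen so the cursor cannot overflow the display.
        x = min(max(x, 0), mapping["width"] - 1)
        y = min(max(y, 0), mapping["height"] - 1)
        return int(x), int(y)

    # Example mapping for a 1920x1080 screen (hypothetical values).
    mapping = {"cx": 960, "cy": 540, "px_per_deg_x": 25.0,
               "px_per_deg_y": 25.0, "width": 1920, "height": 1080}
    print(to_screen_coords(10.0, -5.0, mapping))  # -> (1210, 665)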
In a possible implementation manner of the first aspect, the motion sensor includes sensing units of one or more of an accelerometer, a gyroscope, and a magnetometer.
Based on the above technical solution, the motion sensor may be an IMU data acquisition apparatus, where the IMU data acquisition apparatus may include sensing units of one or more of an accelerometer, a gyroscope, and a magnetometer.
In a possible implementation manner of the first aspect, the camera includes one or more of a depth camera and a non-depth camera.
Based on the above technical solution, the camera may have a variety of implementations, for example, a depth camera or a non-depth camera, so that the solution can be adapted to different application scenarios.
A second aspect of the embodiments of the present application provides a first electronic device, including a motion sensor and a processor, where the motion sensor is configured to acquire initial motion sensing data at a first sampling frequency within a first time period, the initial motion sensing data being triggered by the user's limb movements; the processor is configured to calibrate the initial motion sensing data according to an acquired first constraint condition to obtain target motion sensing data, where the first constraint condition is obtained by performing computer vision (CV) processing on first image data acquired by a camera at a second sampling frequency within the first time period, the second sampling frequency is less than the first sampling frequency, and the first image data includes the user's limb movement information; further, the processor is further configured to obtain control information according to the target motion sensing data, where the control information is used to control the display content of a display screen; and the camera and the display screen are included in a second electronic device different from the first electronic device.
Based on the above technical solution, in the first electronic device, the motion sensor acquires the initial motion sensing data at the first sampling frequency within the first time period, and the first electronic device also acquires the first constraint condition obtained by performing computer vision (CV) processing on the first image data acquired by the camera at the second sampling frequency within the first time period; thereafter, the processor in the first electronic device calibrates, based on the first constraint condition, the initial motion sensing data acquired at the first sampling frequency within the first time period to obtain the target motion sensing data, and the processor further obtains, according to the target motion sensing data, the control information for controlling the display screen. The second sampling frequency is less than the first sampling frequency; that is, the processor performs asynchronous calibration on the initial motion sensing data to obtain the target motion sensing data. Limited by hardware computing capability, the computation time of the CV recognition process is generally much longer than the processing time of the IMU data. Compared with a human-computer interaction approach based on real-time synchronous calibration, this asynchronous calibration does not need to wait for the lengthy CV processing, which effectively avoids problems such as display stuttering and display delay, so that controlling the cursor in the display device through asynchronous calibration can improve the continuity of cursor movement in the display device, thereby improving user experience.
In a possible implementation manner of the second aspect, the processor is specifically configured to: map the first constraint condition to obtain calibration data, and fit the calibration data to obtain a first curve; then fit the initial motion sensing data to obtain a second curve, and perform weighted average processing on the first curve and the second curve to obtain a third curve; determine the calibrated motion sensing data from the third curve; and further process the calibrated motion sensing data according to an attitude calculation algorithm to obtain the target motion sensing data.
Based on the above technical solution, the target motion sensing data may be data obtained after processing by the attitude calculation algorithm, where, in the process of asynchronously calibrating the initial motion sensing data with the first constraint condition, the initial motion sensing data may first be calibrated and then processed by the attitude calculation algorithm to obtain the target motion sensing data.
In a possible implementation manner of the second aspect, the processor is specifically configured to: process the initial motion sensing data according to the attitude calculation algorithm to obtain first attitude angle data; then fit the first attitude angle data to obtain a fourth curve, fit the first constraint condition to obtain a fifth curve, and perform weighted average processing on the fourth curve and the fifth curve to obtain a sixth curve; and thereafter determine the target motion sensing data from the sixth curve.
Based on the above technical solution, the target motion sensing data may be data obtained after processing by the attitude calculation algorithm, where, in the process of asynchronously calibrating the initial motion sensing data with the first constraint condition, the initial motion sensing data may first be processed by the attitude calculation algorithm, and the processing result is then calibrated based on the first constraint condition to obtain the target motion sensing data.
In a possible implementation manner of the second aspect, the processor is further configured to: align the first constraint condition and the initial motion sensing data according to a time difference.
Based on the above technical solution, due to inherent differences in hardware, a time difference inevitably exists between different devices (for example, a device containing the motion sensor and a device containing the camera), and this time difference may cause the cursor displayed on the display screen to be misplaced or its trajectory to be inaccurate. Therefore, to eliminate the adverse effect of this objectively existing time difference, the first constraint condition and the initial motion sensing data may be aligned according to the determined time difference, so as to eliminate the influence of the time difference.
In a possible implementation manner of the second aspect, the time difference is calculated through an initialization process before the first time period; the motion sensor is further configured to acquire motion sensing data in the initialization process, where the motion sensing data in the initialization process is triggered by a designated limb movement made by the user; in addition, the processor is further configured to determine the time difference according to the signal features of the motion sensing data in the initialization process and the signal features of the image data in the initialization process, where the image data in the initialization process is acquired by the camera during the initialization process and includes information about the designated limb movement made by the user.
Based on the above technical solution, during the initialization process, the user is prompted on the display screen to perform a specific limb movement; while the user performs the specific limb movement, the motion sensor acquires the motion sensing data of the initialization process and the camera acquires the image data of the initialization process, and the processor then performs processing based on the motion sensing data and the image data to determine the time difference.
In a possible implementation manner of the second aspect, the processor is further configured to: determine initial relative information between the user and the camera according to the image data in the initialization process.
Optionally, the initial relative information may include distance, orientation, and the like.
In a possible implementation manner of the second aspect, the processor is specifically configured to: determine an initial human arm engineering model according to the initial relative information, where the initial human arm engineering model includes a first value range of at least one limb movement angle; then update the first constraint condition according to the initial human arm engineering model to obtain an updated first constraint condition; and thereafter calibrate the initial motion sensing data according to the updated first constraint condition to obtain the target motion sensing data.
Based on the above technical solution, in the process of human-computer interaction, the user generally does not make limb movements that violate ergonomics. Therefore, the first constraint condition can be updated through the human arm engineering model constructed from the relative information between the user and the display screen, that is, the first constraint condition is further constrained based on the human arm engineering model, so as to avoid the problems of cursor inaccuracy and cursor overflow.
In a possible implementation manner of the second aspect, the processor is specifically configured to: determine first relative information between the user and the camera according to the first image data; then, when the first relative information is different from the initial relative information, update the initial human arm engineering model according to the first relative information to obtain a first human arm engineering model; and further update the first constraint condition according to the first human arm engineering model to obtain the updated first constraint condition.
Based on the above technical solution, when the relative information (for example, distance or orientation) between the user and the camera changes, the human arm engineering model constructed based on that relative information can also be updated according to the different relative information, and the updated human arm engineering model is used to further constrain the first constraint condition, so as to ensure the timeliness of the control information in controlling the display screen.
In a possible implementation manner of the second aspect, the processor is further configured to: first determine an initial mapping relationship of the user in the display device according to the initial relative information; and then perform coordinate conversion processing on the target motion sensing data according to the initial mapping relationship to obtain the control information.
Based on the above technical solution, since the user may move during cursor control, the relative information between the user and the camera may change. Therefore, the initial mapping relationship of the user in the display screen can be further determined according to the initial relative information determined in the initialization process, and this initial mapping relationship is used as the processing basis of the control information, so as to avoid problems such as inaccurate positioning and cursor overflow caused by changes in the relative information.
In a possible implementation manner of the second aspect, the processor is specifically configured to: determine first relative information between the user and the camera according to the first image data; then, when the first relative information is different from the initial relative information, update the initial mapping relationship according to the first relative information to obtain a first mapping relationship; and further perform coordinate conversion processing on the target motion sensing data according to the first mapping relationship to obtain the control information.
Based on the above technical solution, when the relative information (for example, distance or orientation) between the user and the camera changes, the initial mapping relationship determined based on that relative information can also be updated according to the different relative information, and the updated mapping relationship is used to perform coordinate conversion processing on the target motion sensing data to obtain the control information, so as to ensure the timeliness of the control information in controlling the display screen.
In a possible implementation manner of the second aspect, the motion sensor includes sensing units of one or more of an accelerometer, a gyroscope, and a magnetometer.
Based on the above technical solution, the motion sensor may be an IMU data acquisition apparatus, where the IMU data acquisition apparatus may include sensing units of one or more of an accelerometer, a gyroscope, and a magnetometer.
It should be noted that the motion sensor and the processor included in the first electronic device in the second aspect can also perform the implementation processes in the first aspect and any possible implementation manner thereof, and achieve the corresponding beneficial effects, which are not repeated here one by one.
A third aspect of the embodiments of the present application provides a second electronic device, including a camera and a display screen, where the camera is configured to acquire first image data at a second sampling frequency within a first time period, the first image data including the user's limb movement information; the first image data is used to determine a first constraint condition, and the first constraint condition is used to calibrate initial motion sensing data to obtain target motion sensing data, where the initial motion sensing data is sampled by a motion sensor in a first electronic device at a first sampling frequency within the first time period and is triggered by the user's limb movements; the second sampling frequency is less than the first sampling frequency; thereafter, the display screen is configured to display control information, where the control information is obtained based on the target motion sensing data.
Based on the above technical solution, in the second electronic device, the camera acquires the first image data at the second sampling frequency within the first time period, and the first image data is used to determine the first constraint condition; the first constraint condition can be used to calibrate the initial motion sensing data acquired at the first sampling frequency within the first time period to obtain the target motion sensing data, after which the control information for controlling the display screen is further obtained according to the target motion sensing data, so that the display screen displays according to the control information. The second sampling frequency is less than the first sampling frequency; that is, the target motion sensing data is obtained by asynchronously calibrating the initial motion sensing data. Limited by hardware computing capability, the computation time of the CV recognition process is generally much longer than the processing time of the IMU data. Compared with a human-computer interaction approach based on real-time synchronous calibration, this asynchronous calibration does not need to wait for the lengthy CV processing, which effectively avoids problems such as display stuttering and display delay, so that controlling the cursor in the display device through asynchronous calibration can improve the continuity of cursor movement in the display device, thereby improving user experience.
In a possible implementation manner of the third aspect, the first constraint condition is obtained through human skeleton key point recognition in the computer vision (CV) processing performed on the first image data, and the first constraint condition includes three-dimensional spatial orientation angle information.
Optionally, the CV processing may be implemented based on a three-dimensional human skeleton recognition technology, or based on a two-dimensional human skeleton recognition technology, which is not limited here.
Based on the above technical solution, the first constraint condition used for the asynchronous calibration of the initial motion sensing data may be the three-dimensional spatial orientation angle information obtained through the CV recognition processing.
In a possible implementation manner of the third aspect, the control information is coordinate data obtained by performing coordinate conversion on the target motion sensing data, where the coordinate data is used to control the display position of a cursor in the display screen; or, the control information is a gesture identification result obtained by mapping the target motion sensing data, where the gesture identification result is used to operate an interface element of the display screen.
Based on the above technical solution, the control information for controlling the display screen, obtained by processing the target motion sensing data obtained through asynchronous calibration, can be used to perform a variety of operations on the display screen, for example, controlling the display position of the cursor in the display screen, or operating interface elements in the display screen, such as selecting, zooming, dragging, and clicking.
In a possible implementation manner of the third aspect, the display screen is further configured to display first prompt information for prompting the user to make a designated limb movement; in addition, the camera is further configured to acquire image data in an initialization process before the first time period, where the image data in the initialization process includes information about the designated limb movement made by the user; the time difference is determined according to the signal features of the image data in the initialization process and the signal features of the motion sensing data in the initialization process, where the time difference is used to align the first constraint condition and the initial motion sensing data, and the motion sensing data in the initialization process is acquired by the second electronic device during the initialization process.
Based on the above technical solution, due to inherent differences in hardware, a time difference inevitably exists between different devices (for example, a device containing the motion sensor and a device containing the camera), and this time difference may cause the cursor displayed on the display screen to be misplaced or its trajectory to be inaccurate. Therefore, to eliminate the adverse effect of this objectively existing time difference, the first constraint condition and the initial motion sensing data may be aligned according to the determined time difference, so as to eliminate the influence of the time difference.
In a possible implementation manner of the third aspect, the camera includes one or more of a depth camera and a non-depth camera.
Based on the above technical solution, the camera may have a variety of implementations, for example, a depth camera or a non-depth camera, so that the solution can be adapted to different application scenarios.
It should be noted that the camera and the display screen included in the second electronic device in the third aspect can also perform the implementation processes in the first aspect and any possible implementation manner thereof, and achieve the corresponding beneficial effects, which are not repeated here one by one.
A fourth aspect of the embodiments of the present application provides an electronic device, including a processor coupled to a memory, where the memory is configured to store a program, and the processor is configured to execute the program in the memory, so that the electronic device performs the human-computer interaction method described in the above aspects.
It should be noted that the motion sensor mentioned in the above human-computer interaction method may be integrated in the electronic device, or may be provided independently outside the electronic device and connected to the electronic device in a wired/wireless manner, which is not limited here. Similarly, the camera mentioned in the above human-computer interaction method may be integrated in the electronic device, or may be provided independently outside the electronic device and connected to the electronic device in a wired/wireless manner, which is not limited here. Similarly, the display screen mentioned in the above human-computer interaction method may be integrated in the electronic device, or may be provided independently outside the electronic device and connected to the electronic device in a wired/wireless manner, which is not limited here.
A fifth aspect of the embodiments of the present application provides a computer program that, when run on a computer, causes the computer to execute the human-computer interaction method described in the first aspect and any implementation manner thereof.
A sixth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to execute the human-computer interaction method described in the first aspect and any implementation manner thereof.
A seventh aspect of the embodiments of the present application provides a circuit system, where the circuit system includes a processing circuit configured to execute the human-computer interaction method described in the first aspect and any implementation manner thereof.
An eighth aspect of the embodiments of the present application provides a chip system, where the chip system includes a processor configured to support the implementation of the functions involved in the first aspect and any implementation manner thereof, for example, sending or processing the data and/or information involved in the above method. In a possible design, the chip system further includes a memory for storing the program instructions and data necessary for the server or the communication device. The chip system may be composed of chips, or may include chips and other discrete devices.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an implementation of human-computer interaction;
FIG. 2 is a schematic diagram of an application scenario in an embodiment of the present application;
FIG. 3 is a schematic diagram of a human-computer interaction method provided by an embodiment of the present application;
FIG. 4 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application;
FIG. 5 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application;
FIG. 6 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application;
FIG. 7 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application;
FIG. 8 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application;
FIG. 9 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application;
FIG. 10 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application;
FIG. 11 is another schematic diagram of a human-computer interaction method provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a first electronic device provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of a second electronic device provided by an embodiment of the present application.
Detailed Description of Embodiments
With the development of science and technology, the full-scenario immersive experience has become a development trend for terminal devices in the course of human-computer interaction. There are many application scenarios for the full-scenario immersive experience, such as human-computer interaction through extended reality (XR) technologies including virtual reality (VR), augmented reality (AR), and mixed reality (MR), or human-computer interaction with devices that have a display screen, such as computers, televisions, and smart screens (also called large screens). The traditional approach of achieving human-computer interaction by controlling with a remote control, a mouse, or similar devices can no longer meet current needs.
To realize a full-scenario immersive experience, multi-device fusion technology can be used to make a series of improvements to human-computer interaction, focusing on user-experience friendliness, smoothness of device use, and ease of use. User limb movement is a direct and convenient input method: a wearable device equipped with sensors such as an inertial measurement unit (IMU) can serve as a medium to collect movements of the user's limbs (for example, the hand or wrist); in addition, the limb-movement information carried in images or video captured by a camera can be used as feedback to identify the user's operation intention, so that the user can interact with the machine in a clearer and smoother way.
Take, as an example, the process in which a user interacts with a device that has a display screen through an air mouse mode (also called mid-air mouse mode, mid-air operation mode, etc.). In general, the air mouse mode refers to a human-computer interaction mode in which sensors (such as a gyroscope or a 3-dimension gravity sensor (3D-Gsensor)) are added to a wireless mouse or wireless control device, so that the cursor on the display screen follows the movement of the user's limb in the air, without the device having to rest on a fixed desktop.
Furthermore, the air mouse mode can be extended to a human-computer interaction mode in which a terminal device (such as a wearable device like a watch or a wristband) controls the cursor on the display screen (performing operations such as moving, dragging, zooming in, and clicking). The cursor may be an icon or graphic of any shape, size, or transparency. At present, the air mouse mode mainly has the following mainstream implementations:
Mode 1: use a wearable device equipped with an IMU to recognize the user's limb movements, and control the cursor on the display screen according to the recognition result, so as to realize human-computer interaction.
Specifically, the main components of an IMU include a gyroscope, an accelerometer, and a magnetometer. The gyroscope can detect the angular velocity of the wearable device relative to a navigation coordinate system (such as an earth-fixed coordinate system or a geographic coordinate system); the accelerometer can detect the acceleration of the wearable device along the three axes of the carrier coordinate system (for example, with the coordinate origin of the wearable device as the carrier center origin, the three axes being the left-right axis, the front-rear axis, and the up-down axis of the carrier); and the magnetometer can obtain information about the magnetic field around the smart watch.
The main function of the IMU is to fuse the data of the three sensors, the gyroscope, the accelerometer, and the magnetometer, obtain relatively accurate attitude information through an attitude calculation algorithm, and recognize the user's limb movements based on this attitude information. Generally, attitude calculation algorithms include the Mahony algorithm, Kalman filter algorithms, and the like. This implementation requires little computing power and has good real-time performance, so the cursor position on the display screen refreshes quickly and tracks smoothly.
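As a rough illustration of the fusion just described, and not the specific algorithm of any embodiment, the following Python sketch implements one step of a basic complementary filter, a much-simplified relative of the Mahony and Kalman approaches mentioned above. The axis convention and the gain value are assumptions made here for illustration.

```python
import numpy as np

def attitude_step(roll, pitch, gyro, accel, dt, alpha=0.98):
    """One update of a basic complementary filter (roll/pitch only).

    roll, pitch : current attitude estimate, radians
    gyro        : (gx, gy, gz) angular rates, rad/s (axis convention assumed)
    accel       : (ax, ay, az) specific force, m/s^2
    dt          : sampling interval, s (10 ms for a 100 Hz IMU)
    alpha       : blend gain; 0.98 is an illustrative assumption
    """
    gx, gy, _ = gyro
    ax, ay, az = accel

    # Short-term estimate: integrate the gyro (responsive but drifts).
    roll_g = roll + gx * dt
    pitch_g = pitch + gy * dt

    # Long-term reference: gravity direction from the accelerometer
    # (noisy but drift-free while the device is not accelerating hard).
    roll_a = np.arctan2(ay, az)
    pitch_a = np.arctan2(-ax, np.sqrt(ay ** 2 + az ** 2))

    # Blend: the accelerometer term continually pulls the integrated
    # estimate back toward gravity, bounding the gyro's drift.
    roll = alpha * roll_g + (1 - alpha) * roll_a
    pitch = alpha * pitch_g + (1 - alpha) * pitch_a
    return roll, pitch
```

Note that gravity constrains only roll and pitch; this is one reason why, as discussed next, an IMU alone cannot bound translational drift.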
In Mode 1, a certain spatial translation tracking error exists in the IMU calculation process, and the attitude calculation algorithm cannot eliminate this error during its continuous integration, so the error easily accumulates and causes severe drift of the cursor position on the display screen. Therefore, the spatial translation of the user's limb movements cannot be accurately tracked using the IMU alone.
Mode 2: use computer vision (CV) recognition technology to recognize specific gestures in the user's limb movements, and control the cursor on the display screen according to the recognition result, so as to realize human-computer interaction.
Specifically, information about the user's limb movements, such as a specific gesture performed by the user (for example, an upward or downward swipe), is first collected by a variety of devices, and the cursor on the display screen is controlled according to that gesture. For example, the devices may include cameras (depth cameras or non-depth cameras, etc.) and/or other sensors (such as the photoelectric sensor used in photoplethysmography (PPG), infrared, radar, etc.). Taking a depth camera as an example: the depth camera collects image information of the specific gesture performed by the user, and the processor performs CV recognition on this image information using a pre-established image recognition model to obtain a first determination result; the other sensors collect sensing signals of the gesture, and the processor recognizes them using a pre-established sensor recognition model to obtain a second determination result. The processor then fuses the first and second determination results to determine the gesture performed by the user, and controls the cursor on the display screen according to the control operation corresponding to that gesture. Both the image recognition model and the sensor recognition model include correspondences between specific gestures and control operations on the display screen.
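The decision-level fusion described above can be illustrated with a minimal Python sketch. The dictionary format of the classifier outputs and the reliability weights are assumptions made here for illustration, not details of the disclosure.

```python
def fuse_gesture_results(cv_result, sensor_result):
    """Decision-level fusion of two gesture classifiers (illustrative only).

    cv_result, sensor_result: dicts mapping gesture label -> confidence,
    e.g. {"swipe_up": 0.7, "swipe_down": 0.2}. Weights are assumptions
    about the relative reliability of the two models.
    """
    w_cv, w_sensor = 0.6, 0.4
    labels = set(cv_result) | set(sensor_result)
    fused = {
        label: w_cv * cv_result.get(label, 0.0)
               + w_sensor * sensor_result.get(label, 0.0)
        for label in labels
    }
    # The gesture with the highest fused confidence drives the cursor action.
    return max(fused, key=fused.get)
```

In such a pipeline, the returned label would be looked up in the gesture-to-operation correspondence table to decide how to move the cursor.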
In Mode 2, the computation required by the depth camera's processing is far greater than that required by the other sensors, so there is a large gap between their computation times. In other words, the gesture-recognition control operation depends on the long processing time of CV recognition on the depth data, which easily causes problems such as cursor stutter and display delay for the corresponding control operation, resulting in a poor user experience.
Mode 3: fuse the IMU positioning solution of Mode 1 with the CV recognition solution of Mode 2 to realize a real-time-calibrated human-computer interaction mode.
Specifically, in Mode 3, a device containing an IMU recognizes the attitude information of the user's limb movements to obtain initial positioning information; at the same time, a device containing a camera (as in Mode 2, the camera may be a depth or non-depth camera; a depth camera is taken as an example here) recognizes and locates the image information of the user's limb movements to obtain calibration information. The initial positioning information is then calibrated according to the calibration information to obtain a calibration result, and the cursor on the display screen is operated based on this calibration result to realize human-computer interaction. This calibration process can reduce the spatial translation tracking error in the IMU calculation and achieve real-time tracking of the user's limb movements.
Exemplarily, the implementation process shown in Fig. 1 is used here as an example to illustrate the real-time-calibrated human-computer interaction mode.
As shown in Fig. 1, human-computer interaction using multi-device real-time fusion technology includes the following steps.
S1. Power-on initialization;
S2. The user operates the device containing the IMU to form an air-mouse movement trajectory, so that the device containing the IMU collects IMU data;
S3. Image data is collected by the depth camera, and CV recognition processing is performed based on the image data; the resulting CV recognition result serves as the basis for the real-time calibration in step S4;
S4. Combining the CV recognition result obtained in step S3 with the IMU data obtained in step S2 by the device containing the IMU, the offset angle data is tracked and determined in real time;
S5. Coordinate transformation is performed on the offset angle data obtained in step S4 to obtain coordinate data mapped to the screen (a sketch of one possible mapping follows this list);
S6. The coordinate data (X, Y) obtained in step S5 is displayed on the screen in real time; alternatively, instead of displaying the coordinate data (X, Y), a cursor is displayed at the screen position corresponding to (X, Y) (where the cursor may be an icon/graphic/image of any size, shape, or transparency).
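Steps S5 and S6 can be illustrated with a minimal sketch of one possible offset-angle-to-screen mapping. The field-of-view range and screen resolution below are assumed values, not parameters specified by the disclosure.

```python
def angles_to_screen(yaw, pitch, screen_w=1920, screen_h=1080,
                     fov_h=30.0, fov_v=20.0):
    """Map offset angles (degrees) to pixel coordinates (illustrative).

    yaw / pitch : horizontal / vertical offset angles from step S4
    fov_h/fov_v : assumed angular range mapped onto the full screen
    """
    # Normalize each angle into [0, 1] over the usable angular range...
    x = (yaw + fov_h / 2) / fov_h
    y = (pitch + fov_v / 2) / fov_v
    # ...then clamp and scale to the screen resolution.
    x = min(max(x, 0.0), 1.0) * (screen_w - 1)
    y = min(max(y, 0.0), 1.0) * (screen_h - 1)
    return int(x), int(y)
```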
However, in the implementation shown in Fig. 1, the response time of IMU data collection differs from that of CV recognition processing, so the implementation of Mode 3 suffers from serious delay and lag, the cursor position is unstable, and the user experience is poor.
Obviously, although Mode 3 can to some extent solve the inaccurate positioning caused in Mode 1 by using only the IMU for cursor positioning, the computation of CV recognition by the device containing the depth camera is far greater than the computation of IMU data processing by the device containing the IMU, so there is a large gap between their computation times. That is, every frame of the cursor displayed on the screen depends entirely on the real-time CV calibration; limited by hardware computing capability, each CV recognition computation (generally several hundred milliseconds (ms)) takes far longer than each computation that forms the air-mouse trajectory from the data collected by the IMU (generally a few milliseconds to a dozen or so milliseconds), so the real-time-calibrated interaction mode (i.e., Mode 3) must wait out the lengthy CV recognition processing.
Exemplarily, assume each IMU data processing takes 10 milliseconds and each CV recognition takes 200 milliseconds. In the example shown in Fig. 1, consider one second of the user's air-mouse operation, denoted as the interval (0, 1000] (the unit of all intervals here and below is milliseconds). Within this one-second interval, the CV recognition results in step S3 consist of CV keypoint data at 5 instants: (200, 400, 600, 800, 1000). To achieve synchronous calibration of the IMU data based on the CV recognition results, the instants at which IMU data is collected in step S2 must be restricted to the same 5 instants. As a result, in step S6 the cursor data displayed on the screen in real time covers only those 5 instants, i.e., the refresh frequency of the cursor on the screen can at most equal the CV recognition frequency of 5 hertz (Hz), which easily causes problems such as cursor stutter and display delay, resulting in a poor user experience.
In addition, in Modes 2 and 3, when different devices (for example, a device containing a depth camera and a device containing an IMU) collect different data, the sampling precision of the devices may differ, so the data collected by different devices corresponds to different timestamps, generally with a time difference at the millisecond level. This time difference also easily leads to inaccurate calibration in Mode 3, causing problems such as misplaced cursor display on the screen, and is another cause of poor user experience.
To solve the above problems, the embodiments of the present application provide a human-computer interaction method and related devices for controlling the cursor in a display apparatus through asynchronous calibration, which can improve the continuity of cursor movement in the display apparatus, thereby improving user experience.
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
In this embodiment and subsequent embodiments, the motion sensor is described only with an IMU data collection apparatus as an example; the motion sensor may obviously also be another apparatus, such as an accelerometer data collection apparatus, a gyroscope data collection apparatus, a magnetometer data collection apparatus, or another apparatus, which is not limited here.
It can be understood that the display screen may be included in a display apparatus, and the display apparatus may further include a base for carrying the display screen, physical or touch-screen buttons for controlling display parameters (such as brightness and contrast), a power supply providing power to the display screen, a wired/wireless communication module for transmitting control instructions to the display screen, and so on, which is not limited here. The display apparatus mentioned in the following embodiments is mainly used to realize the display function of the display screen.
It can be understood that the camera may be included in an image collection apparatus, and the image collection apparatus may further include a power supply providing power to the camera, a wired/wireless communication module for transmitting control instructions to the camera, and so on, which is not limited here. The image collection apparatus mentioned in the following embodiments is mainly used to realize the shooting (image or video) function of the camera.
Referring to Fig. 2, a schematic diagram of an application scenario of an embodiment of the present application, the scenario includes at least an image collection apparatus, a display apparatus, and an IMU data collection apparatus. The IMU data collection apparatus is used to collect IMU data and may be included in a terminal device equipped with an IMU, such as a mobile phone, a remote control device (e.g., a remote control or a handle), a tablet computer, or a wearable device (e.g., a smart watch or smart wristband). The display apparatus is a picture output apparatus and may be included in a device with a display screen, such as a computer, a television, or a smart screen (also called a large screen). The image collection apparatus is used to collect image data (including one or more frames of image information, or a video stream containing multiple frames of images, etc.) and may be a camera, such as a depth camera or a non-depth camera, or another image collection device, which is not limited here.
Optionally, the image collection apparatus and the display apparatus may be integrated in the same device, such as a computer, a television, or a smart screen (also called a large screen).
In addition, in the embodiments of the present application, the electronic device that executes the asynchronous calibration process in the human-computer interaction method includes a processor, and this electronic device can be implemented in many ways. For example, the electronic device may be a device containing the IMU data collection apparatus, connected in a wired/wireless manner to one or more devices containing the image collection apparatus and/or the display apparatus; or the electronic device may be a device containing the display screen, connected in a wired/wireless manner to one or more devices containing the image collection apparatus and/or the IMU data collection apparatus; or the electronic device may be a device containing the image collection apparatus, connected in a wired/wireless manner to one or more devices containing the display apparatus and/or the IMU data collection apparatus; or the electronic device may be another device, i.e., one that contains none of the image collection apparatus, display apparatus, and IMU data collection apparatus (such as a smart speaker, a robot, a server, or a computing hub), connected in a wired/wireless manner to one or more devices containing the image collection apparatus, the display apparatus, and/or the IMU data collection apparatus. The electronic device can receive data sent by one or more devices and send data to one or more devices in a wired/wireless manner, and the processor of the electronic device can process both the data collected by the device itself and the data it receives.
In a typical application scenario, a user holds a mobile phone and interacts with a large screen. In this scenario, the mobile phone is the device containing the IMU data collection apparatus, and the large screen is the device containing the display apparatus. The mobile phone and the large screen respectively execute the relevant steps of the human-computer interaction method provided by the embodiments of the present application, so that when the user holds the phone, makes limb movements, and drives the phone to move, the large screen responds to the phone's movement trajectory by displaying a cursor at the corresponding position on the screen.
In the above typical application scenario, since the performance of the mobile phone's processor is usually better than that of the large screen's processor, the asynchronous calibration process in the human-computer interaction method provided by the embodiments of the present application may preferably be executed by the mobile phone's processor. Of course, the asynchronous calibration process may also be executed by the processor of the large screen, or by the processor of another device (such as a server or computing hub), which is not limited in this application.
In addition, the application scenarios of the human-computer interaction method provided by the embodiments of the present application further include, but are not limited to: a user wearing a watch/wristband interacting with a large screen, a user wearing a motion sensor interacting with a large screen, and a user holding a remote control interacting with a head-mounted display device. It should be understood that human-computer interaction between any device containing an IMU data collection apparatus and any device containing a display apparatus, in any combination, can adopt the human-computer interaction method provided by the embodiments of the present application.
It should be noted that in the subsequent embodiments, the electronic device is taken as a device equipped with the IMU data collection apparatus, and the image collection apparatus and the display apparatus are both integrated in another device (such as a computer, a television, or a smart screen). That is, the electronic device contains the IMU data collection apparatus and communicates with the other device through a wired/wireless connection to obtain the image data collected by the image collection apparatus, or the processing result obtained by the image collection apparatus from the collected image data, so that through the implementation of the human-computer interaction method provided by this application, the electronic device obtains and sends to the display apparatus the control information for controlling the cursor in the display apparatus.
In the application scenario shown in Fig. 2, the user's hand (including fingers, wrist, palm, etc.) carries the device containing the IMU data collection apparatus in a handheld or worn manner. The user moves the hand within the shooting area of the image collection apparatus, driving the IMU data collection apparatus to move. During this process, the IMU data collection apparatus collects IMU data and the image collection apparatus collects image data; the electronic device obtains the IMU data and image data and performs fusion processing to obtain air-mouse data, based on which a cursor can be displayed on the display apparatus. The user can thus interact with the display apparatus in air-mouse mode by moving the hand, for example, selecting, moving, dragging, zooming in, or clicking interface elements in the display interface of the display apparatus from a distance.
As mentioned above, due to inherent hardware differences, a time difference inevitably exists between different devices, usually at the microsecond or millisecond level. This time difference has a considerable impact on the asynchronous calibration process in the human-computer interaction method provided by the embodiments of the present application, causing the finally displayed cursor to be misplaced or its trajectory to be inaccurate.
Therefore, to eliminate the adverse effect of this objectively existing time difference, before implementing the human-computer interaction method provided by the embodiments of the present application, initialization may first be performed to calculate this time difference. Then, when the method is subsequently implemented, the data collected by the IMU data collection apparatus and the data collected by the image collection apparatus are aligned according to the time difference calculated during initialization, so that the finally displayed cursor is correctly placed and its trajectory is accurate.
The above initialization process may be a process in which the user performs a specified limb movement facing the display apparatus. For example, as shown in Fig. 4, the user wears a watch; the display screen shows the text "Please face the screen" to prompt the user to adjust their position relative to the screen, and shows the text "Please draw a W curve" to prompt the user to perform the specified limb movement; the user moves their arm and draws a W curve in the air. The watch's processor can then calculate the inherent hardware time difference between the IMU and the camera from the IMU data collected while the user draws the W curve and the image data collected by the camera (or the recognition result obtained from the image data).
Taking the above application scenario as an example: the user carries an electronic device equipped with the IMU data collection apparatus in a handheld or worn manner, and when the user is within the shooting area of the image collection apparatus and enters (or disconnects and re-enters) the human-computer interaction process with the display apparatus for the first time, the electronic device can perform the initialization process through the human-computer interaction method shown in Fig. 3 below, so as to align the time information (also called timestamps or time axes) between the IMU data collection apparatus and the image collection apparatus. The implementation shown in Fig. 3 can be used to avoid the problems in the aforementioned Modes 2 and 3, where inaccurate calibration caused by the time difference between data collected by different devices leads to misplaced cursor display.
The implementation process shown in Fig. 3 will first be described in detail below.
Referring to Fig. 3, a schematic flowchart of a human-computer interaction method provided by an embodiment of the present application, the method includes the following steps.
S101. Determine initialization IMU data and initialization image data within a first duration.
In this embodiment, during the first duration corresponding to the initialization process, when the user carries the electronic device equipped with the IMU data collection apparatus in a handheld or worn manner and is within the shooting area of the image collection apparatus, the electronic device collects the user's limb movements through the IMU data collection apparatus within the first duration to obtain the initialization IMU data, and the image collection apparatus collects the user's limb movements within the first duration to obtain the initialization image data.
In step S101, the electronic device may receive the initialization image data through wired/wireless communication with the image collection apparatus.
It should be noted that in step S101 the sampling frequencies of the IMU data collection apparatus and the image collection apparatus may be the same, for example both 100 hertz (Hz), i.e., within any second of the first duration the two apparatuses can collect IMU data at 100 instants and image data at 100 instants. Alternatively, the sampling frequencies of the two apparatuses may differ; for example, the IMU data collection apparatus samples at 100 Hz and the image collection apparatus at 5 Hz, i.e., within any second of the first duration the two apparatuses can collect IMU data at 100 instants and image data at 5 instants.
S102. Determine the difference between the timestamp of the IMU data collection apparatus and the timestamp of the image collection apparatus according to the initialization IMU data and the initialization image data.
In this embodiment, in step S102 the electronic device determines the difference between the timestamps of the IMU data collection apparatus and the image collection apparatus according to the initialization IMU data and initialization image data obtained in step S101; after the initialization process, the timestamps of the two apparatuses can be aligned according to this difference.
In a possible implementation, in step S102 the electronic device may analyze the signal characteristics of the initialization image data and the initialization IMU data: the difference may be determined by calculating fluctuation frequencies, or by regression linear fitting, etc., which is not limited here.
Exemplarily, determining the difference by calculating fluctuation frequencies in step S102 is described here as an example. The electronic device may obtain an IMU data fluctuation curve within the first duration from the initialization IMU data, and determine the peak positions (and/or trough positions) of this curve; the electronic device also determines a CV data fluctuation curve within the first duration from the CV keypoint detection result of the initialization image data, and determines the peak positions (and/or trough positions) of that curve, where the CV keypoint detection result includes the three-dimensional direction angles of the position at which the user holds or wears the electronic device (e.g., the left wrist or right wrist). The electronic device then compares the time information of the peak positions (and/or trough positions) in the IMU data fluctuation curve with the time information of the peak positions (and/or trough positions) in the CV data fluctuation curve, and the resulting time difference is the difference between the timestamps of the IMU data collection apparatus and the image collection apparatus. Thus, after step S102, the IMU data collected by the IMU data collection apparatus and the image data of the image collection apparatus can be aligned according to this difference.
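A minimal sketch of the peak-alignment idea described above, assuming both curves are available as sampled arrays and using SciPy's generic peak finder as a stand-in for whatever detector an implementation would actually use:

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_time_offset(imu_t, imu_sig, cv_t, cv_sig):
    """Estimate the IMU-vs-camera timestamp difference from waveform peaks.

    imu_t, cv_t    : sample timestamps in milliseconds
    imu_sig, cv_sig: the corresponding fluctuation curves, e.g. gyro
                     magnitude vs. the CV wrist-keypoint angle while the
                     user draws the 'W'. Assumes both devices observed the
                     same gesture and each curve has clear peaks.
    """
    imu_t, cv_t = np.asarray(imu_t), np.asarray(cv_t)
    imu_peaks, _ = find_peaks(np.asarray(imu_sig))
    cv_peaks, _ = find_peaks(np.asarray(cv_sig))
    n = min(len(imu_peaks), len(cv_peaks))
    if n == 0:
        raise ValueError("gesture too flat: no peaks to align")
    # Pair the peaks in order and average the per-peak time differences.
    diffs = imu_t[imu_peaks[:n]] - cv_t[cv_peaks[:n]]
    return float(np.mean(diffs))  # positive: the IMU clock runs ahead
```

The returned offset would then be subtracted from (or added to) one device's timestamps to align the two data streams.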
It should be noted that in the above example, the image collection apparatus may perform CV keypoint recognition on the initialization images to obtain the CV keypoint detection result and send that result to the electronic device in step S101; alternatively, the image collection apparatus may send the initialization images to the electronic device in step S101, so that the electronic device performs CV keypoint recognition on them to obtain the CV keypoint detection result, which is not limited here.
In a possible implementation, when the user performs the initialization process upon first entering the air-mouse operation in step S101, the electronic device may determine that the user has entered the initialization process based on various trigger methods, and then execute steps S101 and S102. For example, the initialization process may be triggered by a specific limb movement of the user (such as a left or right swipe) collected by the IMU data collection apparatus, or triggered when the user is within the shooting area of the image collection apparatus, or triggered when communication is established between the image collection apparatus and the IMU data collection apparatus, or by other trigger methods, which is not limited here.
In a possible implementation, after step S101 the electronic device may further determine initial relative information between the user and the image collection apparatus according to the initialization IMU data and initialization image data; the initial relative information may include parameters such as distance and orientation.
Exemplarily, the CV keypoint recognition process may use the two CV keypoints of the left shoulder and right shoulder to obtain the shoulder-width parameter, which is used to determine the distance between the user and the image collection apparatus and the relative orientation between them. Alternatively, facial keypoints, such as the eye spacing and ear spacing, may be used to determine the distance between the user and the image collection apparatus and the relative orientation between them. For example, the electronic device may obtain a fixed proportional relationship from any frame of image collected during the initialization process; for instance, assuming by default that body rotation does not change the head width, the ratio of the user's head width to shoulder width during initialization is a preset proportional coefficient, and a subsequent change in shoulder width changes the proportional coefficient, from which the change of orientation is derived.
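The head-width/shoulder-width proportion described above can be turned into an orientation estimate roughly as follows; the cos(yaw) shoulder-foreshortening model is an illustrative assumption, not a formula from the disclosure.

```python
import numpy as np

def estimate_yaw(head_w_px, shoulder_w_px, ratio0):
    """Estimate body yaw from the head/shoulder pixel-width ratio.

    ratio0: head-width / shoulder-width ratio measured at initialization,
    when the user faced the camera (yaw = 0). Assumes the apparent head
    width does not change with body rotation, while the apparent shoulder
    width shrinks roughly as cos(yaw).
    """
    ratio_now = head_w_px / shoulder_w_px         # grows as the body turns
    cos_yaw = np.clip(ratio0 / ratio_now, -1.0, 1.0)
    return np.degrees(np.arccos(cos_yaw))         # yaw magnitude, degrees
```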
The implementation process of determining the initial relative information will be exemplarily described below.
Embodiment 1
The electronic device can determine the initial relative information by executing the implementation process shown in step S101. The user's hand (including fingers, wrist, palm, etc.) carries the device equipped with the IMU data collection apparatus in a handheld or worn manner, and the user is within the shooting area of the image collection apparatus; a video stream of a preset operation gesture is collected, the video stream including multiple frames of user images, and CV keypoint detection is then performed on the multiple frames of user images captured by the image collection apparatus to obtain the initial relative information. For example, the initial relative information may serve as a calibration reference value for the relative information (such as distance and orientation): at any moment after the initialization process, if the collected relative information differs from this calibration reference value (for example, because the user walks around or turns), it is determined that the relative information (such as distance and orientation) between the user and the image collection apparatus has changed.
Specifically, take as an example a wearable device, a smart watch containing an IMU, as the electronic device equipped with the IMU data collection apparatus. Wearing the smart watch, the user stands facing the display screen and draws in the air a system-preset, approximately axisymmetric motion, including but not limited to a W curve or an O curve. The image collection apparatus captures the gesture video stream data (including at least a first image and a second image), and through CV keypoint detection obtains parameters of the user's initial state, such as the standing distance, the body orientation initialized as frontal, the shoulder width, and the head size, as the initial image data, while synchronously aligning the IMU data corresponding to the preset motion collected by the IMU.
For example, the initialization process may be implemented as shown in Fig. 4. The initialization process includes: the display screen shows "Please face the screen" through the interface; after the image collection apparatus detects that the user is facing the screen, the user can be reminded to perform the initialization operation, i.e., the interface shows "Please draw a W curve"; thereafter, after the image collection apparatus detects that the user has drawn a W curve in the air, the image collection apparatus collects the initial image data of the user drawing the W curve, the IMU data collection apparatus collects the user's IMU data, and the initial image data and the IMU data are aligned to initialize the cursor position.
As another example, the initialization process may be implemented as shown in Fig. 5. The initialization process includes: when the image collection apparatus detects the user, it assumes by default that the user is facing the screen; the user can then be reminded to perform the initialization operation, with the display screen showing "Please face the screen and draw a W curve" through the interface; thereafter, after the image collection apparatus detects that the user has drawn a W curve in the air, the image collection apparatus collects the initial image data of the user drawing the W curve, the IMU data collection apparatus collects the user's IMU data, and the initial image data and the IMU data are aligned to initialize the cursor position.
Embodiment 2
The electronic device can determine the initial relative information without executing the implementation process shown in step S101. The user stands in front of the screen of the display apparatus and powers on into air-mouse mode; using the ranging technology of a monocular image collection apparatus (such as a monocular camera), specifically, the actual coordinates of the corresponding pixels can be calculated based on similar-triangle proportions, parameters such as the model distance and body shape are initialized, and the watch's IMU measurement data and the display apparatus's initial image data are aligned.
Specifically, the user can enter the air-mouse mode without an initialization gesture. Monocular ranging places relatively high requirements on camera calibration and requires the distortion introduced by the lens itself to be small, but overall this method has strong portability and practicality, and it can also achieve accurate estimation of the initialization parameters. Compared with Embodiment 1, omitting the initialization process in Embodiment 2 may sacrifice ranging accuracy; however, a depth image collection apparatus can be used to compensate for the accuracy.
That is, in Embodiment 2, the monocular image collection apparatus is modeled directly in the initialization stage, and the user's standing distance and body-shape parameters (i.e., orientation) are obtained through the ranging principle of the monocular camera. In Embodiment 2, there is no need to obtain the user initialization parameters by directly performing CV keypoint detection on an initialization gesture video stream, which reduces the complexity of operations and saves hardware memory and computing costs.
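A minimal sketch of the similar-triangle ranging principle mentioned above, under the pinhole camera model; the focal length and the physical reference width are assumed calibration inputs rather than values from the disclosure.

```python
def monocular_distance(focal_px, real_width_m, width_px):
    """Distance from similar triangles under the pinhole camera model.

    focal_px     : focal length in pixels, from camera calibration
    real_width_m : assumed physical width of the reference feature,
                   e.g. ~0.4 m for shoulder width (an assumption)
    width_px     : measured pixel width of that feature in the image
    """
    # Similar triangles: width_px / focal_px = real_width_m / distance
    return focal_px * real_width_m / width_px

# e.g. a 0.4 m shoulder width spanning 200 px with f = 1000 px -> 2.0 m
```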
Optionally, after the initialization process shown in Fig. 3, i.e., after aligning the time information of the IMU data collection apparatus and the image collection apparatus, the electronic device can execute the human-computer interaction method shown in Fig. 6 to control the cursor in the display apparatus through asynchronous calibration, which can improve the continuity of cursor movement in the display apparatus and thereby improve user experience.
Optionally, the electronic device may also directly execute the human-computer interaction method shown in Fig. 6 without the initialization process shown in Fig. 3. For example, when the collection/processing capabilities of the IMU data collection apparatus and the image collection apparatus are strong, the collection/processing time difference between the two apparatuses may be at the microsecond level or even lower; at such a low level, the user cannot perceive any effect of the time difference on the cursor display during human-computer interaction. Thus, in the human-computer interaction method shown in Fig. 6, the cursor in the display apparatus can be controlled through asynchronous calibration without the initialization process shown in Fig. 3, improving the continuity of cursor movement in the display apparatus and thereby improving user experience.
Referring to Fig. 6, a schematic flowchart of a human-computer interaction method provided by an embodiment of the present application, the method includes the following steps.
S201. The electronic device determines initial IMU data.
In this embodiment, when the user carries the electronic device equipped with the IMU data collection apparatus in a handheld or worn manner and is within the shooting area of the image collection apparatus, the electronic device collects the user's limb movements through the IMU data collection apparatus at a first set of instants to obtain the initial IMU data.
It should be noted that the first set of instants is contained in a first time period. For example, if the sampling frequency of the IMU data collection apparatus is a first sampling frequency (e.g., 100 Hz), the multiple instants contained in the first set of instants are the 100 instants within each second of the first time period.
As described above with reference to Fig. 2, the case where the electronic device contains the IMU data collection apparatus is taken as an example here; for instance, the electronic device is a mobile phone, a remote control device (such as a remote control or a handle), a tablet computer, or a wearable device (such as a smart watch or smart wristband).
Specifically, in step S201, when the user carries the electronic device equipped with the IMU data collection apparatus in a handheld or worn manner and performs the air-mouse operation, the IMU continuously tracks changes in the user's gesture; the main components of the IMU, such as the gyroscope, accelerometer, and magnetometer, record the IMU data, and the user's limb movements are collected at the first set of instants to obtain the initial IMU data, so that the electronic device can obtain and determine the initial IMU data in step S201.
Optionally, the initial IMU data obtained in step S201 may be the initial IMU data obtained after the IMU data recorded by the IMU data collection apparatus is processed by waveform smoothing, denoising calibration compensation, or other methods, which is not limited here.
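As one example of such pre-processing, a simple moving-average smoother could look as follows; the window length is an assumed value, not one specified by the disclosure.

```python
import numpy as np

def smooth_imu(samples, window=5):
    """Moving-average smoothing of raw IMU samples (illustrative only).

    samples: 1-D numpy array of raw readings from one IMU axis.
    window : averaging window length; 5 samples (~50 ms at 100 Hz)
             is an assumed value.
    """
    kernel = np.ones(window) / window
    # mode="same" keeps the output aligned with the input timestamps.
    return np.convolve(samples, kernel, mode="same")
```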
S202. The electronic device determines first image data.
In this embodiment, when the user carries the electronic device equipped with the IMU data collection apparatus in a handheld or worn manner and is within the shooting area of the image collection apparatus, the image collection apparatus collects the user's limb movements at a second set of instants to obtain image data; the image data may include one or more frames of image information, a video stream containing multiple frames of images, etc. In step S202, the electronic device may obtain the first image data through a wired/wireless communication connection with the device containing the image collection apparatus.
It should be noted that the second set of instants is contained in the first time period. For example, if the sampling frequency of the image collection apparatus is a second sampling frequency (e.g., 5 Hz), the multiple instants contained in the second set of instants are the 5 instants within each second of the first time period.
Specifically, as described above for the real-time-calibrated interaction mode (i.e., Mode 3), limited by hardware computing capability, the computation time of the CV recognition process is generally far longer than the processing time of the IMU data. For example, each computation of CV recognition generally takes several hundred milliseconds, while each processing of IMU data generally takes a few milliseconds to a dozen or so milliseconds, a difference of at least an order of magnitude. Therefore, the second set of instants corresponding to the image data collected by the image collection apparatus in step S202 may be a subset of the first set of instants corresponding to the initial IMU data collected by the IMU data collection apparatus in step S201.
Exemplarily, assume each IMU data processing takes 10 milliseconds and each CV recognition takes 200 milliseconds. In steps S201 and S202, the user carries the device equipped with the IMU data collection apparatus in a handheld or worn manner and performs the air-mouse operation within the shooting area of the image collection apparatus during a one-second interval denoted as (0, 1000] (the unit of all intervals here and below is milliseconds). In this example, the first set of instants at which the initial IMU data is collected in step S201 is (10, 20, 30, ..., 200, 210, 220, ..., 400, ..., 600, ..., 800, ..., 980, 990, 1000), 100 instants in total, and the second set of instants at which the first image data is collected in step S202 is (200, 400, 600, 800, 1000), 5 instants in total. Subsequently, in steps S203 and S204, the initial IMU data collected at the 100 instants can be asynchronously calibrated based on the first image data collected at the 5 instants, so as to avoid problems such as cursor drift that exist when only IMU data is used for interaction (see the description of Mode 1 above).
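The two sampling grids in this example can be reconstructed in a few lines, confirming that the CV instants form a subset of the IMU instants (the literal values are just the ones from the example above):

```python
# IMU samples every 10 ms, CV results every 200 ms, over (0, 1000] ms.
imu_times = {10 * k for k in range(1, 101)}   # 10, 20, ..., 1000
cv_times = {200 * k for k in range(1, 6)}     # 200, 400, ..., 1000

assert cv_times <= imu_times  # CV instants are a subset of IMU instants
print(len(imu_times), len(cv_times))          # 100 5
```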
具体地,在步骤S202的实现过程中,图像采集装置可以是检测到用户手持或佩戴IMU数据采集装置且用户的位置处于图像采集装置的拍摄区域时,或者是,图像采集装置响应于用户的语音唤醒,或者是,图像采集装置响应于用户在电子设备上的操作,执行采集图像数据的过程,此处不做限定。Specifically, in the implementation process of step S202, the image acquisition device may detect that the user holds or wears the IMU data acquisition device and the user's position is in the shooting area of the image acquisition device, or the image acquisition device responds to the user's voice Wake up, or, in response to a user's operation on the electronic device, the image acquisition apparatus performs a process of acquiring image data, which is not limited here.
此外,图像采集装置所包含的摄像头的数量可以设置为一个,即图像采集装置通过单个摄像头采集得到图像数据,或者,图像采集装置所包含摄像头的数量也可以设置为多个,即图像采集装置通过多个摄像头采集得到图像数据,此处不做限定。示例性的,当图像采集装置所包含的摄像头为多个时,可以通过不同的摄像头来覆盖不同范围的场景,并解决了摄像头无法来回切换焦距的问题,还可以便于通过多个摄像头所采集得到的多组图像数据进行测距以提高测距准确度。当图像采集装置所包含的摄像头为一个时,相较于多个摄像头的布置方式,可以节省摄像头的硬件设置,并减小图像数据的计算量以提升后续光标 响应速度。In addition, the number of cameras included in the image capture device may be set to one, that is, the image capture device acquires image data through a single camera, or the number of cameras included in the image capture device may also be set to multiple, that is, the image capture device uses Image data is acquired by multiple cameras, which is not limited here. Exemplarily, when the image acquisition device includes multiple cameras, different cameras can be used to cover scenes of different ranges, and the problem that the cameras cannot switch the focal length back and forth can be solved. ranging from multiple sets of image data to improve ranging accuracy. When the image acquisition device includes one camera, compared with the arrangement of multiple cameras, the hardware settings of the camera can be saved, and the calculation amount of image data can be reduced to improve the subsequent cursor response speed.
In this embodiment and subsequent embodiments, a single camera is used as an example. Refer to FIG. 7 for implementation examples of the positional relationship between the image acquisition device and the display device in the electronic device. The image acquisition device may be placed outside the display area of the display device: for example, near the upper bezel of the display device as shown in (a) of FIG. 7, near the lower bezel as shown in (b) of FIG. 7, or at other positions such as near the left bezel, near the right bezel, or near the upper-left, upper-right, lower-left, or lower-right corner of the display device; this is not limited here. Alternatively, the image acquisition device may be placed within the display area of the display device: for example, near the upper edge of the display area as shown in (c) of FIG. 7, in the middle of the display area as shown in (d) of FIG. 7, or at other positions within the display area, such as near other edges; this is not limited here.
S203. The electronic device performs CV key point recognition according to the first image data to obtain a first constraint condition.
In this embodiment, the electronic device performs CV key point recognition according to the first image data determined in step S202 to obtain the first constraint condition.
Specifically, in step S203, the electronic device reads the first image data, uses human-skeleton recognition technology to perform CV key point recognition on the human body contained in the image data, and determines the recognition result as the first constraint condition; for example, the CV recognition result may include three-dimensional spatial azimuth information of the CV key points. The CV key point recognition and localization process may use 9, 14, 16, 21, or some other number of CV key points; this is not limited here. As an example, consider the implementation shown in FIG. 8 with 9 CV key points: the user's left eye, nose, right eye, left shoulder, right shoulder, left elbow, right elbow, left wrist, and right wrist. In step S203, the CV key points include at least the position where the user holds or wears the electronic device containing the IMU data acquisition device, such as the user's left elbow, left wrist, right elbow, right wrist, left-hand fingers, or right-hand fingers, or other CV key points, which can be adjusted according to the specific application scenario; this is not limited here.
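As a minimal sketch (the key point names follow the 9-point example of FIG. 8; the coordinates are placeholders rather than real recognition output), one possible shape for a recognition result is:

```python
# Hypothetical recognition result for the 9 CV key points of FIG. 8.
# Each key point maps to an (x, y, z) position in the camera frame;
# the zero values below are placeholders for illustration only.
KEYPOINT_NAMES = [
    "left_eye", "nose", "right_eye",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
]

recognition_result = {name: (0.0, 0.0, 0.0) for name in KEYPOINT_NAMES}

# The first constraint condition is derived from the key point(s) where
# the IMU device is held or worn, e.g. the right wrist for a watch.
constraint_keypoint = recognition_result["right_wrist"]
```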
It should be noted that in step S203 the CV key point recognition process may also be performed by the device containing the image acquisition device to obtain the first constraint condition; that is, in step S202 the device containing the image acquisition device may send the first constraint condition to the electronic device. In this scenario, the electronic device does not need to perform the CV key point recognition process itself, which reduces its processing delay and improves its response speed.
As an implementation example, whether the electronic device or the device containing the image acquisition device performs the CV key point recognition process, the input image data (for example, the aforementioned first image data) may be processed by a preset neural network model to obtain the first constraint condition. Processing through the neural network model can greatly improve processing efficiency and further improve the response speed of the cursor in the display device.
Optionally, the preset neural network model may be obtained by training on training samples, where each training sample may include image data and label data. The label data may be the CV key point coordinates corresponding to the image data, or the constraint condition corresponding to the image data (for example, a three-dimensional spatial orientation angle), or both; this is not limited here. In addition, the training process may be performed locally by the electronic device, locally by the device containing the image acquisition device, or by a cloud server whose result is then transmitted to the electronic device or to the device containing the image acquisition device; this is not limited here.
The example from step S202 above, in which the user holds or wears a device equipped with the IMU data acquisition device and performs air-mouse operations within the shooting area of the image acquisition device during a one-second interval, is continued here. In this example, the second time set at which the image acquisition device collects the first image data in step S202 is (200, 400, 600, 800, 1000), 5 moments in total. In step S203 the electronic device performs CV key point recognition on the image data corresponding to each of these five moments (for example, the user wears a watch containing the IMU data acquisition device on the right wrist) and obtains the localization coordinates of the "right wrist" CV key point at the 5 moments. These coordinates indicate the position of the right wrist for the user's movement or operation at each moment; the movement direction of the right wrist is determined from the chronological order of the 5 moments, the three-dimensional spatial azimuth of the right wrist (or of the arm on which it lies) is determined from that direction, and this three-dimensional spatial orientation angle is determined as the first constraint condition.
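A minimal sketch of this derivation follows; the wrist coordinates are placeholder values, and the azimuth/elevation decomposition is one plausible way to express a three-dimensional orientation angle, not the only form the method permits.

```python
import math

# Hypothetical 'right_wrist' localization coordinates at the 5 CV moments,
# in the camera coordinate frame (placeholder values).
wrist_track = {
    200: (0.10, 1.20, 2.00),
    400: (0.14, 1.22, 2.00),
    600: (0.18, 1.25, 1.99),
    800: (0.22, 1.27, 1.99),
    1000: (0.26, 1.30, 1.98),
}

def orientation_between(p0, p1):
    """Azimuth and elevation (degrees) of the displacement from p0 to p1."""
    dx, dy, dz = (b - a for a, b in zip(p0, p1))
    azimuth = math.degrees(math.atan2(dx, dz))
    elevation = math.degrees(math.atan2(dy, math.hypot(dx, dz)))
    return azimuth, elevation

# Orientation angles between consecutive moments, in chronological order;
# these angles form the first constraint condition at the 5 CV moments.
times = sorted(wrist_track)
first_constraint = {
    t1: orientation_between(wrist_track[t0], wrist_track[t1])
    for t0, t1 in zip(times, times[1:])
}
```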
In addition, in step S203, the human-skeleton recognition technology used by the electronic device may be three-dimensional (3D) human-skeleton recognition for CV key point recognition (for example, when the image acquisition device in step S202 collects images with a monocular camera), or two-dimensional (2D) human-skeleton recognition (for example, when the image acquisition device in step S202 collects images with multiple cameras); this is not limited here. That is, the CV key points used by the electronic device in step S203 to determine the first constraint condition may be 2D human-body CV key points or 3D human-body CV key points, which is not limited here.
Optionally, the electronic device may obtain wearing-position information for the device containing the IMU data acquisition device, so that it can determine which of the multiple CV key points serves as the first constraint condition. For example, suppose the device containing the IMU data acquisition device is a watch and the electronic device is also a watch. The user wears the watch on the right wrist, and the watch can sense whether it is worn on the left or right wrist. Assuming the CV key points include six points (left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist), the watch can determine from the wearing-position information that the three-dimensional spatial azimuth information of the right-wrist CV key point serves as the first constraint condition.
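A minimal sketch of this selection, assuming the wearing position is reported as a simple "left"/"right" flag (an illustrative assumption; the actual sensing mechanism is not specified here):

```python
def select_constraint_keypoint(keypoints, wearing_side):
    """Pick the wrist key point matching the reported wearing side."""
    # keypoints: a dict like recognition_result in the sketch above;
    # wearing_side: "left" or "right", as sensed by the worn device.
    name = "left_wrist" if wearing_side == "left" else "right_wrist"
    return name, keypoints[name]
```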
S204. The electronic device calibrates the IMU data based on the first constraint condition to obtain target IMU data.
In this embodiment, the electronic device calibrates the IMU data determined in step S201 according to the first constraint condition determined in step S203 to obtain the target IMU data.
Specifically, the electronic device processes the IMU data recorded by the three sensors (gyroscope, accelerometer, and magnetometer) in the initial IMU data determined in step S201 with an attitude-solution algorithm to obtain attitude angle information, and calibrates the recorded IMU data based on the first constraint condition obtained in step S203 (or calibrates the attitude angle information based on that first constraint condition, or calibrates both the recorded IMU data and the attitude angle information) to obtain the target IMU data. The attitude-solution algorithm may include the Mahony algorithm, a Kalman filter algorithm, and the like.
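The Mahony and Kalman filter algorithms named above are complete attitude-solution algorithms; as a minimal stand-in that shows the shape of such a step, the sketch below uses a basic complementary filter on roll and pitch (yaw, which would additionally use the magnetometer, is omitted, and the gyro integration ignores Euler-rate coupling; all of this is an illustrative assumption, not the method's prescribed algorithm):

```python
import math

def attitude_step(roll, pitch, gyro, accel, dt, alpha=0.98):
    """One attitude-solution step: blend gyroscope integration with the
    gravity direction measured by the accelerometer.

    roll, pitch: previous attitude angles (radians)
    gyro: (gx, gy, gz) angular rates (rad/s)
    accel: (ax, ay, az) accelerometer reading
    dt: sampling period in seconds (0.01 s in the running example)
    """
    # Propagate the previous attitude with the gyroscope rates.
    roll_g = roll + gyro[0] * dt
    pitch_g = pitch + gyro[1] * dt

    # Attitude implied by the accelerometer's view of gravity.
    ax, ay, az = accel
    roll_a = math.atan2(ay, az)
    pitch_a = math.atan2(-ax, math.hypot(ay, az))

    # Complementary blend: trust the gyro short-term, gravity long-term.
    roll = alpha * roll_g + (1 - alpha) * roll_a
    pitch = alpha * pitch_g + (1 - alpha) * pitch_a
    return roll, pitch
```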
The example from step S202 above is continued here. In this example, the first time set of the initial IMU data collected by the IMU data acquisition device in step S201 is (10, 20, 30, ..., 200, 210, 220, ..., 400, ..., 600, ..., 800, ..., 980, 990, 1000), 100 moments in total, and the second time set of the first image data collected by the image acquisition device in step S202 is (200, 400, 600, 800, 1000), 5 moments in total. In step S203 the electronic device performs CV key point recognition on the image data corresponding to each of these five moments (for example, the user wears a watch containing the IMU data acquisition device on the right wrist), obtaining a first constraint condition containing the localization coordinates at the 5 moments, and then performs asynchronous calibration within the first time set according to those coordinates. That is, the attitude angle information corresponding to the IMU data at the five moments (200, 400, 600, 800, 1000) of the first time set is calibrated according to the localization coordinates at the 5 moments, yielding calibrated IMU data at all 100 moments, i.e., the target IMU data.
Specifically, in the above example, the first constraint condition is the three-dimensional spatial orientation angle of the arm indicated by the localization coordinates at the 5 moments, and the initial IMU data is the IMU data at 100 moments. That is, in step S204 the electronic device may use the arm orientation angles at the 5 moments (in the first constraint condition) to calibrate the IMU data at the 100 moments (in the initial IMU data), obtaining the target IMU data. Asynchronously calibrating the initial IMU data at many moments with a first constraint condition formed from CV recognition of image data at a few moments means that, unlike the synchronous calibration approach (for example, approach three above), there is no need to wait out the long CV recognition processing time: the control information derived from the calibrated target IMU data can subsequently drive the cursor in the display device, so that the cursor refresh rate can equal the frame rate of IMU data collection rather than being limited, as in the synchronous calibration approach, by the processing frequency of CV recognition. This raises the cursor refresh rate while avoiding problems such as cursor stuttering and display delay, improving user experience.
As one implementation example of the calibration process, in step S204 the electronic device may calibrate the IMU data recorded by the sensors in the initial IMU data according to the first constraint condition to obtain a calibration result, then process the calibration result with the attitude-solution algorithm and use the resulting attitude angle information as the target IMU data.
Specifically, the electronic device performs a mapping based on the arm three-dimensional orientation angles (at the 5 moments) in the first constraint condition to obtain the IMU calibration data (at the 5 moments) corresponding to each arm orientation angle, and fits the obtained IMU calibration data to produce an IMU calibration curve. It also fits the IMU data recorded by the sensors (at the 100 moments) in the initial IMU data to produce an initial IMU curve. It then takes a weighted average of the IMU calibration curve and the initial IMU curve to obtain an optimized curve, from which the corresponding calibrated IMU data (at the 100 moments) are read. The calibrated IMU data are further processed by the attitude-solution algorithm to obtain attitude angle information (at the 100 moments), which is used as the target IMU data obtained in step S204.
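A minimal sketch of this curve-fusion step, assuming a single scalar channel sampled on the example timelines, with linear interpolation and a fixed weight standing in for the unspecified fitting and weighting choices (the sine data are placeholders, not real sensor readings):

```python
import numpy as np

# Timelines from the running example (milliseconds).
imu_times = np.arange(10, 1001, 10)              # the 100 IMU moments
cv_times = np.array([200, 400, 600, 800, 1000])  # the 5 CV moments

# One scalar channel of the sensor-recorded IMU data, and the IMU
# calibration data mapped from the 5 arm orientation angles (placeholders).
initial_curve = np.sin(imu_times / 300.0)
imu_calibration = np.sin(cv_times / 300.0) + 0.05

# "Fit" the calibration data: linear interpolation onto the dense IMU
# timeline is the simplest stand-in (before 200 ms it holds the first value).
calibration_curve = np.interp(imu_times, cv_times, imu_calibration)

# Weighted average of the two curves gives the optimized curve, from which
# the calibrated IMU data at all 100 moments are read; the weight is an
# arbitrary example value, not one prescribed by the method.
w = 0.5
optimized_curve = w * calibration_curve + (1 - w) * initial_curve
```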
As another implementation example of the calibration process, in step S204 the electronic device may first process the IMU data recorded by the sensors in the initial IMU data with the attitude-solution algorithm to obtain attitude angle information, then calibrate that attitude angle information according to the first constraint condition and use the calibrated attitude angle information as the target IMU data.
Specifically, the electronic device performs a regression based on the arm three-dimensional orientation angles (at the 5 moments) in the first constraint condition to obtain the attitude angle information (at the 5 moments) corresponding to each arm orientation angle, and fits the obtained attitude angle information to produce an attitude angle calibration curve. It also processes the IMU data recorded by the sensors (at the 100 moments) in the initial IMU data with the attitude-solution algorithm to obtain attitude angle information (at the 100 moments), and fits that attitude angle information to produce an attitude angle change curve. It then applies weighted averaging/filtering to the attitude angle calibration curve and the attitude angle change curve to obtain an optimized attitude angle curve, from which the attitude angle information (at the 100 moments) is obtained as the target IMU data of step S204.
As discussed for the scenario of approach three above, when the user stands in front of the display screen and IMU and CV data from multiple devices are fused to implement real-time human-computer interaction, problems such as inaccurate positioning and cursor overflow readily arise. The implementation process shown in FIG. 9 is used here as an example to illustrate these problems in approach three.
As shown in FIG. 9, the user's hand carries a wearable device (including an IMU) and performs limb movements in front of a device equipped with a display screen and a camera. The wearable device acquires the IMU data generated by those movements while the camera acquires the corresponding image data, and the IMU data is calibrated according to that image data to implement air-mouse movement of a coordinate position on the display screen. Dashed arrows indicate the movement direction of the wearable device triggered by the user's limb movements, and solid arrows indicate the coordinate displacement of the mapped position on the display screen. Because the camera position on the display screen is fixed (for example, above the middle axis of the display screen), when the user is at position 0 and a limb movement sweeps a certain angular range, the mapped coordinates on the screen move from A to B. When the user is at position 1 and a limb movement sweeps the same angular range, the mapped coordinates move from C to D; because the relative direction between camera and user has changed, the resulting coordinate displacement differs even though the angular range is the same (i.e., the distance AB is not equal to the distance CD). Similarly, when the user is at position 2 and a limb movement sweeps the same angular range, the mapped coordinates move from E to F; here the camera-user relative direction at position 2 is similar to that at position 0, so the resulting displacement may be the same (i.e., the distance CD is approximately equal to the distance EF), but because position 2 is close to the right edge of the display screen, the cursor overflow shown in the figure readily occurs (i.e., the coordinates of point F exceed the coordinate range covered by the display area).
In addition, when the user's orientation changes, the same angular range of limb movement produces a different cursor movement path on the display screen. For example, when the user stands with the front of the body facing the camera versus with the side of the body facing the camera, the same limb action causes different cursor paths. Specifically, when the user faces the camera and the arm, pivoting at the shoulder joint, moves from hanging naturally at the side of the body to a position parallel to the ground in the same plane as the torso, the resulting cursor path on the screen may be an arc; when the user stands side-on to the camera and the arm performs the same action, the resulting cursor path may be a straight line. That is, when the user's position and orientation relative to the display screen change, the spatial displacement and the angular change of the torso coordinate system cannot be tracked, causing inaccurate positioning and cursor overflow. To solve this problem, step S204 can be further optimized, as described in detail below.
In one possible implementation, in step S204, the process in which the electronic device calibrates the initial IMU data based on the first constraint condition to obtain the target IMU data may specifically include: the electronic device first determines a first human arm ergonomic model according to the first image data obtained in step S202, the model including a first value range for at least one limb rotation direction; it then updates the first constraint condition based on the first human arm ergonomic model to obtain an updated first constraint condition; and it further calibrates the initial IMU data using the updated first constraint condition to obtain the target IMU data.
Specifically, the electronic device may determine the first human arm ergonomic model according to the first image data obtained in step S202, the model including a first value range for at least one limb rotation direction; that is, a minimum-work, minimum-torque-change human arm ergonomic model is constructed around the user's individual fatigue characteristics. For example, during human-computer interaction a user generally does not make limb movements that violate ergonomics: the flexion-extension range of the user's wrist joint may be [-35°, 50°] and the ulnar-deviation range may be [-25°, 30°]. If wrist movement beyond these ranges is detected, the recognition by the IMU data acquisition device or the image acquisition device can be regarded as erroneous. The human arm ergonomic model can therefore be constructed around the movable value ranges of the user's different limbs.
It should be noted that the user's fatigue characteristic refers to the "gorilla arm" effect: for example, the user's elbow joint shows larger angular changes than the shoulder joint; the shoulder may briefly move through a large range in the first few minutes, but its movement angle soon shrinks to within 30 degrees. Minimum work and minimum torque change refer to the same phenomenon: moving through a given angle at the distal end of a rigid limb segment costs less work and torque than moving through the same angle at its root.
Thereafter, the target IMU data are determined based on the first human arm ergonomic model and the first constraint condition. In a specific implementation, the first constraint condition determined in step S203 (for example, the aforementioned three-dimensional spatial orientation angle) may be input into the human arm ergonomic model to form a ternary inequality on the attitude angle, and the initial IMU data are calibrated based on that ternary inequality to obtain the target IMU data of step S204. Using the human arm ergonomic model as one of the constraint conditions can effectively avoid inaccurate positioning on the display screen, cursor overflow, and similar issues.
As an example, one implementation of the human arm ergonomic model is shown in FIG. 10. Based on mechanics and dynamics analysis, the user's limb is modeled as a four-axis rigid-cylinder human arm model; in the figure, θ1 indicates the user's shoulder, θ2 the shoulder joint, θ3 the elbow joint, and θ4 the wrist joint. Specifically, each rigid cylinder corresponding to a limb segment has different movable distances/angles in different degrees of freedom, and the first human arm ergonomic model includes a first value range for at least one limb rotation direction. The degrees of freedom of the user's different limbs can be represented as pitch, roll, and yaw; in general, the user's limb movements can be expressed in terms of these three degrees of freedom, as the following examples and the data-structure sketch after them illustrate.
For example, the adduction or abduction of the user's shoulder joint can be represented by pitch, the forward flexion or backward extension of the shoulder joint by roll, and the internal or external rotation of the shoulder joint by yaw.
As another example, the flexion and extension of the user's elbow joint can be represented by pitch, and the rotation of the elbow joint by roll.
As another example, the flexion and extension of the user's wrist joint can be represented by pitch, and the ulnar deviation of the wrist joint by roll.
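A minimal sketch of such a model as a data structure follows; the wrist ranges are the ones quoted in this section, while the shoulder and elbow ranges are placeholders for illustration.

```python
# Degree-of-freedom value ranges (degrees) per joint, following the
# four-axis model of FIG. 10 and the pitch/roll/yaw conventions above.
ARM_MODEL = {
    "shoulder": {"pitch": (-45.0, 130.0),  # abduction/adduction (placeholder)
                 "roll": (-40.0, 170.0),   # flexion/extension (placeholder)
                 "yaw": (-70.0, 90.0)},    # internal/external rotation (placeholder)
    "elbow":    {"pitch": (0.0, 145.0),    # flexion/extension (placeholder)
                 "roll": (-80.0, 80.0)},   # rotation (placeholder)
    "wrist":    {"pitch": (-35.0, 50.0),   # flexion/extension (from the text)
                 "roll": (-25.0, 30.0)},   # ulnar deviation (from the text)
}
```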
As an example, suppose the device containing the IMU data acquisition device is a watch, which is generally worn on the user's wrist. From the constructed first human arm ergonomic model, the first value ranges of the user's wrist movements are flexion-extension [-35°, 50°] and ulnar deviation [-25°, 30°]. If the first constraint condition obtained from the image acquisition device indicates that the user's wrist attitude angle is a flexion-extension angle, that angle can be input into the first human arm ergonomic model to form a ternary inequality on the flexion-extension angle, expressed as:
min ≤ pitch ≤ max;
where min indicates the minimum flexion-extension value in the first value range, i.e., -35°; pitch is the value of the flexion-extension angle on the pitch degree of freedom; and max indicates the maximum flexion-extension value in the first value range, i.e., 50°.
In this ternary inequality, if the first constraint condition indicates that the user's wrist flexion-extension angle exceeds the first value range, the first constraint condition can be updated based on that range: the out-of-range value is updated to the minimum or maximum of the first value range (i.e., -35° or 50°). Obviously, if the degree of freedom of the detected user limb is not pitch, another degree of freedom can replace pitch in the ternary inequality; this can be flexibly adapted to the specific application scenario and is not limited here.
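A minimal sketch of this update step, reusing the ARM_MODEL sketch above (the joint and degree-of-freedom lookup is an illustrative assumption):

```python
def update_constraint(joint, dof, angle, model=ARM_MODEL):
    """Clamp an attitude angle into its ergonomic value range, enforcing
    the ternary inequality min <= angle <= max described above."""
    lo, hi = model[joint][dof]
    # An out-of-range value is treated as a recognition error and replaced
    # by the nearest bound of the first value range.
    return min(max(angle, lo), hi)

# Example: a reported wrist flexion-extension angle of 62 degrees exceeds
# the [-35, 50] range and is updated to 50 degrees.
assert update_constraint("wrist", "pitch", 62.0) == 50.0
```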
In one possible implementation, if the electronic device performs the initialization process shown in FIG. 3, it can perform calculations on the initialization image data obtained in step S101 of FIG. 3 to establish an initial human arm ergonomic model, and update that initial model to obtain the above first human arm ergonomic model only when the relative information between the user and the image acquisition device changes.
Optionally, the first duration of the user initialization process performed by the electronic device in FIG. 3 may include a third time set. In the second time set, which follows the third time set, the electronic device performs CV key point recognition on the first image data and can obtain relevant parameters (for example, shoulder width, inter-eye distance, inter-ear distance) to determine first relative information; when the first relative information differs from the initial relative information, the electronic device updates the initial human arm ergonomic model according to the first relative information to obtain the first human arm ergonomic model. Specifically, based on the difference between the initial relative information and the first relative information (for example, a difference in orientation angle or in distance), the overall coordinate values of the initial human arm ergonomic model can be offset-corrected in the direction indicated by that difference to obtain the first human arm ergonomic model.
It should be noted that the third time set precedes the first time period.
The electronic device may also determine the initial image data collected by the image acquisition device in the third time set, which precedes the second time set, and construct the initial human arm ergonomic model according to that initial image data. When the user walks, turns, or otherwise causes the initial relative information to differ from the first relative information, i.e., when the relative information between the user (for example, the user's torso or body) and the image acquisition device at the second time set has changed compared with that at the third time set, the initial human arm ergonomic model is updated using the first relative information to obtain the first human arm ergonomic model, further optimizing cursor control.
It should be noted that establishing the initial human arm ergonomic model by computing on the initialization image data during user initialization is similar to the aforementioned process of determining the first human arm ergonomic model from the first image data obtained in step S202, and is not repeated here.
The scenario in which the initial human arm ergonomic model is updated to obtain the above first human arm ergonomic model only when the relative information between the user and the image acquisition device changes is described here by way of example.
The electronic device may determine the first relative information between the user and the image acquisition device according to the first image data, and trigger the above update process to obtain the first human arm ergonomic model when the first relative information differs from the initial relative information. For example, the shoulder joint includes at least two degrees of freedom, flexion-extension and abduction-adduction. Take as an example initial relative information indicating that the user directly faces the image acquisition device; the initial relative information can then be used to establish the initial human arm ergonomic model, whose first value range includes parameters on the two shoulder degrees of freedom: the movement range on the abduction-adduction degree of freedom may be [0°, 90°], and on flexion-extension [0°, 0°]. Take as a further example first relative information indicating that the user is turned 90 degrees sideways to the image acquisition device; the initial human arm ergonomic model can then be updated according to the first relative information, i.e., the coordinates of the initial model are translated/rotated according to the difference between the initial and first relative information, yielding the updated first human arm ergonomic model. In this example, the second value range of the first human arm ergonomic model also includes parameters on the two shoulder degrees of freedom: the movement range on the abduction-adduction degree of freedom may be [0°, 0°], and on flexion-extension [0°, 90°].
Similarly, for the process in which the electronic device determines the first relative information according to the first image data, reference may be made to the process of determining the initial relative information, which is not repeated here.
Further, when the initial relative information is the same as the first relative information, the electronic device determines the initial human arm ergonomic model as the first human arm ergonomic model. That is, when the relative information between the user and the image acquisition device at the second time set has not changed compared with that at the third time set, the initial human arm ergonomic model can be determined as the first human arm ergonomic model without performing a model update, improving processing efficiency.
S205. The electronic device performs coordinate conversion processing on the target IMU data to obtain control information.
In this embodiment, the electronic device performs coordinate conversion processing according to the target IMU data obtained in step S204 to obtain control information, where the control information is used to control the cursor in the display device, for example, to make the cursor perform related operations such as moving, dragging, zooming in, and clicking.
As an example, when the control information is used to move the cursor in the display device, it may be the specific coordinate information (X, Y) of the cursor in the two-dimensional display plane of the display device. When the control information is used to make the cursor perform dragging, zooming, or clicking, it may be an identifier corresponding to the gesture action (for example, identifier 1 for dragging, identifier 2 for zooming, identifier 3 for clicking). In step S205, the electronic device may use a preset neural-network classifier to determine whether the target IMU data correspond to a gesture action of a given category and, if so, use the identifier of that gesture action as the control information to perform the related operation on the cursor in the display device.
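A minimal sketch of this branching between movement coordinates and gesture identifiers (the classifier and the coordinate-conversion function are stand-ins for components this section leaves unspecified; the identifiers follow the example above):

```python
# Hypothetical gesture identifiers from the example above.
GESTURE_IDS = {"drag": 1, "zoom": 2, "click": 3}

def control_info_from(target_imu, classify, angles_to_xy):
    """Produce control information from target IMU data.

    classify: preset neural-network classifier returning a gesture name
    or None; angles_to_xy: coordinate conversion for cursor movement
    (see the sketch further below). Both are illustrative stand-ins.
    """
    gesture = classify(target_imu)
    if gesture in GESTURE_IDS:
        return ("gesture", GESTURE_IDS[gesture])
    return ("move", angles_to_xy(target_imu))
```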
In one possible implementation, in step S205, the process in which the electronic device performs coordinate conversion on the target IMU data to obtain the control information may include: the electronic device determines the first relative information between the user (for example, the user's torso or body) and the image acquisition device according to the first image data, determines a first mapping relationship of the user (for example, the user's torso or body) in the display device according to that first relative information, and then performs coordinate conversion processing on the target IMU data according to the first mapping relationship to obtain the control information.
Optionally, the first relative information includes parameters such as distance and standing orientation; for the implementation process, refer to the content of step S204, which is not repeated here.
Specifically, the user may move during cursor control, so the relative information between the user and the image acquisition device may change. Therefore, the first mapping relationship of the user (for example, the user's torso or body) in the display device can be further determined from the first relative information determined from the first image data, and used as the basis for producing control information, avoiding problems such as inaccurate positioning and cursor overflow that arise when that relative information changes.
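A minimal sketch of one plausible coordinate conversion, assuming the first mapping relationship reduces to a per-axis gain and origin (the actual mapping derived from the user's distance and orientation is not specified in this section), with the result clamped to the display area so the cursor cannot overflow:

```python
def angles_to_xy(yaw_deg, pitch_deg, mapping, screen_w=1920, screen_h=1080):
    """Map calibrated attitude angles to clamped screen coordinates.

    mapping: dict with per-axis gain (pixels per degree) and origin,
    standing in for the first mapping relationship derived from the
    user's position relative to the image acquisition device.
    """
    x = mapping["origin_x"] + mapping["gain_x"] * yaw_deg
    y = mapping["origin_y"] - mapping["gain_y"] * pitch_deg
    # Clamp into the display area to avoid the cursor overflow of FIG. 9.
    x = min(max(x, 0), screen_w - 1)
    y = min(max(y, 0), screen_h - 1)
    return int(x), int(y)

# Example mapping: cursor centered on a 1920x1080 screen, 20 px/degree.
mapping = {"origin_x": 960, "origin_y": 540, "gain_x": 20.0, "gain_y": 20.0}
```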
In one possible implementation, if the electronic device performs the initialization process shown in FIG. 3, it can perform calculations on the initialization image data obtained in step S101 of FIG. 3 to obtain an initial mapping relationship of the user (for example, the user's torso or body) in the display device, and update that initial mapping relationship to obtain the above first mapping relationship only when the relative information between the user and the image acquisition device changes.
As an example, the electronic device may also determine the initial mapping relationship of the user in the display device according to the initial image data. Thereafter, when the initial relative information differs from the first relative information, i.e., when the relative information between the user and the image acquisition device at the second time set has changed compared with that at the third time set, the initial mapping relationship is updated using the first relative information to obtain the first mapping relationship, further optimizing cursor control.
Further, when the initial relative information is the same as the first relative information, the electronic device determines the initial mapping relationship as the first mapping relationship. That is, when the relative information between the user (for example, the user's torso or body) and the image acquisition device at the second time set has not changed compared with that at the third time set, the initial mapping relationship can be determined as the first mapping relationship without performing a mapping-relationship update, improving processing efficiency.
From the above, the embodiments of the present application have at least the following beneficial effects. The electronic device performs CV key point recognition on the first image data obtained by the image acquisition device collecting the user's limb movements within the second time set to obtain the first constraint condition; based on that first constraint condition, it calibrates the initial IMU data obtained by the IMU data acquisition device collecting the user's limb movements within the first time set to obtain the target IMU data; and it then performs coordinate conversion processing based on the target IMU data to obtain the control information used to control the cursor in the display device. Here the second time set is a subset of the first time set, i.e., the process in which the electronic device calibrates the initial IMU data to obtain the target IMU data is asynchronous calibration. Because hardware computing limitations make the computation time of the CV recognition process generally far longer than the processing time of IMU data, this asynchronous calibration, compared with the real-time synchronous-calibration style of human-computer interaction, does not have to wait for the long CV processing, effectively avoiding problems such as display stuttering and display delay. Controlling the cursor in the display device through asynchronous calibration can therefore improve the continuity of cursor movement, improving user experience.
Referring to FIG. 11, an embodiment of the present application further provides a human-computer interaction method, which may be performed by one or more electronic devices and which may include the following module arrangement.
An image data determination module 1101, configured to determine and output image data, the image data including at least the first image data, corresponding to the implementation process in the aforementioned step S202;
A CV key point recognition module 1102, configured to perform CV key point recognition on the image data output by the image data determination module 1101, and obtain and output the first constraint condition, corresponding to the implementation process of the aforementioned step S203;
An IMU data determination module 1103, configured to determine and output IMU data, the IMU data including at least the initial IMU data, corresponding to the implementation process in the aforementioned step S201;
An asynchronous calibration module 1104, configured to perform calibration processing at least according to the first constraint condition and the initial IMU data, and obtain and output the target IMU data, corresponding to the implementation process in the aforementioned step S204;
A coordinate conversion module 1105, configured to perform coordinate conversion processing at least according to the target IMU data, and obtain and output control information, corresponding to the implementation process in the aforementioned step S205.
Optionally, the electronic device shown in FIG. 11 may further include the following modules.
A display device 1106, configured to control the cursor according to the control information;
A preprocessing module 1107, configured to preprocess the initial IMU data and output the preprocessed result to the asynchronous calibration module 1104, where the preprocessing may include waveform smoothing, denoising, and calibration-compensation processing;
A human arm ergonomic model building module 1108, configured to construct the first human arm ergonomic model according to the first image data output by the image data determination module 1101, and use the first human arm ergonomic model to update the first constraint condition input to the asynchronous calibration module 1104, as one basis for determining the target IMU data;
Optionally, the human arm ergonomic model building module 1108 may also be configured to construct the initial human arm ergonomic model according to the initial image data output by the image data determination module 1101, and use the initial human arm ergonomic model to update the first constraint condition input to the asynchronous calibration module 1104;
A mapping relationship determination module 1109, configured to determine the first mapping relationship according to the first image data output by the image data determination module 1101 and output it to the coordinate conversion module 1105, as one basis for the coordinate conversion processing;
Optionally, the mapping relationship determination module 1109 may also be configured to determine the initial mapping relationship according to the initial image data output by the image data determination module 1101 and output it to the coordinate conversion module 1105, as one basis for the coordinate conversion processing;
A relative information change judgment module 1110, configured to judge whether the first relative information indicated by the first image data has changed relative to the initial relative information indicated by the initial image data;
If it has changed, the relative information change judgment module 1110 outputs the judgment result to the human arm ergonomic model building module 1108, so that module 1108 determines to output the first human arm ergonomic model to the asynchronous calibration module 1104; and it outputs the judgment result to the mapping relationship determination module 1109, so that module 1109 determines to output the first mapping relationship to the coordinate conversion module 1105.
If it has not changed, the relative information change judgment module 1110 outputs the judgment result to the human arm ergonomic model building module 1108, so that module 1108 determines to output the initial human arm ergonomic model to the asynchronous calibration module 1104; and it outputs the judgment result to the mapping relationship determination module 1109, so that module 1109 determines to output the initial mapping relationship to the coordinate conversion module 1105.
In addition, for the implementation process of each module shown in FIG. 11 and the corresponding beneficial effects, reference may be made to the description of the foregoing method embodiments, which is not repeated here.
Referring to FIG. 12, an embodiment of the present application further provides a first electronic device 1200, where the first electronic device 1200 may include at least a motion sensor 1201 and a processor 1202.
Optionally, the first electronic device 1200 may further include other components, such as a memory, a housing, and a communication module, which are not limited here.
Specifically, the motion sensor 1201 may be used to implement the implementation process of the IMU data acquisition device in any of the foregoing embodiments, and the processor 1202 may be used to perform the computation, processing, and other implementation processes in any of the foregoing embodiments and achieve the corresponding beneficial effects, which are not described one by one here.
Referring to FIG. 13, an embodiment of the present application further provides a second electronic device 1300, where the second electronic device 1300 may include at least a camera 1301 and a display screen 1302.
Optionally, the second electronic device 1300 may further include other components, such as a memory, a housing, and a communication module, which are not limited here.
具体地,该摄像头1301可以用于实现前述任意实施例中图像采集装置的实现过程,显示屏1302可以用于执行前述任意实施例中的显示装置的实现过程,并实现对应的有益效果, 此处不一一赘述。Specifically, the camera 1301 can be used to implement the implementation process of the image acquisition device in any of the foregoing embodiments, and the display screen 1302 can be used to implement the implementation process of the display device in any of the foregoing embodiments, and achieve corresponding beneficial effects, here Not to repeat them one by one.
The present application provides an electronic device, which is coupled to a memory and configured to read and execute the instructions stored in the memory, so that the electronic device implements the steps of the method performed by the electronic device in any of the embodiments of FIG. 3 to FIG. 11. In a possible design, the electronic device is a chip or a system on a chip.
The present application provides a chip system, where the chip system includes a processor configured to support an electronic device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory configured to store the necessary program instructions and data. The chip system may consist of a chip, or may include a chip and other discrete devices.
The present application further provides a processor, coupled to a memory and configured to execute the methods and functions relating to the electronic device in any of the foregoing embodiments.
The present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a computer, the method procedure relating to the electronic device in any of the foregoing method embodiments is implemented. Correspondingly, the computer may be the above electronic device.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish between similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the manner of distinguishing objects with the same attributes when they are described in the embodiments of the present application. Furthermore, the terms "include" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to the process, method, product, or device.
The names of the messages/frames/information, modules, units, and the like provided in the embodiments of the present application are merely examples; other names may be used, provided that the functions of the messages/frames/information, modules, or units are the same.
The terms used in the embodiments of the present application are merely for the purpose of describing specific embodiments, and are not intended to limit the present invention. The singular forms "a", "said", and "the" used in the embodiments of the present application are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that, in the description of the present application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may represent A or B. In the present application, "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
The foregoing embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (33)

  1. A human-computer interaction method, applied to a human-computer interaction system comprising a motion sensor, a camera, a processor, and a display screen, wherein the method comprises:
    the motion sensor acquires initial motion sensing data at a first sampling frequency within a first time period, the initial motion sensing data being triggered by a user's limb movements;
    the camera acquires first image data at a second sampling frequency within the first time period, the second sampling frequency being lower than the first sampling frequency, and the first image data including the user's limb movement information;
    the processor obtains a first constraint condition, the first constraint condition being obtained by performing computer vision (CV) processing on the first image data;
    the processor calibrates the initial motion sensing data according to the first constraint condition to obtain target motion sensing data; and
    the processor obtains control information according to the target motion sensing data, the control information being used to control the display screen.
  2. The method according to claim 1, wherein the first constraint condition is obtained by human skeleton key point recognition in the computer vision (CV) processing performed on the first image data, and the first constraint condition includes three-dimensional spatial orientation angle information.
  3. The method according to claim 1 or 2, wherein the processor calibrating the initial motion sensing data according to the first constraint condition to obtain the target motion sensing data specifically comprises:
    the processor obtains calibration data by mapping according to the first constraint condition, and obtains a first curve by fitting based on the calibration data;
    the processor obtains a second curve by fitting according to the initial motion sensing data, and performs weighted averaging on the first curve and the second curve to obtain a third curve;
    the processor determines calibrated motion sensing data in the third curve; and
    the processor processes the calibrated motion sensing data according to an attitude calculation algorithm to obtain the target motion sensing data.
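For illustration, the following is a minimal sketch of the calibration flow of claim 3, assuming polynomial curve fitting and a fixed blend weight; neither choice is mandated by the claim, and all identifiers are hypothetical.

```python
import numpy as np

def calibrate(t_cv, cv_calib, t_imu, imu_raw, w=0.4, deg=3):
    """t_cv/t_imu: sample timestamps; cv_calib: calibration data mapped from
    the first constraint condition; imu_raw: initial motion sensing data."""
    first = np.polynomial.Polynomial.fit(t_cv, cv_calib, deg)    # first curve
    second = np.polynomial.Polynomial.fit(t_imu, imu_raw, deg)   # second curve
    # Third curve: pointwise weighted average, evaluated on the IMU timeline.
    third = w * first(t_imu) + (1.0 - w) * second(t_imu)
    return third  # calibrated motion sensing data, ready for attitude solving
```

The returned samples would then be fed to an attitude calculation (sensor-fusion) algorithm to produce the target motion sensing data, as the final step of the claim recites.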
  4. The method according to any one of claims 1 to 3, wherein the processor calibrating the initial motion sensing data according to the first constraint condition to obtain the target motion sensing data specifically comprises:
    the processor processes the initial motion sensing data according to an attitude calculation algorithm to obtain first attitude angle data;
    the processor obtains a fourth curve by fitting according to the first attitude angle data, and obtains a fifth curve by fitting according to the first constraint condition;
    the processor performs weighted averaging on the fourth curve and the fifth curve to obtain a sixth curve; and
    the processor determines the target motion sensing data in the sixth curve.
  5. The method according to any one of claims 1 to 4, wherein the control information is coordinate data obtained by performing coordinate transformation on the target motion sensing data, the coordinate data being used to control a display position of a cursor on the display screen; or
    the control information is a gesture identification result obtained by mapping the target motion sensing data, the gesture identification result being used to operate an interface element of the display screen.
  6. The method according to any one of claims 1 to 5, wherein before the processor calibrates the initial motion sensing data according to the first constraint condition, the method further comprises: the processor aligns the first constraint condition and the initial motion sensing data according to a time difference.
  7. The method according to claim 6, wherein the time difference is calculated through an initialization process before the first time period, and the method further comprises:
    the display screen displays first prompt information for prompting the user to make a specified limb movement;
    the motion sensor acquires motion sensing data in the initialization process, the motion sensing data in the initialization process being triggered by the specified limb movement made by the user;
    the camera acquires image data in the initialization process, the image data in the initialization process including information about the specified limb movement made by the user; and
    the processor determines the time difference according to signal features of the motion sensing data in the initialization process and signal features of the image data in the initialization process.
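For illustration, one conventional way to realize the last step of claim 7 is to cross-correlate the two signals' features from the initialization gesture; the use of scipy and linear-interpolation resampling here is an assumption, not the claimed implementation.

```python
import numpy as np
from scipy.signal import correlate

def estimate_time_offset(imu_feature, cam_feature, fs_imu, fs_cam):
    """Estimate the IMU-camera time difference (seconds) from two 1-D
    feature signals recorded during the initialization gesture."""
    # Bring the lower-rate camera feature onto the IMU time base.
    t_imu = np.arange(len(imu_feature)) / fs_imu
    t_cam = np.arange(len(cam_feature)) / fs_cam
    cam_up = np.interp(t_imu, t_cam, cam_feature)
    # Peak of the zero-mean cross-correlation gives the lag in IMU samples.
    imu = imu_feature - np.mean(imu_feature)
    cam = cam_up - np.mean(cam_up)
    lag = np.argmax(correlate(imu, cam, mode="full")) - (len(imu) - 1)
    return lag / fs_imu  # lag of the IMU feature relative to the camera feature
```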
  8. The method according to claim 7, wherein the method further comprises:
    the processor determines initial relative information between the user and the camera according to the image data in the initialization process.
  9. The method according to claim 8, wherein the processor calibrating the initial motion sensing data according to the first constraint condition to obtain the target motion sensing data comprises:
    the processor determines an initial human arm ergonomic model according to the initial relative information, the initial human arm ergonomic model including a first value range of at least one limb movement angle;
    the processor updates the first constraint condition according to the initial human arm ergonomic model to obtain an updated first constraint condition; and
    the processor calibrates the initial motion sensing data according to the updated first constraint condition to obtain the target motion sensing data.
  10. The method according to claim 9, wherein the processor updating the first constraint condition according to the initial human arm ergonomic model to obtain the updated first constraint condition comprises:
    the processor determines first relative information between the user and the camera according to the first image data;
    when the first relative information is different from the initial relative information, the processor updates the initial human arm ergonomic model according to the first relative information to obtain a first human arm ergonomic model; and
    the processor updates the first constraint condition according to the first human arm ergonomic model to obtain the updated first constraint condition.
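For illustration, the following is a minimal sketch of how an arm ergonomic model's angle ranges might update the first constraint condition, in the spirit of claims 9 and 10; the joint names and numeric ranges are placeholders, not values from this application.

```python
import numpy as np

ARM_MODEL = {  # first value range of each limb movement angle, in degrees
    "shoulder_pitch": (-30.0, 120.0),
    "elbow_flexion": (0.0, 145.0),
}

def update_constraint(constraint_angles):
    """constraint_angles: {joint: array of CV-derived orientation angles}.
    Angles outside the model's admissible interval are clipped to it,
    yielding the updated first constraint condition."""
    return {joint: np.clip(vals, *ARM_MODEL[joint])
            for joint, vals in constraint_angles.items()}
```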
  11. The method according to claim 8, wherein the processor obtaining the control information according to the target motion sensing data comprises:
    the processor determines an initial mapping relationship of the user in the display device according to the initial relative information; and
    the processor performs coordinate transformation on the target motion sensing data according to the initial mapping relationship to obtain the control information.
  12. The method according to claim 11, wherein the processor performing coordinate transformation on the target motion sensing data according to the initial mapping relationship to obtain the control information comprises:
    the processor determines first relative information between the user and the camera according to the first image data;
    when the first relative information is different from the initial relative information, the processor updates the initial mapping relationship according to the first relative information to obtain a first mapping relationship; and
    the processor performs coordinate transformation on the target motion sensing data according to the first mapping relationship to obtain the control information.
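For illustration, the following is a minimal sketch of the coordinate transformation of claims 11 and 12, assuming a linear pixels-per-degree mapping rebuilt whenever the relative information changes; the gain model and screen dimensions are assumptions.

```python
def to_screen(yaw_deg, pitch_deg, mapping, width=1920, height=1080):
    """mapping: dict with per-axis gains (pixels per degree), e.g. derived
    from the user-camera relative information (distance, bearing)."""
    x = width / 2 + mapping["gain_x"] * yaw_deg
    y = height / 2 - mapping["gain_y"] * pitch_deg
    # Keep the cursor inside the display.
    return (min(max(x, 0), width - 1), min(max(y, 0), height - 1))
```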
  13. The method according to any one of claims 1 to 12, wherein the motion sensor includes a sensing unit of one or more of an accelerometer, a gyroscope, and a magnetometer.
  14. The method according to any one of claims 1 to 13, wherein the camera includes one or more of a depth camera and a non-depth camera.
  15. A first electronic device, comprising a motion sensor and a processor, wherein:
    the motion sensor is configured to acquire initial motion sensing data at a first sampling frequency within a first time period, the initial motion sensing data being triggered by a user's limb movements;
    the processor is configured to calibrate the initial motion sensing data according to an obtained first constraint condition to obtain target motion sensing data, wherein the first constraint condition is obtained by performing computer vision (CV) processing on first image data acquired by a camera at a second sampling frequency within the first time period, the second sampling frequency is lower than the first sampling frequency, and the first image data includes the user's limb movement information; and
    the processor is further configured to obtain control information according to the target motion sensing data, the control information being used to control display content of a display screen;
    wherein the camera and the display screen are included in a second electronic device different from the first electronic device.
  16. The first electronic device according to claim 15, wherein the processor is specifically configured to:
    obtain calibration data by mapping according to the first constraint condition, and obtain a first curve by fitting based on the calibration data;
    obtain a second curve by fitting according to the initial motion sensing data, and perform weighted averaging on the first curve and the second curve to obtain a third curve;
    determine calibrated motion sensing data in the third curve; and
    process the calibrated motion sensing data according to an attitude calculation algorithm to obtain the target motion sensing data.
  17. The first electronic device according to claim 15 or 16, wherein the processor is specifically configured to:
    process the initial motion sensing data according to an attitude calculation algorithm to obtain first attitude angle data;
    obtain a fourth curve by fitting according to the first attitude angle data, and obtain a fifth curve by fitting according to the first constraint condition;
    perform weighted averaging on the fourth curve and the fifth curve to obtain a sixth curve; and
    determine the target motion sensing data in the sixth curve.
  18. The first electronic device according to any one of claims 15 to 17, wherein the processor is further configured to:
    align the first constraint condition and the initial motion sensing data according to a time difference.
  19. The first electronic device according to claim 18, wherein the time difference is calculated through an initialization process before the first time period;
    the motion sensor is further configured to acquire motion sensing data in the initialization process, the motion sensing data in the initialization process being triggered by a specified limb movement made by the user; and
    the processor is further configured to determine the time difference according to signal features of the motion sensing data in the initialization process and signal features of image data in the initialization process, wherein the image data in the initialization process is acquired by the camera during the initialization process and includes information about the specified limb movement made by the user.
  20. The first electronic device according to claim 19, wherein the processor is further configured to:
    determine initial relative information between the user and the camera according to the image data in the initialization process.
  21. The first electronic device according to claim 20, wherein the processor is specifically configured to:
    determine an initial human arm ergonomic model according to the initial relative information, the initial human arm ergonomic model including a first value range of at least one limb movement angle;
    update the first constraint condition according to the initial human arm ergonomic model to obtain an updated first constraint condition; and
    calibrate the initial motion sensing data according to the updated first constraint condition to obtain the target motion sensing data.
  22. The first electronic device according to claim 21, wherein the processor is specifically configured to:
    determine first relative information between the user and the camera according to the first image data;
    when the first relative information is different from the initial relative information, update the initial human arm ergonomic model according to the first relative information to obtain a first human arm ergonomic model; and
    update the first constraint condition according to the first human arm ergonomic model to obtain the updated first constraint condition.
  23. The first electronic device according to claim 19, wherein the processor is further configured to:
    determine an initial mapping relationship of the user in the display device according to the initial relative information; and
    perform coordinate transformation on the target motion sensing data according to the initial mapping relationship to obtain the control information.
  24. The first electronic device according to claim 23, wherein the processor is specifically configured to:
    determine first relative information between the user and the camera according to the first image data;
    when the first relative information is different from the initial relative information, update the initial mapping relationship according to the first relative information to obtain a first mapping relationship; and
    perform coordinate transformation on the target motion sensing data according to the first mapping relationship to obtain the control information.
  25. The first electronic device according to any one of claims 15 to 24, wherein the motion sensor includes a sensing unit of one or more of an accelerometer, a gyroscope, and a magnetometer.
  26. A second electronic device, comprising a camera and a display screen, wherein:
    the camera is configured to acquire first image data at a second sampling frequency within a first time period, the first image data including a user's limb movement information, wherein the first image data is used to determine a first constraint condition, the first constraint condition is used to calibrate initial motion sensing data to obtain target motion sensing data, the initial motion sensing data is sampled at a first sampling frequency within the first time period by a motion sensor in a first electronic device and is triggered by the user's limb movements, and the second sampling frequency is lower than the first sampling frequency; and
    the display screen is configured to display control information, wherein the control information is obtained based on the target motion sensing data.
  27. The second electronic device according to claim 26, wherein the first constraint condition is obtained by human skeleton key point recognition in computer vision (CV) processing performed on the first image data, and the first constraint condition includes three-dimensional spatial orientation angle information.
  28. The second electronic device according to claim 26 or 27, wherein
    the control information is coordinate data obtained by performing coordinate transformation on the target motion sensing data, the coordinate data being used to control a display position of a cursor on the display screen; or
    the control information is a gesture identification result obtained by mapping the target motion sensing data, the gesture identification result being used to operate an interface element of the display screen.
  29. The second electronic device according to any one of claims 26 to 28, wherein
    the display screen is further configured to display first prompt information for prompting the user to make a specified limb movement; and
    the camera is further configured to acquire, during an initialization process before the first time period, image data in the initialization process, the image data in the initialization process including information about the specified limb movement made by the user;
    wherein signal features of the image data in the initialization process and signal features of motion sensing data in the initialization process determine a time difference, the time difference is used to align the first constraint condition and the initial motion sensing data, and the motion sensing data in the initialization process is collected by the second electronic device during the initialization process.
  30. The second electronic device according to any one of claims 26 to 29, wherein the camera includes one or more of a depth camera and a non-depth camera.
  31. A computer-readable storage medium, wherein the medium stores instructions that, when executed by a computer, implement the method performed by the first electronic device according to any one of claims 15 to 25.
  32. A computer-readable storage medium, wherein the medium stores instructions that, when executed by a computer, implement the method performed by the second electronic device according to any one of claims 26 to 30.
  33. A computer program product, comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 14.
PCT/CN2022/085282 2021-04-30 2022-04-06 Human-computer interaction method and device WO2022228056A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110486465.2 2021-04-30
CN202110486465.2A CN115268619A (en) 2021-04-30 2021-04-30 Man-machine interaction method and equipment

Publications (1)

Publication Number Publication Date
WO2022228056A1 true WO2022228056A1 (en) 2022-11-03

Family

ID=83744936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/085282 WO2022228056A1 (en) 2021-04-30 2022-04-06 Human-computer interaction method and device

Country Status (2)

Country Link
CN (1) CN115268619A (en)
WO (1) WO2022228056A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105551059A (en) * 2015-12-08 2016-05-04 国网山西省电力公司技能培训中心 Power transformation simulation human body motion capturing method based on optical and inertial body feeling data fusion
CN106256394A (en) * 2016-07-14 2016-12-28 广东技术师范学院 The training devices of mixing motion capture and system
CN108106614A (en) * 2017-12-22 2018-06-01 北京轻威科技有限责任公司 A kind of inertial sensor melts algorithm with visual sensor data
CN109147058A (en) * 2018-08-31 2019-01-04 腾讯科技(深圳)有限公司 Initial method and device and storage medium for the fusion of vision inertial navigation information
CN109544638A (en) * 2018-10-29 2019-03-29 浙江工业大学 A kind of asynchronous online calibration method for Multi-sensor Fusion


Also Published As

Publication number Publication date
CN115268619A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
US11861070B2 (en) Hand gestures for animating and controlling virtual and graphical elements
US11531402B1 (en) Bimanual gestures for controlling virtual and graphical elements
US20220206588A1 (en) Micro hand gestures for controlling virtual and graphical elements
US20220326781A1 (en) Bimanual interactions between mapped hand regions for controlling virtual and graphical elements
US9477312B2 (en) Distance based modelling and manipulation methods for augmented reality systems using ultrasonic gloves
EP3090331B1 (en) Systems with techniques for user interface control
KR101546654B1 (en) Method and apparatus for providing augmented reality service in wearable computing environment
US6757068B2 (en) Self-referenced tracking
US9310891B2 (en) Method and system enabling natural user interface gestures with user wearable glasses
US8310537B2 (en) Detecting ego-motion on a mobile device displaying three-dimensional content
US20150220158A1 (en) Methods and Apparatus for Mapping of Arbitrary Human Motion Within an Arbitrary Space Bounded by a User's Range of Motion
JP7382994B2 (en) Tracking the position and orientation of virtual controllers in virtual reality systems
US20210068674A1 (en) Track user movements and biological responses in generating inputs for computer systems
CN113498502A (en) Gesture detection using external sensors
WO2022228056A1 (en) Human-computer interaction method and device
Park et al. A simple vision-based head tracking method for eye-controlled human/computer interface
TW201933041A (en) Virtual space positioning method and apparatus
US12013985B1 (en) Single-handed gestures for reviewing virtual content
CN114327042B (en) Detection glove, gesture tracking method, AR equipment and key pressing method
TWI826189B (en) Controller tracking system and method with six degrees of freedom
Rusnak Unobtrusive Multi-User Interaction in Group Collaborative Environments
Sorger Alternative User Interfaces

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22794527; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 22794527; Country of ref document: EP; Kind code of ref document: A1)