CN115454236A - Gesture cursor mapping method based on machine vision, network equipment and storage medium - Google Patents


Info

Publication number
CN115454236A
Authority
CN (China)
Prior art keywords
coordinates, cursor, virtual frame, offset, mapping method
Legal status
Pending (the legal status is an assumption and is not a legal conclusion; no legal analysis has been performed)
Application number
CN202211015270.0A
Other languages
Chinese (zh)
Inventors
贺垟瑒, 贺欣
Current Assignee (the listed assignees may be inaccurate)
Xinhuasan Intelligent Terminal Co ltd
Original Assignee
Xinhuasan Intelligent Terminal Co ltd
Application filed by Xinhuasan Intelligent Terminal Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04812 Interaction techniques based on cursor appearance or behaviour, e.g. being affected by the presence of displayed objects

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Position Input By Displaying (AREA)

Abstract

The application provides a machine-vision-based gesture cursor mapping method, a network device, and a storage medium. The method comprises: acquiring a key point set of the body, the set including the two shoulder endpoint coordinates a and b, from which the shoulder width D is calculated; acquiring the hand key point coordinate m; calculating the hand center point coordinate c from the hand key point coordinates; calculating the two diagonal point coordinates of a virtual frame from the shoulder width D; and obtaining the cursor mapping coordinate M of the hand key point on the projection screen from the virtual frame diagonal coordinates p and q, the hand key point coordinate m, and the width and height of the projection screen. Because the size and position of the virtual frame are generated by an algorithm, the cursor moving speed adapts to different distances, a smaller operation range reduces user fatigue, and the cursor moving speed is adjustable.

Description

Gesture cursor mapping method based on machine vision, network equipment and storage medium
Technical Field
The present application relates to the field of communications devices, and in particular, to a gesture cursor mapping method based on machine vision, a network device, and a storage medium.
Background
With the development of computer performance and the AI field, more interaction modes have appeared in human-computer interaction. Compared with traditional interaction modes such as the mouse, keyboard, touch screen, and remote controller, gesture control based on machine vision brings a brand-new operation experience. In the field of large-screen terminals, gesture control has become an indispensable function of every manufacturer's flagship models.
In vision-based gesture interaction, the user's gesture control is simulated as mouse operation. This reduces the user's learning cost and adapts better to the current software ecosystem, since third-party application software does not need to be re-adapted to a new gesture interaction mode. The most important part of simulating mouse control with gestures is cursor mapping, and the accuracy and stability of cursor mapping are key to user experience. At present, cursor mapping is usually realized by direct mapping. The biggest problem of this mode is that the mapped cursor moves at inconsistent speeds at different distances: the farther the user stands, the slower the cursor moves. Moreover, cursor movement under direct mapping requires a larger gesture operation range, which increases the fatigue of the user's gesture operation.
Disclosure of Invention
In order to overcome the problems in the related art, the application provides a gesture cursor mapping method based on machine vision, a network device and a storage medium.
According to a first aspect of embodiments of the present application, there is provided a machine-vision-based gesture cursor mapping method, comprising:
acquiring a key point set of a body;
the key point set comprising the two shoulder endpoint coordinates a(x_a, y_a) and b(x_b, y_b);
calculating the shoulder width D from the two shoulder endpoint coordinates a(x_a, y_a) and b(x_b, y_b);
obtaining the hand key point coordinate m(x_m, y_m);
calculating the hand center point coordinate c(x_c, y_c) from the hand key point coordinates;
calculating the two diagonal point coordinates of the virtual frame from the shoulder width D: p(x_p, y_p) = (x_c - D/2, y_c - D/2), q(x_q, y_q) = (x_c + D/2, y_c + D/2);
obtaining the cursor mapping coordinate M(x_M, y_M) of the hand key point on the projection screen from the virtual frame diagonal coordinates p and q, the hand key point coordinate m, and the projection screen width and height: (x_M, y_M) = (width * (x_m - x_p)/(x_q - x_p), height * (y_m - y_p)/(y_q - y_p)).
Preferably, the method further comprises obtaining a cursor speed gain, including:
calculating the speed from the current speed level: V = V_min + (K - 1)/(N - 1) * (V_max - V_min);
gain = V / V_max;
where V_min denotes the minimum cursor moving speed supported by the system, V_max denotes the maximum cursor moving speed supported by the system, N is the number of speed levels (N ≥ 1), and the current speed level K ∈ [1, N];
the two diagonal point coordinates of the virtual frame then become p(x_p, y_p) = (x_c - gain*D/2, y_c - gain*D/2), q(x_q, y_q) = (x_c + gain*D/2, y_c + gain*D/2).
Preferably, the method further comprises:
judging whether the hand key point coordinate m(x_m, y_m) is within the virtual frame;
if not, updating the virtual frame;
if it is within the virtual frame, calculating the cursor mapping coordinate.
Preferably, updating the virtual frame includes:
Case 1: x_m < x_p: offset = x_p - x_m; x_p = x_m; x_q = x_q - offset;
Case 2: x_m > x_q: offset = x_m - x_q; x_q = x_m; x_p = x_p + offset;
Case 3: y_m < y_p: offset = y_p - y_m; y_p = y_m; y_q = y_q - offset;
Case 4: y_m > y_q: offset = y_m - y_q; y_q = y_m; y_p = y_p + offset.
Preferably, before the shoulder width is calculated, the two shoulder endpoint coordinates a(x_a, y_a) and b(x_b, y_b) are subjected to filtering processing.
The network device provided by the second aspect of the present application includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor performs the above machine-vision-based gesture cursor mapping method.
A third aspect of the present application provides a storage medium having stored thereon computer program instructions for implementing the above-described machine vision based gesture cursor mapping method when executed by a processor.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the embodiments of the present application, scale conversion is applied to the traditional direct mapping mode by adding a virtual frame. Unlike direct mapping, in which the size and position of the mapped picture are fixed, the size and position of the virtual frame are generated by an algorithm, so the cursor moving speed adapts to different distances, a smaller operation range reduces user fatigue, and the cursor moving speed is adjustable.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments consistent with the present application and together with the application, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of the gesture interaction flow;
FIG. 2 is a schematic diagram of obtaining whole-body key point coordinates with the MediaPipe Pose model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of obtaining hand key point coordinates with the MediaPipe Hands model according to an embodiment of the present application;
FIG. 4 is a flowchart of the cursor mapping method according to an embodiment of the present application;
FIG. 5 is a diagram of the transformation relationship between the virtual frame and the projection screen according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the hardware framework of a network device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
First, a complete gesture interaction flow is described. It comprises the steps of data acquisition, hand key point identification, data preprocessing and control right judgment, key point to gesture conversion, instruction matching, smooth filtering, cursor mapping, and system response, as shown in FIG. 1.
Specifically, data acquisition captures pictures through a monocular camera; the sampling rate is determined by the camera frame rate. Key point collection can be realized with the MediaPipe framework, a multimedia machine learning model application framework developed and open-sourced by Google. Hand key point identification passes the acquired picture into a hand key point detection model; if a hand is identified, the key point coordinate information of the hand is output. This information is usually a set of coordinates; 21 key points are common, and the number can be increased or decreased according to data processing capacity and actual demand. Data preprocessing and control right judgment preprocess the identified key point coordinates, output the key point coordinate set of the hand that holds control, and feed those coordinates into the key-point-to-gesture model. Key point to gesture conversion passes the output of the previous step into a trained gesture classification model for inference and outputs the name of the current gesture, such as FIVE, FIST, or POINTER. Instruction matching outputs the operation instruction matched to the current gesture, such as MOVE or CLICK_DOWN/UP, based on an instruction matching algorithm. Smooth filtering (cursor anti-shake) filters the key point coordinate information. Cursor mapping (cursor positioning) converts the key point coordinates into cursor coordinates on the projection screen. The system response performs instruction processing and cursor positioning.
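The per-frame flow above can be sketched as a single pass. The following Python sketch is illustrative only: the stage callables (`detect_hand`, `classify_gesture`, and so on) are hypothetical placeholders for the models and algorithms the text describes, not APIs from the patent or from MediaPipe.

```python
def process_frame(frame, detect_hand, classify_gesture, match_instruction,
                  smooth, map_cursor):
    """One pass of the gesture interaction flow: acquisition has already
    produced `frame`; each stage callable stands in for a model/algorithm."""
    keypoints = detect_hand(frame)            # hand key point identification
    if keypoints is None:                     # no hand in the picture
        return None
    gesture = classify_gesture(keypoints)     # e.g. "FIVE", "FIST", "POINTER"
    instruction = match_instruction(gesture)  # e.g. "MOVE", "CLICK_DOWN"
    smoothed = smooth(keypoints)              # smooth filtering (anti-shake)
    cursor = map_cursor(smoothed)             # cursor mapping (positioning)
    return instruction, cursor                # handed to the system response
```

Each stage is injected, so the pipeline itself stays independent of the concrete detection and classification models.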
In order to solve the problems in the background art, an embodiment of the present application provides a gesture cursor mapping method based on machine vision, as shown in fig. 4, including:
A key point coordinate set of the body is obtained. In the embodiment of the application it is obtained through the MediaPipe Pose model under the MediaPipe framework. The MediaPipe Pose model is a model for high-fidelity body posture tracking; the key point coordinates of the whole body, here 33 key point coordinates, can be inferred from a single frame picture. Among them are the two shoulder endpoint coordinates a(x_a, y_a) and b(x_b, y_b), from which the shoulder width is calculated as
D = sqrt((x_a - x_b)^2 + (y_a - y_b)^2),
as shown in FIG. 2.
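As a minimal sketch of this step (the function name is ours, not the patent's), the shoulder width is the Euclidean distance between the two shoulder endpoints:

```python
import math

def shoulder_width(a, b):
    """Shoulder width D: Euclidean distance between the shoulder endpoints
    a = (x_a, y_a) and b = (x_b, y_b)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])
```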
As shown in FIG. 3, the hand key point coordinate m(x_m, y_m) is obtained: a hand key point coordinate set is acquired through the MediaPipe Hands model under the MediaPipe framework. The MediaPipe Hands model is a high-fidelity hand and finger tracking model; the hand key point coordinates, here 21 key points, are inferred from a single frame picture using machine learning. Each key point output by the MediaPipe Hands model consists of x, y, and z. x and y are normalized to [0.0, 1.0] by the image width and height, respectively. z represents the depth of the coordinate, with the depth at the wrist as the origin; the smaller the value, the closer the coordinate is to the camera, and z uses roughly the same scale as x.
The hand center point coordinate c(x_c, y_c) is calculated from the hand key point coordinates, using a previously determined algorithm such as weighting or linear accumulation of the hand key point coordinates.
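The text leaves the center-point algorithm open ("weighting or linear accumulation"); one simple choice, shown here as an assumption rather than the patent's method, is the unweighted mean of the keypoints:

```python
def hand_center(keypoints):
    """Hand center c = (x_c, y_c) as the arithmetic mean of the hand
    keypoint coordinates (one possible 'previously determined algorithm')."""
    n = len(keypoints)
    x_c = sum(x for x, _ in keypoints) / n
    y_c = sum(y for _, y in keypoints) / n
    return (x_c, y_c)
```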
From the shoulder width D and the hand center point coordinate c(x_c, y_c), the two diagonal point coordinates of the virtual frame are calculated: p(x_p, y_p) = (x_c - D/2, y_c - D/2), q(x_q, y_q) = (x_c + D/2, y_c + D/2), as shown in FIG. 5.
The cursor mapping coordinate M(x_M, y_M) of the hand key point on the projection screen is obtained from the virtual frame diagonal coordinates p and q, the hand key point coordinate m, and the projection screen width and height: (x_M, y_M) = (width * (x_m - x_p)/(x_q - x_p), height * (y_m - y_p)/(y_q - y_p)), as shown in FIG. 5. Unlike the direct mapping mode, where (x_M, y_M) = (width * x_m, height * y_m), cursor mapping with a virtual frame first calculates the position of the key point relative to the virtual frame and then multiplies by the screen width and height to obtain the final on-screen coordinates. Obviously, for the same key point movement, the larger the virtual frame, the slower the cursor moves; the smaller the virtual frame, the faster the cursor moves. The virtual frame is generated from the key points of the user's body and hand: when the user is close to the camera, the user occupies a large proportion of the camera picture and the generated virtual frame is large; when the user is far from the camera, the proportion is small and the virtual frame is small. Virtual-frame-based cursor mapping therefore keeps the cursor moving speed consistent at different distances. Meanwhile, because the virtual frame is generated from the shoulder width and the hand center position, the user can reach any corner of the screen with a sufficiently small movement amplitude, which reduces the fatigue of gesture interaction.
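The two formulas above — building the virtual frame from c and D, then mapping m into screen coordinates — can be sketched as follows. The function names are ours; the arithmetic is taken directly from the text.

```python
def virtual_frame(c, d, gain=1.0):
    """Diagonal points p, q of the virtual frame centered on hand center c,
    with side length gain * D (gain defaults to 1, i.e. no speed gain)."""
    half = gain * d / 2
    return (c[0] - half, c[1] - half), (c[0] + half, c[1] + half)

def map_cursor(m, p, q, width, height):
    """Cursor mapping coordinate M of hand keypoint m on a width x height
    screen: position of m relative to the frame, scaled to the screen."""
    x = width * (m[0] - p[0]) / (q[0] - p[0])
    y = height * (m[1] - p[1]) / (q[1] - p[1])
    return (x, y)
```

With a frame of side D, the same hand displacement sweeps the same fraction of the screen regardless of how far the user stands, which is the distance-adaptive behaviour described above.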
After calculating the shoulder width D, the embodiment of the application further obtains a cursor speed gain. The speed is calculated from the current speed level as V = V_min + (K - 1)/(N - 1) * (V_max - V_min); gain = V / V_max; where V_min denotes the minimum cursor moving speed supported by the system, V_max denotes the maximum cursor moving speed supported by the system, N is the number of speed levels (N ≥ 1), and the current speed level K ∈ [1, N]. The two diagonal point coordinates of the virtual frame then become p(x_p, y_p) = (x_c - gain*D/2, y_c - gain*D/2), q(x_q, y_q) = (x_c + gain*D/2, y_c + gain*D/2). The speed gain makes the cursor moving speed adjustable.
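A sketch of the speed-gain computation follows; the N == 1 branch is our assumption (the formula divides by N - 1, which the text does not define for a single speed level):

```python
def cursor_gain(k, n, v_min, v_max):
    """Gain from speed level k in [1, n]; v_min and v_max are the slowest
    and fastest cursor speeds the system supports."""
    if n == 1:
        return 1.0  # single level: assume full speed to avoid division by zero
    v = v_min + (k - 1) / (n - 1) * (v_max - v_min)
    return v / v_max
```

The resulting gain scales the side of the virtual frame to gain * D, which in turn adjusts how fast the cursor moves for a given hand displacement.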
The embodiment of the application further comprises judging whether the hand key point coordinate m(x_m, y_m) is within the virtual frame: if not, the virtual frame is updated; if it is, the cursor mapping coordinate is calculated. Updating the virtual frame comprises the following cases:
Case 1: x_m < x_p: offset = x_p - x_m; x_p = x_m; x_q = x_q - offset;
Case 2: x_m > x_q: offset = x_m - x_q; x_q = x_m; x_p = x_p + offset;
Case 3: y_m < y_p: offset = y_p - y_m; y_p = y_m; y_q = y_q - offset;
Case 4: y_m > y_q: offset = y_m - y_q; y_q = y_m; y_p = y_p + offset.
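The four cases translate directly into code. A sketch (function name ours; case 3 compares against y_p, consistent with its offset = y_p - y_m):

```python
def update_virtual_frame(m, p, q):
    """Shift the virtual frame so that keypoint m = (x_m, y_m) lies inside
    it, preserving the frame's size."""
    (x_p, y_p), (x_q, y_q) = p, q
    x_m, y_m = m
    if x_m < x_p:                      # case 1: m is left of the frame
        offset = x_p - x_m
        x_p, x_q = x_m, x_q - offset
    elif x_m > x_q:                    # case 2: m is right of the frame
        offset = x_m - x_q
        x_q, x_p = x_m, x_p + offset
    if y_m < y_p:                      # case 3: m is above the frame
        offset = y_p - y_m
        y_p, y_q = y_m, y_q - offset
    elif y_m > y_q:                    # case 4: m is below the frame
        offset = y_m - y_q
        y_q, y_p = y_m, y_p + offset
    return (x_p, y_p), (x_q, y_q)
```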
When the cursor moves to the screen boundary, the virtual frame adapts itself, so no "moved out of the virtual frame" error occurs. Compared with simply clamping the cursor to the screen boundary once it moves out of the virtual frame, this adaptive adjustment effectively eliminates the damping phenomenon when the cursor returns from the boundary.
Before calculating the shoulder width D, the embodiment of the application filters the two shoulder endpoint coordinates a(x_a, y_a) and b(x_b, y_b). This may use a data filtering method such as smoothing filtering or wavelet filtering.
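As one of the filtering options mentioned (smoothing), a minimal sliding-window mean over a stream of coordinates could look like this; the class and the default window size are our assumptions:

```python
from collections import deque

class SmoothingFilter:
    """Sliding-window mean over recent 2-D coordinates (simple smoothing;
    the text also allows wavelet or other data filtering methods)."""

    def __init__(self, window=5):
        self._buf = deque(maxlen=window)  # keeps only the last `window` points

    def update(self, point):
        """Add a new (x, y) sample and return the windowed mean."""
        self._buf.append(point)
        n = len(self._buf)
        return (sum(x for x, _ in self._buf) / n,
                sum(y for _, y in self._buf) / n)
```

One filter instance per shoulder endpoint keeps the width estimate D stable between frames.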
The second aspect of the embodiments of the present application further provides a network device, as shown in FIG. 6, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor performs the above machine-vision-based gesture cursor mapping method. The network device acquires the pictures shot by the camera and may be, for example, a computer or an iPad.
The third aspect of the embodiments of the present application further provides a storage medium, on which computer program instructions are stored, and the program instructions, when executed by a processor, are configured to implement the above-mentioned machine vision-based gesture cursor mapping method.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed.

Claims (7)

1. A machine-vision-based gesture cursor mapping method, characterized by comprising:
acquiring a key point set of a body;
the key point set comprising the two shoulder endpoint coordinates a(x_a, y_a) and b(x_b, y_b);
calculating the shoulder width D from the two shoulder endpoint coordinates a(x_a, y_a) and b(x_b, y_b);
obtaining the hand key point coordinate m(x_m, y_m);
calculating the hand center point coordinate c(x_c, y_c) from the hand key point coordinates;
calculating the two diagonal point coordinates of the virtual frame from the shoulder width D: p(x_p, y_p) = (x_c - D/2, y_c - D/2), q(x_q, y_q) = (x_c + D/2, y_c + D/2);
obtaining the cursor mapping coordinate M(x_M, y_M) of the hand key point on the projection screen from the virtual frame diagonal coordinates p and q, the hand key point coordinate m, and the projection screen width and height: (x_M, y_M) = (width * (x_m - x_p)/(x_q - x_p), height * (y_m - y_p)/(y_q - y_p)).
2. The machine-vision-based gesture cursor mapping method according to claim 1, characterized by further comprising obtaining a cursor speed gain, including:
calculating the speed from the current speed level: V = V_min + (K - 1)/(N - 1) * (V_max - V_min);
gain = V / V_max;
where V_min denotes the minimum cursor moving speed supported by the system, V_max denotes the maximum cursor moving speed supported by the system, N is the number of speed levels (N ≥ 1), and the current speed level K ∈ [1, N];
the two diagonal point coordinates of the virtual frame then become p(x_p, y_p) = (x_c - gain*D/2, y_c - gain*D/2), q(x_q, y_q) = (x_c + gain*D/2, y_c + gain*D/2).
3. The machine-vision-based gesture cursor mapping method according to claim 1 or 2, characterized by further comprising:
judging whether the hand key point coordinate m(x_m, y_m) is within the virtual frame;
if not, updating the virtual frame;
if it is within the virtual frame, calculating the cursor mapping coordinate.
4. The machine-vision-based gesture cursor mapping method of claim 3, characterized in that updating the virtual frame comprises:
Case 1: x_m < x_p: offset = x_p - x_m; x_p = x_m; x_q = x_q - offset;
Case 2: x_m > x_q: offset = x_m - x_q; x_q = x_m; x_p = x_p + offset;
Case 3: y_m < y_p: offset = y_p - y_m; y_p = y_m; y_q = y_q - offset;
Case 4: y_m > y_q: offset = y_m - y_q; y_q = y_m; y_p = y_p + offset.
5. The machine-vision-based gesture cursor mapping method of claim 1, characterized in that, before the shoulder width is calculated, the two shoulder endpoint coordinates a(x_a, y_a) and b(x_b, y_b) are subjected to filtering processing.
6. A network device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, performs the machine-vision-based gesture cursor mapping method of any one of claims 1 to 5.
7. A storage medium having computer program instructions stored thereon, characterized in that the program instructions, when executed by a processor, implement the machine-vision-based gesture cursor mapping method of any one of claims 1 to 5.
CN202211015270.0A 2022-08-23 2022-08-23 Gesture cursor mapping method based on machine vision, network equipment and storage medium Pending CN115454236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211015270.0A CN115454236A (en) 2022-08-23 2022-08-23 Gesture cursor mapping method based on machine vision, network equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115454236A true CN115454236A (en) 2022-12-09

Family

ID=84298479


Country Status (1)

Country Link
CN (1) CN115454236A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination