CN111913584B - Mouse cursor control method and system based on gesture recognition - Google Patents

Mouse cursor control method and system based on gesture recognition

Info

Publication number
CN111913584B
Authority
CN
China
Prior art keywords
gesture
skin color
hog
mouse
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010834455.9A
Other languages
Chinese (zh)
Other versions
CN111913584A (en)
Inventor
陈康
易金
王俊
林瑞全
欧明敏
武义
邢新华
赵显煜
李振嘉
郑炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202010834455.9A priority Critical patent/CN111913584B/en
Publication of CN111913584A publication Critical patent/CN111913584A/en
Application granted granted Critical
Publication of CN111913584B publication Critical patent/CN111913584B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The invention relates to a mouse cursor control method and system based on gesture recognition. A gesture image recognition method is provided that fuses PHOG (Pyramid Histogram of Oriented Gradients) features with improved LBP (Local Binary Pattern) features and classifies them with a K-NN (K-nearest neighbor) classifier. To improve the real-time performance of the system, skin color detection is first used to determine whether the current frame contains a human hand. When a hand is detected, the PHOG features and the improved LBP features are extracted, fused, and classified by the K-NN classifier to recognize the gesture. The invention achieves fast and accurate recognition of user gestures under complex background conditions such as different viewing angles and lighting, and controls the mouse cursor accurately and in real time according to the recognition result.

Description

Mouse cursor control method and system based on gesture recognition
Technical Field
The invention relates to the technical field of image processing and the field of man-machine interaction, in particular to a mouse cursor control method and system based on gesture recognition.
Background
Gesture recognition technology is one of the important research topics in the field of human-computer interaction. It allows computer equipment and robots to be controlled more naturally and effectively, breaks through the limitations of traditional input devices such as the keyboard, mouse and remote control, and greatly improves the friendliness of human-computer interaction.
The development of gesture recognition has gone through two stages. The first stage was gesture recognition based on wearable devices fitted with many sensors. The second stage is gesture recognition based on computer vision, in which electronic equipment such as a camera replaces the human eye to capture, identify and track the target, and the captured data are processed and analyzed to obtain the final result. In vision-based gesture recognition, the common approach is to extract gesture features and then classify them with a classifier. LBP and HOG are among the most widely used gesture feature descriptors. HOG features have good geometric and photometric invariance and are usually used to extract the edge and contour information of a target image, while LBP is often used to describe the local texture information of an image. As research on LBP features has deepened, many variants have appeared and their applications have broadened.
Although gesture recognition has been studied extensively, some inherent problems remain in this field. Research has shown that variations in gesture scale, angle, grayscale and position all reduce the recognition rate.
Disclosure of Invention
In view of the above, the present invention provides a mouse cursor control method and system based on gesture recognition, offering a new way of controlling the mouse that makes operation of the computer more flexible.
The invention is realized by adopting the following scheme: a mouse cursor control method based on gesture recognition comprises the following steps:
step S1: calling a computer camera or an external USB camera on an MATLAB platform to acquire user gesture video data;
step S2: carrying out median filtering image preprocessing on the collected user gesture video data;
step S3: carrying out skin color detection on the filtered image based on an elliptical clustering model, and carrying out binarization processing on video data according to the elliptical skin color clustering model so as to segment a gesture area of the current video frame;
step S4: setting a threshold β for the number of skin color pixels, where 50 < β < 100, and counting the number of detected skin color pixels; if the number of skin color pixels exceeds the threshold β, a gesture is considered to be present in the current frame and the feature extraction of step S5 is performed; otherwise, the current frame is considered to contain no gesture and the next frame is processed;
step S5: extracting PHOG characteristics of the gesture area;
step S6: improving the traditional LBP characteristics to obtain MLBP characteristics, and fusing the MLBP characteristics and the PHOG characteristics, namely connecting an MLBP characteristic vector and a PHOG characteristic vector in series to obtain a fused characteristic vector;
step S7: making a characteristic template database;
step S8: comparing the fusion features extracted in real time in the step S6 with the features in the feature template database by using a K-NN classification algorithm to obtain an identification result;
step S9: controlling the motion of the mouse cursor in real time according to the recognition result, namely, after calling the java.awt.Robot class and initializing it in the MATLAB platform, the mouse can be flexibly moved to any position on the screen by setting the position of the mouse cursor.
Further, the step S2 specifically includes the following steps:
step S21: converting the collected RGB video data into YCbCr video data;
step S22: and performing median filtering on the YCbCr video data to obtain clearer video data.
Further, in step S3, based on the elliptical clustering model skin color detection, the elliptical skin color clustering model formula is as follows:
x = cosθ·(Cb − cx) + sinθ·(Cr − cy)
y = −sinθ·(Cb − cx) + cosθ·(Cr − cy)
(x − ecx)^2 / a^2 + (y − ecy)^2 / b^2 ≤ 1
and carrying out binarization processing on the video data according to the oval skin color clustering model, wherein a binarization formula is as follows:
B(Cb, Cr) = 1, if (x − ecx)^2 / a^2 + (y − ecy)^2 / b^2 ≤ 1; otherwise B(Cb, Cr) = 0
If the binarized value is 1, the pixel belongs to a skin color area, namely the gesture area; if the binarized value is 0, it belongs to a non-skin color area, namely a non-gesture area. In these formulas, θ = 2.53, cx = 109.38, cy = 152.02, ecx = 1.60, ecy = 2.41, a = 25.39, b = 14.03.
Further, the step S5 specifically includes the following steps:
step S51: gamma correction is carried out on the gesture area image after normalization processing;
step S52: calculating the gradient amplitude and the gradient direction of the corrected gesture area pixel point (x, y), wherein the calculation formula is as follows:
Gx(x,y)=H(x+1,y)-H(x-1,y)
Gy(x,y)=H(x,y+1)-H(x,y-1)
G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2)
θ(x, y) = arctan(Gy(x, y) / Gx(x, y))
where Gx(x, y) is the horizontal gradient, Gy(x, y) is the vertical gradient, G(x, y) is the gradient magnitude, and θ(x, y) is the gradient direction;
step S53: dividing the normalized gesture area image into a number of cells and computing the gradient information of each cell with a 9-bin histogram, i.e. the gradient directions within a cell are quantized into 9 orientations, and each pixel in the cell casts a vote weighted by its gradient magnitude into the bin of its gradient direction, giving the cell's histogram of oriented gradients, namely the HOG feature of the cell; the HOG features of several cells are concatenated to form the HOG feature of a larger block; the HOG features of all blocks in the gesture area are concatenated to form the HOG feature of the image;
step S54: dividing the gesture area into three block scales, and extracting the HOG characteristics of the gesture area on the three scales respectively:
the HOG features extracted from the original gesture area are HOG features of a first scale, the HOG features extracted after the original gesture area image is divided into 2 x 2 sub-blocks are HOG features of a second scale, and the HOG features extracted after the original gesture area image is divided into 4 x 4 sub-blocks are HOG features of a third scale;
and connecting HOG features of three scales in series to obtain the PHOG feature.
Further, the step S6 of improving the conventional LBP feature to obtain the MLBP feature specifically includes the following steps:
step Sa: calculating the coordinates of the neighborhood pixels from the coordinates of the current center pixel, where (Xc, Yc) are the coordinates of the center pixel, R is the sampling radius, P is the number of sampling points, i.e. the number of neighborhood pixels, and (Xp, Yp) are the coordinates of the p-th neighborhood pixel;
Xp = Xc + R·cos(2πp / P)
Yp = Yc − R·sin(2πp / P)
step Sb: performing bilinear interpolation at the calculated neighborhood pixel coordinates, which are in general not integers, to obtain the corresponding pixel values; the interpolation formula is as follows:
f(x, y) ≈ f(0,0)(1 − x)(1 − y) + f(1,0)·x·(1 − y) + f(0,1)(1 − x)·y + f(1,1)·x·y
step Sc: sorting the P neighborhood pixel values around the center pixel, removing the minimum and the maximum, and taking the mean of the remaining values as the binarization threshold; the MLBP feature is calculated as follows, where gm is the mean of the neighborhood pixel values after the minimum and maximum have been removed:
MLBP(P, R) = Σ_{p=0}^{P−1} s(gp − gm)·2^p,  where gp is the value of the p-th neighborhood pixel and s(x) = 1 if x ≥ 0, otherwise s(x) = 0
further, the specific content of step S7 is: under the environment close to the actual operation of the system, 50 pictures of various predefined gestures are collected, and the PHOG and MLBP fusion characteristics of each picture are extracted according to the steps from S1 to S6 and stored in a template database.
Further, the specific content of step S8 is:
calculating the Euclidean distance between the target feature vector and each feature vector in the database; the Euclidean distance is calculated as follows, where T = (t1, t2, …, tn) is the feature vector of the input frame image and Q = (q1, q2, …, qn) is a feature vector stored in the template database;
D(T, Q) = sqrt(Σ_{i=1}^{n} (ti − qi)^2)
the K template features closest to the input feature are selected according to the calculated distances, and the category occurring most frequently among these K features is the recognition result; the value of K is determined by K-fold cross validation. When the recognized gesture is gesture 1, the mouse moves left; when it is gesture 2, the mouse moves right; when it is gesture 3, the mouse moves up; when it is gesture 4, the mouse moves down; when it is a fist, a left click of the mouse is performed; when it is an open palm, a right click of the mouse is performed. Gesture 1 is the hand with one finger extended, gesture 2 with two fingers extended, gesture 3 with three fingers extended, and gesture 4 with four fingers extended.
Further, the present invention also provides a mouse cursor control system based on gesture recognition, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein when the computer program is run by the processor, the steps of the method as described above are implemented.
Compared with the prior art, the invention has the following beneficial effects:
the invention directly collects the user video data by using the MATLAB software platform, can accurately detect and recognize the gesture of the user, and can control the movement of the mouse cursor according to the recognized result. The invention provides a new mouse control mode, which enables the operation of a computer to be more flexible and solves some problems in human-computer interaction.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating image preprocessing according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a mouse cursor control method based on gesture recognition, including the following steps:
step S1: calling a computer camera or an external USB camera on an MATLAB platform to acquire user gesture video data;
step S2: carrying out median-filtering image preprocessing on the collected user gesture video data, as shown in FIG. 2;
step S3: carrying out skin color detection on the filtered image based on an elliptical clustering model, and carrying out binarization processing on the video data according to the elliptical skin color clustering model so as to segment the gesture area of the current video frame;
step S4: setting a threshold β for the number of skin color pixels, where 50 < β < 100, and counting the number of detected skin color pixels; if the number of skin color pixels exceeds the threshold β, a gesture is considered to be present in the current frame and the feature extraction of step S5 is performed; otherwise, the current frame is considered to contain no gesture and the next frame is processed;
step S5: extracting PHOG characteristics of the gesture area;
step S6: improving the traditional LBP characteristics to obtain MLBP characteristics, and fusing the MLBP characteristics and the PHOG characteristics, namely connecting an MLBP characteristic vector and a PHOG characteristic vector in series to obtain a fused characteristic vector;
step S7: making a characteristic template database;
step S8: comparing the fusion features extracted in real time in the step S6 with the features in the feature template database by using a K-NN classification algorithm to obtain an identification result;
step S9: controlling the motion of the mouse cursor in real time according to the recognition result, namely, after calling the java.awt.Robot class and initializing it in the MATLAB platform, the mouse can be flexibly moved to any position on the screen by setting the position of the mouse cursor.
In this embodiment, the step S2 specifically includes the following steps:
step S21: converting the collected RGB video data into YCbCr video data;
step S22: and performing median filtering on the YCbCr video data to obtain clearer video data.
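As an illustrative sketch of steps S21-S22 only (the patent performs this in MATLAB), the snippet below uses OpenCV as a stand-in; the function name preprocess_frame is ours, and note that OpenCV labels the converted space YCrCb with channel order Y, Cr, Cb.

```python
# Hedged sketch of step S2: color-space conversion followed by median filtering.
import cv2
import numpy as np

def preprocess_frame(frame_bgr: np.ndarray) -> np.ndarray:
    """Convert a captured BGR frame to YCbCr-type space and median-filter it."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)  # channels: Y, Cr, Cb
    # A 3x3 median filter suppresses salt-and-pepper noise while preserving edges.
    return cv2.medianBlur(ycrcb, 3)
```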
In this embodiment, in step S3, based on the elliptical cluster model skin color detection, the elliptical skin color cluster model formula is as follows:
x = cosθ·(Cb − cx) + sinθ·(Cr − cy)
y = −sinθ·(Cb − cx) + cosθ·(Cr − cy)
(x − ecx)^2 / a^2 + (y − ecy)^2 / b^2 ≤ 1
and carrying out binarization processing on the video data according to the oval skin color clustering model, wherein a binarization formula is as follows:
B(Cb, Cr) = 1, if (x − ecx)^2 / a^2 + (y − ecy)^2 / b^2 ≤ 1; otherwise B(Cb, Cr) = 0
If the binarized value is 1, the pixel belongs to a skin color area, namely the gesture area; if the binarized value is 0, it belongs to a non-skin color area, namely a non-gesture area.
In these formulas, θ = 2.53, cx = 109.38, cy = 152.02, ecx = 1.60, ecy = 2.41, a = 25.39, b = 14.03.
In this embodiment, the step S5 specifically includes the following steps:
step S51: gamma correction is carried out on the gesture area image after normalization processing;
step S52: calculating the gradient amplitude and the gradient direction of the corrected gesture area pixel point (x, y), wherein the calculation formula is as follows:
Gx(x,y)=H(x+1,y)-H(x-1,y)
Gy(x,y)=H(x,y+1)-H(x,y-1)
G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2)
θ(x, y) = arctan(Gy(x, y) / Gx(x, y))
where Gx(x, y) is the horizontal gradient, Gy(x, y) is the vertical gradient, G(x, y) is the gradient magnitude, and θ(x, y) is the gradient direction;
step S53: dividing the normalized gesture area image into a number of cells and computing the gradient information of each cell with a 9-bin histogram, i.e. the gradient directions within a cell are quantized into 9 orientations, and each pixel in the cell casts a vote weighted by its gradient magnitude into the bin of its gradient direction, giving the cell's histogram of oriented gradients, namely the HOG feature of the cell; the HOG features of several cells are concatenated to form the HOG feature of a larger block; the HOG features of all blocks in the gesture area are concatenated to form the HOG feature of the image;
dividing the gesture area into a plurality of cells, and counting a gradient direction histogram of each cell; combining the histograms of the cells into a block histogram, and combining the histograms of all the blocks into an image histogram;
step S54: dividing the gesture area into three block scales, and extracting the HOG characteristics of the gesture area on the three scales respectively:
the HOG features extracted from the original gesture area are HOG features of a first scale, the HOG features extracted after the original gesture area image is divided into 2 x 2 sub-blocks are HOG features of a second scale, and the HOG features extracted after the original gesture area image is divided into 4 x 4 sub-blocks are HOG features of a third scale;
and connecting HOG features of three scales in series to obtain the PHOG feature.
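As an illustration of steps S51-S54, the sketch below uses scikit-image's hog() in place of the cell/block computation described above; the 8×8 cell size and the assumption that the normalized gesture image is at least 64×64 pixels are ours, and the gamma correction of step S51 is omitted for brevity.

```python
# Hedged sketch of the three-scale PHOG feature: HOG on the full region,
# on 2x2 sub-blocks and on 4x4 sub-blocks, all concatenated.
import numpy as np
from skimage.feature import hog

def phog_feature(gray: np.ndarray) -> np.ndarray:
    """Concatenate 9-bin HOG features computed at pyramid levels 1, 2x2 and 4x4."""
    feats = []
    for level in (1, 2, 4):
        h, w = gray.shape
        bh, bw = h // level, w // level
        for i in range(level):
            for j in range(level):
                block = gray[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
                feats.append(hog(block, orientations=9,
                                 pixels_per_cell=(8, 8),
                                 cells_per_block=(2, 2)))
    return np.concatenate(feats)
```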
In this embodiment, the step S6 of improving the conventional LBP feature to obtain the MLBP feature specifically includes the following steps:
step Sa: calculating the coordinates of the neighborhood pixels from the coordinates of the current center pixel, where (Xc, Yc) are the coordinates of the center pixel, R is the sampling radius, P is the number of sampling points, i.e. the number of neighborhood pixels, and (Xp, Yp) are the coordinates of the p-th neighborhood pixel;
Xp = Xc + R·cos(2πp / P)
Yp = Yc − R·sin(2πp / P)
step Sb: performing bilinear interpolation at the calculated neighborhood pixel coordinates, which are in general not integers, to obtain the corresponding pixel values; the interpolation formula is as follows:
f(x, y) ≈ f(0,0)(1 − x)(1 − y) + f(1,0)·x·(1 − y) + f(0,1)(1 − x)·y + f(1,1)·x·y
step Sc: sorting the P neighborhood pixel values around the center pixel, removing the minimum and the maximum, and taking the mean of the remaining values as the binarization threshold; the MLBP feature is calculated as follows, where gm is the mean of the neighborhood pixel values after the minimum and maximum have been removed:
MLBP(P, R) = Σ_{p=0}^{P−1} s(gp − gm)·2^p,  where gp is the value of the p-th neighborhood pixel and s(x) = 1 if x ≥ 0, otherwise s(x) = 0
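The following is a sketch of the MLBP operator of steps Sa-Sc, assuming P = 8 and R = 1 (values the description does not fix); each pixel's binarization threshold is the mean of its circular neighbors after discarding their minimum and maximum, and the non-integer neighbor coordinates are handled by bilinear interpolation.

```python
# Hedged sketch of the MLBP code image; a histogram over these codes would
# typically serve as the MLBP feature vector.
import numpy as np

def mlbp_image(gray: np.ndarray, P: int = 8, R: float = 1.0) -> np.ndarray:
    gray = gray.astype(np.float64)
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    m = int(np.ceil(R)) + 1                       # keep interpolation inside the image
    for yc in range(m, h - m):
        for xc in range(m, w - m):
            neighbors = []
            for p in range(P):
                xp = xc + R * np.cos(2 * np.pi * p / P)
                yp = yc - R * np.sin(2 * np.pi * p / P)
                x0, y0 = int(np.floor(xp)), int(np.floor(yp))
                dx, dy = xp - x0, yp - y0
                # Bilinear interpolation on the unit square around (x0, y0).
                val = (gray[y0, x0] * (1 - dx) * (1 - dy)
                       + gray[y0, x0 + 1] * dx * (1 - dy)
                       + gray[y0 + 1, x0] * (1 - dx) * dy
                       + gray[y0 + 1, x0 + 1] * dx * dy)
                neighbors.append(val)
            vals = np.sort(np.asarray(neighbors))
            gm = vals[1:-1].mean()                # mean after dropping min and max
            out[yc, xc] = sum((1 << p) for p, g in enumerate(neighbors) if g - gm >= 0)
    return out
```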
in this embodiment, the specific content of step S7 is: under the environment close to the actual operation of the system, 50 pictures of various predefined gestures are collected, and the PHOG and MLBP fusion characteristics of each picture are extracted according to the steps from S1 to S6 and stored in a template database.
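A sketch of the template database of step S7 built on the hypothetical helpers above; it assumes the MLBP feature vector is a 256-bin histogram of MLBP codes, which the description does not state explicitly.

```python
# Hedged sketch of step S7: extract and store fused features for ~50 images per gesture.
import numpy as np

def build_template_database(images_by_gesture: dict):
    """images_by_gesture maps a gesture label to a list of grayscale gesture-area images."""
    templates, labels = [], []
    for label, images in images_by_gesture.items():
        for img in images:
            mlbp_hist = np.bincount(mlbp_image(img).ravel(), minlength=256)
            fused = np.concatenate([phog_feature(img), mlbp_hist])  # step S6 fusion
            templates.append(fused)
            labels.append(label)
    return np.vstack(templates), np.array(labels)
```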
In this embodiment, the specific content of step S8 is:
calculating the Euclidean distance between the target feature vector and each feature vector in the database; the Euclidean distance is calculated as follows, where T = (t1, t2, …, tn) is the feature vector of the input frame image and Q = (q1, q2, …, qn) is a feature vector stored in the template database;
D(T, Q) = sqrt(Σ_{i=1}^{n} (ti − qi)^2)
the K template features closest to the input feature are selected according to the calculated distances, and the category occurring most frequently among these K features is the recognition result; the value of K is determined by K-fold cross validation. When the recognized gesture is gesture 1, the mouse moves left; when it is gesture 2, the mouse moves right; when it is gesture 3, the mouse moves up; when it is gesture 4, the mouse moves down; when it is a fist, a left click of the mouse is performed; when it is an open palm, a right click of the mouse is performed. Gesture 1 is the hand with one finger extended, gesture 2 with two fingers extended, gesture 3 with three fingers extended, and gesture 4 with four fingers extended.
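A sketch of the K-NN matching of step S8 and the gesture-to-action mapping of step S9 follows; K = 5 and the label strings are placeholders only (the patent selects K by K-fold cross validation), and the cursor movement itself, which the patent performs through the java.awt.Robot class called from MATLAB, is reduced here to returning an action name.

```python
# Hedged sketch of Euclidean-distance K-NN voting and the gesture-to-mouse-action map.
import numpy as np
from collections import Counter

ACTIONS = {  # hypothetical labels for the six predefined gestures
    "gesture1": "move left", "gesture2": "move right",
    "gesture3": "move up",   "gesture4": "move down",
    "fist": "left click",    "palm": "right click",
}

def knn_classify(feature: np.ndarray, templates: np.ndarray,
                 labels: np.ndarray, k: int = 5) -> str:
    """Return the label occurring most often among the K nearest template features."""
    dists = np.sqrt(((templates - feature) ** 2).sum(axis=1))  # Euclidean distances
    nearest = labels[np.argsort(dists)[:k]]
    return Counter(nearest).most_common(1)[0][0]

def gesture_to_action(label: str) -> str:
    return ACTIONS.get(label, "no action")
```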
Preferably, the present embodiment further provides a mouse cursor control system based on gesture recognition, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the computer program is run by the processor, the method steps as described above are implemented.
Preferably, in this embodiment, the computer's built-in camera or an external USB camera is called on the MATLAB platform to acquire user video data, the acquired video data are processed in MATLAB in real time, and the mouse cursor is controlled in real time according to the processing result. The video processing converts the collected RGB video data into the YCbCr format and then applies median filtering. Skin color detection based on the elliptical skin color model is applied to the filtered data to segment the gesture area. The segmented gesture area is normalized, PHOG and MLBP features are extracted from it, and a K-NN classifier performs classification and recognition. The movement of the mouse cursor is then controlled according to the classification result. Preferably, the gesture features are described by combining the PHOG features with the improved LBP features, and K-NN is used as the classifier, recognizing the gesture of the current frame by comparing the distances between the target feature vector and the feature vectors in the database.
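Tying the sketches above together, a hypothetical top-level loop for this pipeline might look as follows; the 64×64 normalization size, the simple bounding-box cropping of the skin mask, and the printing of the action name (instead of driving the cursor via java.awt.Robot as in the patent) are all assumptions of this illustration.

```python
# Hedged sketch of the real-time pipeline (steps S1-S9) using the helpers sketched above.
import cv2
import numpy as np

def run(templates, labels, beta: int = 75):
    cap = cv2.VideoCapture(0)                      # built-in or external USB camera (step S1)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        mask = skin_mask(preprocess_frame(frame))  # steps S2-S3
        if not hand_present(mask, beta):           # step S4
            continue
        ys, xs = mask.nonzero()                    # bounding box of the segmented gesture area
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        region = cv2.resize(gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1], (64, 64))
        fused = np.concatenate([phog_feature(region),                         # step S5
                                np.bincount(mlbp_image(region).ravel(),       # step S6
                                            minlength=256)])
        print(gesture_to_action(knn_classify(fused, templates, labels)))      # steps S8-S9
    cap.release()
```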
The embodiment can simply and efficiently realize the control of the mouse cursor and provide an efficient gesture recognition method for other gesture recognition occasions.
Preferably, in this embodiment, MATLAB is used to call the computer's camera or an external USB camera to acquire video data, detect and recognize the gestures of the person in the video, and control the cursor to move or click in real time according to the gestures. A gesture image recognition method is provided that fuses PHOG (Pyramid Histogram of Oriented Gradients) features with improved LBP (Local Binary Pattern) features and classifies them with a K-NN (K-nearest neighbor) classifier. To improve the real-time performance of the system, skin color detection is first used to determine whether the current frame contains a human hand. When a hand is detected, the PHOG features and the improved LBP features are further extracted, fused, and classified by the K-NN classifier to recognize the gesture. This embodiment can quickly and accurately recognize user gestures under complex background conditions such as different viewing angles and lighting, and accurately controls the mouse cursor in real time according to the recognition result.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (6)

1. A mouse cursor control method based on gesture recognition is characterized in that: the method comprises the following steps:
step S1: calling a computer camera or an external USB camera on an MATLAB platform to acquire user gesture video data;
step S2: carrying out median filtering image preprocessing on the collected user gesture video data;
step S3: carrying out skin color detection on the filtered image based on an elliptical clustering model, and carrying out binarization processing on video data according to the elliptical skin color clustering model so as to segment a gesture area of the current video frame;
step S4: setting a threshold β for the number of skin color pixels, where 50 < β < 100, and counting the number of detected skin color pixels; if the number of skin color pixels exceeds the threshold β, a gesture is considered to be present in the current frame and the feature extraction of step S5 is performed; otherwise, the current frame is considered to contain no gesture and the next frame is processed;
step S5: extracting PHOG characteristics of the gesture area;
step S6: improving the traditional LBP characteristics to obtain MLBP characteristics, and fusing the MLBP characteristics and the PHOG characteristics, namely connecting an MLBP characteristic vector and a PHOG characteristic vector in series to obtain a fused characteristic vector;
step S7: making a characteristic template database;
step S8: comparing the fusion features extracted in real time in the step S6 with the features in the feature template database by using a K-NN classification algorithm to obtain an identification result;
step S9: after calling the java.awt.Robot class and initializing it in the MATLAB platform, the mouse can be flexibly controlled to move to any position on the screen by setting the position of the mouse cursor;
the step S5 specifically includes the following steps:
step S51: gamma correction is carried out on the gesture area image after normalization processing;
step S52: calculating the gradient amplitude and the gradient direction of the corrected gesture area pixel point (x, y), wherein the calculation formula is as follows:
Gx(x,y)=H(x+1,y)-H(x-1,y)
Gy(x,y)=H(x,y+1)-H(x,y-1)
G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2)
θ(x, y) = arctan(Gy(x, y) / Gx(x, y))
where Gx(x, y) is the horizontal gradient, Gy(x, y) is the vertical gradient, G(x, y) is the gradient magnitude, and θ(x, y) is the gradient direction;
step S53: dividing the normalized gesture area image into a number of cells and computing the gradient information of each cell with a 9-bin histogram, i.e. the gradient directions within a cell are quantized into 9 orientations, and each pixel in the cell casts a vote weighted by its gradient magnitude into the bin of its gradient direction, giving the cell's histogram of oriented gradients, namely the HOG feature of the cell; the HOG features of several cells are concatenated to form the HOG feature of a larger block; the HOG features of all blocks in the gesture area are concatenated to form the HOG feature of the image;
step S54: dividing the gesture area into three block scales, and extracting the HOG characteristics of the gesture area on the three scales respectively:
the HOG features extracted from the original gesture area are HOG features of a first scale, the HOG features extracted after the original gesture area image is divided into 2 x 2 sub-blocks are HOG features of a second scale, and the HOG features extracted after the original gesture area image is divided into 4 x 4 sub-blocks are HOG features of a third scale;
connecting HOG characteristics of three scales in series to obtain a PHOG characteristic;
in step S6, the improving the conventional LBP feature to obtain the MLBP feature specifically includes the following steps:
step Sa: calculating the coordinates of the neighborhood pixels from the coordinates of the current center pixel, where (Xc, Yc) are the coordinates of the center pixel, R is the sampling radius, P is the number of sampling points, i.e. the number of neighborhood pixels, and (Xp, Yp) are the coordinates of the p-th neighborhood pixel;
Xp = Xc + R·cos(2πp / P)
Yp = Yc − R·sin(2πp / P)
step Sb: performing bilinear interpolation at the calculated neighborhood pixel coordinates, which are in general not integers, to obtain the corresponding pixel values; the interpolation formula is as follows:
f(x, y) ≈ f(0,0)(1 − x)(1 − y) + f(1,0)·x·(1 − y) + f(0,1)(1 − x)·y + f(1,1)·x·y
step Sc: sorting the P neighborhood pixel values around the center pixel, removing the minimum and the maximum, and taking the mean of the remaining values as the binarization threshold; the MLBP feature is calculated as follows, where gm is the mean of the neighborhood pixel values after the minimum and maximum have been removed:
MLBP(P, R) = Σ_{p=0}^{P−1} s(gp − gm)·2^p,  where gp is the value of the p-th neighborhood pixel and s(x) = 1 if x ≥ 0, otherwise s(x) = 0
2. the mouse cursor control method based on gesture recognition according to claim 1, wherein: the step S2 specifically includes the following steps:
step S21: converting the collected RGB video data into YCbCr video data;
step S22: and performing median filtering on the YCbCr video data to obtain clearer video data.
3. The mouse cursor control method based on gesture recognition according to claim 1, wherein: in step S3, the skin color detection based on the elliptical clustering model uses the following formulas:
x = cosθ·(Cb − cx) + sinθ·(Cr − cy)
y = −sinθ·(Cb − cx) + cosθ·(Cr − cy)
(x − ecx)^2 / a^2 + (y − ecy)^2 / b^2 ≤ 1
and carrying out binarization processing on the video data according to the oval skin color clustering model, wherein a binarization formula is as follows:
B(Cb, Cr) = 1, if (x − ecx)^2 / a^2 + (y − ecy)^2 / b^2 ≤ 1; otherwise B(Cb, Cr) = 0
if the binarized value is 1, the pixel belongs to a skin color area, namely the gesture area; if the binarized value is 0, it belongs to a non-skin color area, namely a non-gesture area;
in these formulas, θ = 2.53, cx = 109.38, cy = 152.02, ecx = 1.60, ecy = 2.41, a = 25.39, b = 14.03.
4. The mouse cursor control method based on gesture recognition according to claim 1, wherein: the specific content of step S7 is: under the environment close to the actual operation of the system, 50 pictures of various predefined gestures are collected, and the PHOG and MLBP fusion characteristics of each picture are extracted according to the steps from S1 to S6 and stored in a template database.
5. The mouse cursor control method based on gesture recognition according to claim 1, wherein: the specific content of step S8 is:
calculating the Euclidean distance between the target feature vector and each feature vector in the database; the Euclidean distance is calculated as follows, where T = (t1, t2, …, tn) is the feature vector of the input frame image and Q = (q1, q2, …, qn) is a feature vector stored in the template database;
D(T, Q) = sqrt(Σ_{i=1}^{n} (ti − qi)^2)
the K template features closest to the input feature are selected according to the calculated distances, and the category occurring most frequently among these K features is the recognition result; the value of K is determined by K-fold cross validation; when the recognized gesture is gesture 1, the mouse moves left; when it is gesture 2, the mouse moves right; when it is gesture 3, the mouse moves up; when it is gesture 4, the mouse moves down; when it is a fist, a left click of the mouse is performed; when it is an open palm, a right click of the mouse is performed; gesture 1 is the hand with one finger extended, gesture 2 with two fingers extended, gesture 3 with three fingers extended, and gesture 4 with four fingers extended.
6. A mouse cursor control system based on gesture recognition, characterized in that: it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the method steps according to any one of claims 1-5 are implemented when the computer program is executed by the processor.
CN202010834455.9A 2020-08-19 2020-08-19 Mouse cursor control method and system based on gesture recognition Active CN111913584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010834455.9A CN111913584B (en) 2020-08-19 2020-08-19 Mouse cursor control method and system based on gesture recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010834455.9A CN111913584B (en) 2020-08-19 2020-08-19 Mouse cursor control method and system based on gesture recognition

Publications (2)

Publication Number Publication Date
CN111913584A CN111913584A (en) 2020-11-10
CN111913584B true CN111913584B (en) 2022-04-01

Family

ID=73278583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010834455.9A Active CN111913584B (en) 2020-08-19 2020-08-19 Mouse cursor control method and system based on gesture recognition

Country Status (1)

Country Link
CN (1) CN111913584B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114148838A (en) * 2021-12-29 2022-03-08 淮阴工学院 Elevator non-contact virtual button operation method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7588771B2 (en) * 2003-06-18 2009-09-15 Genelux Corporation Microorganisms for therapy
CN103679145A (en) * 2013-12-06 2014-03-26 河海大学 Automatic gesture recognition method
CN106886751A (en) * 2017-01-09 2017-06-23 深圳数字电视国家工程实验室股份有限公司 A kind of gesture identification method and system
CN107038416A (en) * 2017-03-10 2017-08-11 华南理工大学 A kind of pedestrian detection method based on bianry image modified HOG features
CN109086687A (en) * 2018-07-13 2018-12-25 东北大学 The traffic sign recognition method of HOG-MBLBP fusion feature based on PCA dimensionality reduction
CN109189219A (en) * 2018-08-20 2019-01-11 长春理工大学 The implementation method of contactless virtual mouse based on gesture identification
CN109359549A (en) * 2018-09-20 2019-02-19 广西师范大学 A kind of pedestrian detection method based on mixed Gaussian and HOG_LBP

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Hand gesture recognition based on HOG-LBP feature";Zhang F等;《 2018 IEEE International Instrumentation and Measurement Technology Conference (I2MTC)》;20180712;第1-6页 *
"基于PCA-HOG与LBP特征融合的静态手势识别方法研究";王瑶;《中国优秀硕士学位论文全文数据库 信息科技辑》;20180215(第02期);I138-2262 *
"基于肤色特征和卷积神经网络的手势识别方法";杨文斌等;《重庆工商大学学报(自然科学版)》;20180831;第35卷(第4期);第76-81页 *

Also Published As

Publication number Publication date
CN111913584A (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN107808143B (en) Dynamic gesture recognition method based on computer vision
JP5297530B2 (en) Image processing apparatus and interface apparatus
Jin et al. A mobile application of American sign language translation via image processing algorithms
US7912253B2 (en) Object recognition method and apparatus therefor
Lahiani et al. Real time hand gesture recognition system for android devices
US20160154469A1 (en) Mid-air gesture input method and apparatus
JP2014137818A (en) Method and device for identifying opening and closing operation of palm, and man-machine interaction method and facility
Yasen Vision-based control by hand-directional gestures converting to voice
CN110232308B (en) Robot-following gesture track recognition method based on hand speed and track distribution
EP2601615A1 (en) Gesture recognition system for tv control
CN103150019A (en) Handwriting input system and method
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN110688965A (en) IPT (inductive power transfer) simulation training gesture recognition method based on binocular vision
Thabet et al. Fast marching method and modified features fusion in enhanced dynamic hand gesture segmentation and detection method under complicated background
KR20120089948A (en) Real-time gesture recognition using mhi shape information
CN111913584B (en) Mouse cursor control method and system based on gesture recognition
Thabet et al. Algorithm of local features fusion and modified covariance-matrix technique for hand motion position estimation and hand gesture trajectory tracking approach
Mahmud et al. Recognition of symbolic gestures using depth information
Hu et al. Gesture detection from RGB hand image using modified convolutional neural network
JP4929460B2 (en) Motion recognition method
Shitole et al. Dynamic hand gesture recognition using PCA, Pruning and ANN
Elsayed et al. Hybrid method based on multi-feature descriptor for static sign language recognition
Li Vision based gesture recognition system with high accuracy
CN110956095A (en) Multi-scale face detection method based on corner skin color detection
Kotha et al. Gesture Recognition System

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant