CN112749664A - Gesture recognition method, device, equipment, system and storage medium - Google Patents
- Publication number
- CN112749664A (application CN202110053294.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- hand
- gesture recognition
- carrying
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/155—Segmentation; Edge detection involving morphological operators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/012—Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention relates to a gesture recognition method, device, equipment, system and storage medium. The method comprises the following steps: acquiring a hand image shot by a binocular camera system; carrying out distortion correction on the hand image to obtain a corrected image; carrying out color space processing on the corrected image to obtain an HSV image; carrying out threshold segmentation on the HSV image to obtain a segmented image; performing frame selection on the hand in the segmented image to obtain a hand contour; and classifying and recognizing the hand image by using a pre-trained classifier according to the hand contour to obtain a recognition result. Through distortion correction and classifier-based recognition, the method greatly improves the accuracy of gesture recognition.
Description
Technical Field
The invention relates to the technical field of gesture recognition, in particular to a gesture recognition method, a gesture recognition device, gesture recognition equipment, a gesture recognition system and a storage medium.
Background
With the continuous development of Virtual Reality (VR) technology, interactive gesture recognition technology has emerged accordingly. To improve the sensing effect, gesture recognition systems generally adopt an infrared-LED grayscale camera to achieve accurate recognition.
At present, when an infrared-LED grayscale camera is used for interactive gesture recognition, the coverage angle of the camera is concentrated in the 40-140 degree range of the forward horizontal plane of the lens, so gesture tracking at wider angles cannot be achieved; and if the camera is replaced with a wide-angle camera, lens distortion prevents accurate gesture segmentation and recognition.
Disclosure of Invention
In view of the above, the present invention provides a gesture recognition method, device, apparatus, system and storage medium to overcome the disadvantages of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
a gesture recognition method, comprising:
acquiring a hand image shot by a binocular camera system;
carrying out distortion correction on the hand image to obtain a corrected image;
carrying out color space processing on the corrected image to obtain an HSV image;
carrying out threshold segmentation on the HSV image to obtain a segmented image;
performing frame selection on the hand in the segmented image to obtain a hand contour;
and carrying out classification and identification in the hand image by utilizing a pre-trained classifier according to the hand contour to obtain an identification result.
Optionally, the method further includes:
and performing a morphological closing operation on the image obtained after the threshold segmentation of the HSV image.
Optionally, the performing distortion correction on the hand image to obtain a corrected image includes:
calibrating the binocular camera system by adopting a black and white chessboard calibration plate, and solving internal parameters and external parameters of a camera in the binocular camera system;
and correcting by using a preset AI correction algorithm according to the internal parameters and the external parameters to obtain the corrected image.
Optionally, the classifying and recognizing the hand image by using a pre-trained classifier according to the hand contour to obtain a recognition result includes:
intercepting the hand contour in the hand image to obtain an intercepted image;
and carrying out image recognition on the intercepted image by using the classifier to obtain the recognition result.
Optionally, the process of training the classifier includes:
acquiring a set number of open-hand images and clenched-fist images from a preset experiment platform;
setting custom labels for the open-hand images and the clenched-fist images respectively;
and training the classifier with the labels, the open-hand images and the clenched-fist images.
A gesture recognition apparatus comprising:
the hand image acquisition module is used for acquiring a hand image shot by the binocular camera system;
the distortion correction module is used for carrying out distortion correction on the hand image to obtain a corrected image;
the color processing module is used for carrying out color space processing on the corrected image to obtain an HSV image;
the threshold segmentation module is used for carrying out threshold segmentation on the HSV image to obtain a segmented image;
the framing module is used for performing frame selection on the hand in the segmented image to obtain a hand contour;
and the recognition module is used for carrying out classification recognition in the hand image by using a pre-trained classifier according to the hand contour to obtain a recognition result.
Optionally, the method further includes:
and the closing operation module is used for performing a morphological closing operation on the image obtained after the threshold segmentation of the HSV image.
A gesture recognition device, comprising:
a processor, and a memory coupled to the processor;
the memory is used for storing a computer program, and the computer program is at least used for executing the gesture recognition method;
the processor is used for calling and executing the computer program in the memory.
A gesture recognition system, comprising:
a binocular camera system, and the gesture recognition device as described above in communication connection with the binocular camera system.
A storage medium storing a computer program which, when executed by a processor, implements the steps of the gesture recognition method as described above.
The technical scheme provided by the application can comprise the following beneficial effects:
the application discloses a gesture recognition method, which comprises the following steps: acquiring a hand image shot by a binocular camera system; carrying out distortion correction on the hand image to obtain a corrected image; carrying out color space processing on the corrected image to obtain an HSV image; carrying out threshold segmentation on the HSV image to obtain a segmented image; performing frame selection on the hand in the separation image to obtain a hand outline; and carrying out classification and identification in the hand image by utilizing a pre-trained classifier according to the hand contour to obtain an identification result. According to the method, the binocular camera system is used as input signal change judgment equipment, and on the basis of guaranteeing that the coverage angle is captured in a traditional mode and enlarged, the novel orthodontic algorithm is used for completing orthodontics so as to reduce the influence on subsequent VR gesture recognition, and therefore the accuracy of gesture recognition is greatly improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow chart of a gesture recognition method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a gesture recognition apparatus according to an embodiment of the present invention;
FIG. 3 is a block diagram of a gesture recognition apparatus according to an embodiment of the present invention;
fig. 4 is a block diagram of a gesture recognition system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Fig. 1 is a flowchart of a gesture recognition method according to an embodiment of the present invention. Referring to fig. 1, a gesture recognition method includes:
step 101: and acquiring a hand image shot by the binocular camera system. The binocular stereo camera system is mainly used in the prior art, the coverage angle of the camera is in the range of 40-140 degrees, the binocular stereo camera system can shoot videos in a wider range and can shoot images in the range of 360 degrees, and therefore the situation that omission exists in gesture interactive tracking due to limitation of shooting angles is avoided.
Step 102: carrying out distortion correction on the hand image to obtain a corrected image. Video or images shot by a camera always exhibit some distortion, so before gesture recognition the hand image must first undergo distortion-correction processing. The specific process in this application is as follows:
The binocular camera system is calibrated with a black-and-white chessboard calibration plate, and the internal and external parameters of the cameras in the binocular camera system are solved. The internal parameters include the focal length, the radial lens-distortion coefficients and the like; the external parameters include the camera position and orientation, scanning angle, tilt angle and the like. Correction is then performed with a preset AI correction algorithm according to the internal and external parameters to obtain the corrected image.
To solve the 4 internal parameters and 6 external parameters of each monocular camera, assume the black-and-white chessboard calibration plate used in this application has N corner points and is placed in M different poses, with the world coordinate origin fixed at the first corner point of the chessboard. The constraint obtained from the chessboard images at the different poses is:
2NM≥6M+4;
although theoretically, N is 5, and M is 1, the requirement of the corner points can be met, all targets representing a plane projection view field only need 4 points, so that no matter how many corner points exist in a plane, only 4 pieces of corner point information are available, but in consideration of noise and stability, more corner points and pose requirements exist, and more calibration plate images with different poses need to be collected during calibration.
The chessboard corner coordinates and their image coordinates determine the key correspondences, and a corresponding homography matrix is solved for each chessboard image by least squares, finally yielding the internal and external parameters of the camera.
Step 103: and carrying out color space processing on the corrected image to obtain an HSV image.
Step 104: carrying out threshold segmentation on the HSV image to obtain a segmented image. In this application, after color-space conversion, adaptive threshold segmentation is performed on the corrected image to separate the hand from the background and obtain the segmented image. Meanwhile, to obtain a better segmentation effect, a morphological closing operation is performed on the threshold-segmented image.
Step 105: performing frame selection on the hand in the segmented image to obtain the hand contour. Specifically, the hand is framed by detecting the bounding rectangle and the convex hull of the contour.
Step 106: carrying out classification and recognition on the hand image by using a pre-trained classifier according to the hand contour to obtain a recognition result. The classifier adopted in this application is an SVM classifier; note that the choice of classifier is not fixed and can be determined according to the actual situation. The framed region is cropped from the original image and classified with the trained SVM classifier: the hand contour is first cropped from the hand image, and the cropped image is then recognized by the SVM classifier to obtain the recognition result.
The SVM classifier needs to be trained in advance. Open-hand images and fist images are obtained through the experiment platform and labeled — for example, open-hand images are labeled 1 and fist images are labeled 0. The classifier is then trained on a large number of labeled images.
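The labeling-and-training scheme can be sketched with scikit-learn's `LinearSVC` standing in for the unspecified SVM implementation; the feature representation (flattened image crops as fixed-length vectors) is an assumption:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_gesture_svm(open_hand_feats, fist_feats):
    """Train a binary SVM on hand-image feature vectors, following the
    patent's labeling: 1 = open hand, 0 = clenched fist."""
    X = np.vstack([open_hand_feats, fist_feats])
    y = np.concatenate([np.ones(len(open_hand_feats)),
                        np.zeros(len(fist_feats))])
    clf = LinearSVC(C=1.0)  # linear SVM, matching the linear classifier in the text
    clf.fit(X, y)
    return clf
```

At recognition time, the cropped hand region would be resized to a fixed shape, flattened into the same feature layout, and passed to `clf.predict`.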
After the SVM classifier is trained, its classification results need to be verified. This application trains the SVM classifier with a self-learning prediction-optimization update algorithm, defined as follows:
in the formula: c is a constant term, m is the number of samples, and n is a characteristic number; thetaTA vector of features; (x)(i),y(i)) Is the ith sample; h isθ(x) And (4) corresponding to a self-optimization model for the vector machine.
In this application, the classification object of the linear SVM classifier is an image containing a gesture and the classification result is 0 or 1, so calibrated pictures can be used as training samples without keypoint-prediction data. However, the keypoint-prediction data together with the images may also be used as training samples for the SVM classifier.
If the keypoint-prediction data and images are used as samples to train the SVM classifier, processing can follow the gesture-keypoint prediction algorithm and keypoint calibration below:
the probability that the predicted key point position of a certain point P in the graph is within a distance threshold value sigma of the real position of the predicted key point position is set, and the probability is defined as:
in the formula:for detector d0The probability of correct keypoints at point P, σ represents the threshold, T represents the test set,represents the predicted location of the p-th keypoint,represents the actual position of the p-th key point, and delta is a preset value.
By setting different internal parameters n and different numbers V of cameras with different viewing angles, the probability of correct keypoints is expressed through the number of true positives TP and the number of false positives FP. Meanwhile, the false discovery rate FDR is calculated, defined as:
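Both formulas referenced above are missing from this text. The standard definitions consistent with the surrounding symbol legends are (reconstructions, not verbatim from the patent):

```latex
\mathrm{PCK}^{p}_{\sigma,d_0} \;=\; \frac{1}{|T|}\sum_{i\in T}
 \delta\!\left(\big\lVert y^{p}_{\mathrm{pred},i} - y^{p}_{\mathrm{gt},i}\big\rVert < \sigma\right),
\qquad
\mathrm{FDR} \;=\; \frac{FP}{TP + FP}
```

Here $\delta(\cdot)$ is the indicator function of the thresholded-distance event, matching the "preset value" role described in the legend.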
meanwhile, the hand key point detector used in the key point calibration algorithm in the application is a convolution attitude machine heterogeneous optimization algorithm. The algorithm applies a key point detection algorithm based on deep learning to human body posture analysis, takes an image with a human body posture as input data, and outputs a confidence distribution map of human body posture characteristics through a set convolutional neural network.
According to the above embodiment, two traditional monocular wide-angle cameras are combined into a binocular stereo camera system that serves as the input-signal acquisition and judgment device; on the basis of enlarging the capture coverage angle of the traditional approach, a novel distortion-correction algorithm is applied to reduce the influence of distortion on subsequent VR gesture recognition. With the AI correction algorithm as the judgment core, the system achieves low latency and accurate capture.
Corresponding to the gesture recognition method provided by the embodiment of the invention, the embodiment of the invention also provides a gesture recognition device. Please see the examples below.
Fig. 2 is a block diagram of a gesture recognition apparatus according to an embodiment of the present invention. Referring to fig. 2, a gesture recognition apparatus includes:
and a hand image acquisition module 201, configured to acquire a hand image captured by the binocular imaging system.
And the distortion correction module 202 is configured to perform distortion correction on the hand image to obtain a corrected image.
And the color processing module 203 is configured to perform color space processing on the corrected image to obtain an HSV image.
And the threshold segmentation module 204 is configured to perform threshold segmentation on the HSV image to obtain a segmented image.
And the frame selection module 205 is configured to perform frame selection on the hand in the segmented image to obtain a hand contour.
And the recognition module 206 is configured to perform classification recognition on the hand image by using a pre-trained classifier according to the hand contour to obtain a recognition result.
On this basis, the device in this application further includes: a closing operation module, used for performing a morphological closing operation on the image obtained after threshold segmentation of the HSV image.
Wherein the distortion correction module 202 is specifically configured to: calibrating the binocular camera system by adopting a black and white chessboard calibration plate, and solving internal parameters and external parameters of a camera in the binocular camera system; and correcting by using a preset AI correction algorithm according to the internal parameters and the external parameters to obtain the corrected image.
The identification module 206 is specifically configured to: intercepting the hand contour in the hand image to obtain an intercepted image; and carrying out image recognition on the intercepted image by using the classifier to obtain the recognition result.
The above embodiment applies a novel distortion-correction algorithm: on the basis of enlarging the capture coverage angle of the traditional approach, distortion correction is completed to reduce its influence on subsequent VR gesture recognition. With the AI correction algorithm as the judgment core, the characteristics of low latency and accurate capture are achieved.
In order to more clearly introduce a hardware system for implementing the embodiment of the present invention, a gesture recognition apparatus and a system are also provided in the embodiment of the present invention, corresponding to the gesture recognition method provided in the embodiment of the present invention. Please see the examples below.
Fig. 3 is a block diagram of a gesture recognition apparatus according to an embodiment of the present invention. Referring to fig. 3, a gesture recognition apparatus includes:
a processor 301, and a memory 302 connected to the processor 301;
the memory 302 is used for storing a computer program at least for executing the gesture recognition method;
the processor 301 is used for calling and executing the computer program in the memory 302.
Meanwhile, on the basis, the application also discloses a system, and fig. 4 is a structural diagram of the gesture recognition system provided by the embodiment of the invention. Referring to fig. 4, a gesture recognition system includes:
a binocular camera system 401, and a gesture recognition device 402 as described above in communication with the binocular camera system 401.
Meanwhile, the application also discloses a storage medium, wherein the storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the gesture recognition method are realized.
According to the above embodiments, the binocular stereo camera system enlarges the shooting range of the camera, and on the basis of enlarging the traditional capture coverage angle, a correction algorithm completes distortion correction to reduce its influence on subsequent VR gesture recognition, greatly improving gesture recognition accuracy.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A gesture recognition method, comprising:
acquiring a hand image shot by a binocular camera system;
carrying out distortion correction on the hand image to obtain a corrected image;
carrying out color space processing on the corrected image to obtain an HSV image;
carrying out threshold segmentation on the HSV image to obtain a segmented image;
performing frame selection on the hand in the segmented image to obtain a hand contour;
and performing classification and recognition on the hand image by using a pre-trained classifier according to the hand contour to obtain a recognition result.
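The color-space and threshold-segmentation steps of claim 1 can be sketched with Python's standard library: each RGB pixel is converted to HSV and kept only if its hue, saturation, and value fall inside a skin-tone window. The window bounds below are illustrative assumptions; the patent does not disclose its threshold values.

```python
import colorsys

def segment_hsv(rgb_pixels, h_max=0.14, s_min=0.15, v_min=0.35):
    """Binary-segment a list of RGB pixels (0-255 ints) by an HSV window.

    Returns 1 for pixels inside the (assumed) skin-tone window, else 0.
    colorsys works on floats in [0, 1], so channels are normalised first.
    """
    mask = []
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        inside = h <= h_max and s >= s_min and v >= v_min
        mask.append(1 if inside else 0)
    return mask

# A warm skin-like pixel passes; a saturated blue pixel does not.
print(segment_hsv([(220, 170, 140), (30, 60, 200)]))  # prints [1, 0]
```

A real implementation would run this per pixel over the corrected image and feed the resulting binary mask to the framing step.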
2. The method of claim 1, further comprising:
performing a morphological closing operation on the image obtained after the threshold segmentation of the HSV image.
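The closing operation of claim 2 is a dilation followed by an erosion with the same structuring element, which fills small holes left by threshold segmentation. The sketch below works on a binary grid with a 3×3 cross-shaped element; the element shape is an assumption, not specified by the patent.

```python
CROSS = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))

def dilate(img, offsets=CROSS):
    """Binary dilation: a pixel becomes 1 if any neighbour under the element is 1."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]

def erode(img, offsets=CROSS):
    """Binary erosion: a pixel stays 1 only if every neighbour under the
    element is in bounds and set (out-of-bounds counts as background)."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]

def close_binary(img):
    """Morphological closing: dilation then erosion, filling small holes."""
    return erode(dilate(img))

hollow = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(close_binary(hollow)[2])  # the hole row becomes [0, 1, 1, 1, 0]
```

Closing the hollow square fills its one-pixel hole while leaving the outer boundary unchanged, which is why the patent applies it before framing the hand.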
3. The method of claim 1, wherein said distortion correcting said hand image to obtain a corrected image comprises:
calibrating the binocular camera system by using a black-and-white checkerboard calibration board, and solving for the internal parameters and external parameters of the cameras in the binocular camera system;
and correcting by using a preset AI correction algorithm according to the internal parameters and the external parameters to obtain the corrected image.
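Claim 3 does not detail the "preset AI correction algorithm"; for illustration, the widely used Brown-Conrady radial model is assumed here. Distortion scales a normalised image point by 1 + k1·r² + k2·r⁴, and correction inverts that scaling by fixed-point iteration:

```python
def distort(x, y, k1, k2):
    """Apply the radial (Brown-Conrady) distortion model to a normalised
    image point: x_d = x * (1 + k1*r^2 + k2*r^4), likewise for y."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor

def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the model by fixed-point iteration: re-evaluate the scale
    factor at the current estimate and divide the distorted point by it."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y
```

For the mild distortion coefficients typical of calibrated cameras the iteration converges in a handful of steps; a full pipeline would also apply the intrinsic matrix and tangential terms, which are omitted here.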
4. The method of claim 1, wherein performing classification and recognition on the hand image by using the pre-trained classifier according to the hand contour to obtain the recognition result comprises:
cropping the region bounded by the hand contour from the hand image to obtain a cropped image;
and performing image recognition on the cropped image by using the classifier to obtain the recognition result.
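The cropping of claim 4 amounts to taking the bounding box of the hand contour out of the full frame. A minimal sketch over row-major 2D lists, with a hypothetical binary mask standing in for the detected contour:

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the set pixels in a binary mask
    (bottom/right exclusive), or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return min(ys), min(xs), max(ys) + 1, max(xs) + 1

def crop(image, box):
    """Crop a row-major image to the given (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 0]]
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(crop(image, bounding_box(mask)))  # prints [[5, 6], [8, 9]]
```

The cropped patch, rather than the full frame, is what the classifier then scores.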
5. The method of claim 1, wherein training the classifier comprises:
acquiring a set number of open-hand images and clenched-fist images from a preset experiment platform;
assigning custom labels to the open-hand images and the clenched-fist images respectively;
and training the classifier with the labels, the open-hand images, and the clenched-fist images.
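The patent does not name the classifier of claim 5. As one minimal stand-in, a nearest-centroid model over hypothetical per-image feature vectors shows how the labelled open-hand and fist images drive training:

```python
def train_centroids(features, labels):
    """Average the feature vectors of each label into a per-class centroid."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, vec):
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(centroids[lab], vec)))

# Hypothetical 2-D feature vectors for two open-hand and two fist samples.
cents = train_centroids([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]],
                        ["open", "open", "fist", "fist"])
print(predict(cents, [0.95, 0.05]))  # prints open
```

With well-separated classes, new feature vectors are simply assigned to the nearer centroid; an actual deployment would extract features from the cropped hand images first.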
6. A gesture recognition apparatus, comprising:
the hand image acquisition module is used for acquiring a hand image shot by the binocular camera system;
the distortion correction module is used for carrying out distortion correction on the hand image to obtain a corrected image;
the color processing module is used for carrying out color space processing on the corrected image to obtain an HSV image;
the threshold segmentation module is used for carrying out threshold segmentation on the HSV image to obtain a segmented image;
the framing module is used for performing frame selection on the hand in the segmented image to obtain a hand contour;
and the recognition module is used for performing classification and recognition on the hand image by using a pre-trained classifier according to the hand contour to obtain a recognition result.
7. The apparatus of claim 6, further comprising:
the closing operation module is used for performing a morphological closing operation on the image obtained after the threshold segmentation of the HSV image.
8. A gesture recognition device, comprising:
a processor, and a memory coupled to the processor;
the memory is used for storing a computer program at least for performing the gesture recognition method of any one of claims 1-5;
and the processor is used for calling and executing the computer program in the memory.
9. A gesture recognition system, comprising:
a binocular camera system, and the gesture recognition device of claim 8 communicatively connected to the binocular camera system.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, performs the steps of the gesture recognition method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110053294.4A CN112749664A (en) | 2021-01-15 | 2021-01-15 | Gesture recognition method, device, equipment, system and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112749664A true CN112749664A (en) | 2021-05-04 |
Family
ID=75652127
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110053294.4A Pending CN112749664A (en) | 2021-01-15 | 2021-01-15 | Gesture recognition method, device, equipment, system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112749664A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160171340A1 (en) * | 2014-12-11 | 2016-06-16 | Intel Corporation | Labeling component parts of objects and detecting component properties in imaging data |
CN107813310A (en) * | 2017-11-22 | 2018-03-20 | 浙江优迈德智能装备有限公司 | One kind is based on the more gesture robot control methods of binocular vision |
CN109961484A (en) * | 2017-12-22 | 2019-07-02 | 比亚迪股份有限公司 | Camera calibration method, device and vehicle |
CN111435429A (en) * | 2019-01-15 | 2020-07-21 | 北京伟景智能科技有限公司 | Gesture recognition method and system based on binocular stereo data dynamic cognition |
2021-01-15: application CN202110053294.4A (CN) filed, published as CN112749664A; legal status: Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117373114A (en) * | 2023-09-19 | 2024-01-09 | 海智合芯科技(深圳)有限公司 | Gesture recognition method, system, device and storage medium based on follow-up image |
CN117392759A (en) * | 2023-12-11 | 2024-01-12 | 成都航空职业技术学院 | Action recognition method based on AR teaching aid |
CN117392759B (en) * | 2023-12-11 | 2024-03-12 | 成都航空职业技术学院 | Action recognition method based on AR teaching aid |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11503275B2 (en) | Camera calibration system, target, and process | |
CN107909600B (en) | Unmanned aerial vehicle real-time moving target classification and detection method based on vision | |
US8467596B2 (en) | Method and apparatus for object pose estimation | |
CN111144207B (en) | Human body detection and tracking method based on multi-mode information perception | |
US10216979B2 (en) | Image processing apparatus, image processing method, and storage medium to detect parts of an object | |
CN105453546B (en) | Image processing apparatus, image processing system and image processing method | |
CN109934065B (en) | Method and device for gesture recognition | |
US10733705B2 (en) | Information processing device, learning processing method, learning device, and object recognition device | |
WO2010099034A1 (en) | Capturing and recognizing hand postures using inner distance shape contexts | |
CN112287867B (en) | Multi-camera human body action recognition method and device | |
CN112200056B (en) | Face living body detection method and device, electronic equipment and storage medium | |
CN107766864B (en) | Method and device for extracting features and method and device for object recognition | |
CN107527368B (en) | Three-dimensional space attitude positioning method and device based on two-dimensional code | |
CN112784712B (en) | Missing child early warning implementation method and device based on real-time monitoring | |
CN112749664A (en) | Gesture recognition method, device, equipment, system and storage medium | |
JP7230345B2 (en) | Information processing device and information processing program | |
CN110598647A (en) | Head posture recognition method based on image recognition | |
CN113723432B (en) | Intelligent identification and positioning tracking method and system based on deep learning | |
CN113591548B (en) | Target ring identification method and system | |
Wang et al. | A detection and tracking system for fisheye videos from traffic intersections | |
US20230009925A1 (en) | Object detection method and object detection device | |
CN113033256B (en) | Training method and device for fingertip detection model | |
CN112766065A (en) | Mobile terminal examinee identity authentication method, device, terminal and storage medium | |
CN115210758A (en) | Motion blur robust image feature matching | |
CN112351181A (en) | Intelligent camera based on CMOS chip and ZYNQ system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210504 |