CN110458095B - Effective gesture recognition method, control method and device and electronic equipment - Google Patents

Effective gesture recognition method, control method and device and electronic equipment

Info

Publication number
CN110458095B
CN110458095B
Authority
CN
China
Prior art keywords
gesture
image
recognition
neural network
network model
Prior art date
Legal status
Active
Application number
CN201910735669.8A
Other languages
Chinese (zh)
Other versions
CN110458095A (en)
Inventor
徐绍凯
贾宝芝
Current Assignee
Xiamen Ruiwei Information Technology Co ltd
Original Assignee
Xiamen Ruiwei Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Ruiwei Information Technology Co ltd filed Critical Xiamen Ruiwei Information Technology Co ltd
Priority to CN201910735669.8A
Publication of CN110458095A
Application granted
Publication of CN110458095B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 - Static hand or arm

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an effective gesture recognition method, a control method, a device and electronic equipment. The recognition method comprises the steps of: S11, obtaining a current frame image collected by a camera; S12, performing gesture detection and recognition on the current frame image according to a preset recognition algorithm to obtain the possible region, gesture category and confidence of a gesture in the current frame image; S13, sequentially performing gesture detection and recognition on all image frames of the video within a fixed time interval after the current frame to obtain the possible regions, gesture categories and confidences of the gestures in those images; S14, judging whether the proportion of image frames containing the same gesture among the image frames in the fixed time interval is larger than a preset proportion threshold, and if so, determining that the gesture is an effective gesture. The invention can effectively and quickly detect and recognize gestures on an embedded terminal, enabling convenient and fast human-computer interaction.

Description

Effective gesture recognition method, control method and device and electronic equipment
Technical Field
The invention relates to a real-time gesture detection and judgment method, device and electronic equipment based on artificial-intelligence deep-learning technology and computer vision.
Background
With the rapid development of computer technology, deep learning is increasingly applied to the field of computer vision. Using gestures for man-machine interaction is a very convenient approach with high application value: gesture recognition and control technology provides a remote, non-contact mode of man-machine interaction, so a fast and accurate gesture recognition algorithm can give users a convenient and friendly experience. The difficulty in applying current deep neural networks on embedded devices is that the networks are huge and complex while the computing power of embedded devices is limited, leading to slow algorithm execution, unsmooth system operation, long response times and, consequently, a poor user experience. To solve the above problems, the present invention provides a method, an apparatus and an electronic device for real-time gesture recognition and control based on a neural network.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an effective gesture recognition method, a control method, a recognition device and an electronic device, which can effectively and quickly detect and recognize gestures on an embedded terminal and enable convenient and fast human-computer interaction.
According to a first aspect of the present invention, there is provided a method for recognizing a valid gesture, comprising the steps of:
S11, acquiring a current frame image acquired by a camera;
S12, performing gesture detection and recognition on the current frame image according to a preset recognition algorithm to obtain a possible region of a gesture in the current frame image, a gesture type and a confidence coefficient of a recognition result, and judging whether to accept the recognition result according to the confidence coefficient;
S13, sequentially carrying out gesture detection and recognition on all image frames of the video within a fixed time interval after the current frame to obtain possible regions of gestures in the image, gesture categories and confidence degrees of recognition results, and judging whether to accept the recognition results according to the confidence degrees;
and S14, judging whether the proportion of the image frames with the same gesture category in the image frames in the fixed time interval is larger than a preset proportion threshold value, if so, considering the gesture as an effective gesture, and if not, taking the next frame of the current frame as the current frame and returning to step S13.
Optionally, in step S11, the obtained current frame image is further preprocessed: firstly, the current frame image is normalized, and whether the gesture is detected in the previous frame image is judged according to the gesture detection and recognition result of the previous frame image.
Optionally, the detection and identification in step S12 and step S13 specifically include:
selecting a first neural network model or a second neural network model according to the gesture detection result in the previous frame of image, wherein the first neural network model is a pre-trained convolutional network single detection model and is used for directly predicting the possible area and the category of the gesture on the full image, and the second neural network model is a pre-trained convolutional network single detection model and is used for tracking the gesture according to the detection result of the previous frame;
if the current frame is the first frame image or the gesture is not detected in the previous frame image, inputting the current frame image into a first neural network model for gesture detection and recognition, outputting coordinates of a possible region of the gesture in the current frame image, possible types of the gesture and a confidence coefficient of a recognition result by the first neural network model, if the confidence coefficient is greater than or equal to a preset confidence coefficient threshold value, receiving the detection and recognition result predicted by the first neural network model, and if the confidence coefficient is smaller than the preset confidence coefficient threshold value, ignoring the current frame image;
if the gesture is detected in the previous frame of image, mapping the position of the gesture in the previous frame of image to the current frame of image, expanding the mapping region on the current frame of image outwards according to a preset multiple, inputting the expanded mapping region to a second neural network model for gesture detection and recognition, outputting the coordinates of the possible region of the gesture in the current image, the possible types of the gesture and the confidence coefficient of the result by the second neural network model, if the confidence coefficient is greater than a preset confidence coefficient threshold value, receiving the prediction result of the second neural network model, and if the confidence coefficient is less than the preset confidence coefficient threshold value, ignoring the result.
Optionally, the training method of the first neural network model is as follows: acquiring a first type of training sample set and labeling information of gestures; performing data preprocessing on the first class training sample set: cutting the first type of training samples in random size and turning the first type of training samples in a mirror image mode according to a preset aspect ratio; converting the labeling information of the gesture according to the cutting and turning conditions, and performing random color enhancement on the cut picture; and training a first neural network model by using the preprocessed first class sample set.
Optionally, the training method of the second neural network model is as follows: acquiring a second type of training sample set and labeling information of gestures; performing data preprocessing on the second class training sample set: taking the position of the gesture frame, or that position after a random offset, as the center, randomly expanding the second type of training sample outwards by 3 to 6 times to perform cutting and mirror flipping, converting the labeling information of the gesture according to the cutting and flipping, and performing random color enhancement on the cut picture; and training a second neural network model using the preprocessed second class sample set.
According to a second aspect of the present invention, there is provided a control method after recognition of a valid gesture, comprising the steps of:
s21, counting and analyzing the effective gesture recognition results of all detection frames in a fixed time interval before the current frame, and judging whether continuous and stable effective gestures exist in the fixed time interval;
s22, judging whether a continuous stable effective gesture type is changed into another continuous stable effective gesture type in the fixed time interval;
and S23, when the gesture type is found to be changed, executing control operation corresponding to the gesture change.
Wherein whether the gesture category has changed is judged as follows: judging all image frames in the fixed time interval, and if the detected gesture changes from a stable state of one category to a stable state of another category at some image frame, determining that the gesture category has changed; wherein a stable state of a category means that the proportion of image frames containing the same gesture among all image frames of the video in the fixed time interval is greater than a preset proportion threshold.
According to a third aspect of the present invention, there is provided an apparatus for recognizing a valid gesture, comprising:
the image acquisition module is used for acquiring a current frame image acquired by the camera;
the gesture detection and recognition module is used for carrying out gesture detection and recognition on the current frame image according to a preset recognition algorithm to obtain the gesture category of the gesture in the current frame image and the confidence coefficient of a recognition result, and judging whether to accept the recognition result according to the confidence coefficient;
sequentially performing gesture detection and recognition on all image frames of the video within a fixed time interval after the current frame to obtain gesture types of the gestures in the images and confidence degrees of recognition results, and judging whether to accept the recognition results according to the confidence degrees;
and the gesture recognition module is also used for judging whether the proportion of the image frames with the same gesture in the image frames in the time interval is greater than a preset proportion threshold value, if so, the gesture is considered to be a valid gesture, and a judgment result is returned.
Optionally, the method further includes:
the image preprocessing module is used for carrying out normalization processing on the current frame image and judging whether the gesture is detected in the previous frame image or not according to the gesture detection and recognition result of the previous frame image;
the model selection module is used for selecting a first neural network model or a second neural network model according to the gesture detection result in the previous frame of image, the first neural network model is a pre-trained convolutional network single detection model and is used for directly predicting the possible regions and types of gestures on the full graph, and the second neural network model is a pre-trained convolutional network single detection model and is used for tracking the gestures according to the previous frame of detection result;
if the gesture is not detected in the previous frame of image, inputting the current frame of image into a first neural network model for gesture detection and recognition, outputting coordinates of a possible region of the gesture in the current frame of image, possible types of the gesture and a confidence coefficient of a recognition result by the first neural network model, if the confidence coefficient is greater than or equal to a preset confidence coefficient threshold value, receiving the detection and recognition result predicted by the first neural network model, and if the confidence coefficient is less than the preset confidence coefficient threshold value, ignoring the result;
if the gesture is detected in the previous frame of image, mapping the position of the gesture in the previous frame of image to the current frame of image, expanding the mapping region on the current frame of image outwards according to a preset multiple, inputting the expanded mapping region to a second neural network model for gesture detection and recognition, outputting the coordinates of the possible region of the gesture in the current image, the possible types of the gesture and the confidence coefficient of the recognition result by the second neural network model, if the confidence coefficient is greater than a preset confidence coefficient threshold value, receiving the prediction result of the second neural network model, and if the confidence coefficient is less than the preset confidence coefficient threshold value, ignoring the result.
According to a fourth aspect of the present invention, there is provided an electronic device for recognizing valid gestures, comprising a processor and a memory, wherein the processor is capable of executing the method for recognizing valid gestures as described above; the memory is used for storing all the obtained detection images, the result of image preprocessing and the result of gesture detection and recognition, and also storing an executable program for gesture response.
The invention has the advantages that:
(1) Gesture detection and recognition are carried out on images acquired by an ordinary camera; no extra wearable equipment, extra parameters or excessive image preprocessing is needed, which saves cost, makes use more convenient and helps improve the running speed;
(2) Gesture detection and recognition are carried out by two neural network models working alternately: the first neural network model directly predicts the possible position, gesture category and confidence of the gesture over the full image, while the second neural network tracks and recognizes the possible gesture region of the next frame based on the gesture position of the previous frame. The second neural network preserves the accuracy of gesture detection and recognition while being extremely fast and consuming very little computing resource; its running speed on an ARM chip exceeds 10 FPS, meeting the real-time detection requirement;
(3) In the gesture recognition process, the detection results of multiple frames are combined into the final gesture result, which ensures the stability of the system, allows equipment to be controlled accurately through gestures, and brings a better human-computer interaction experience.
Drawings
The invention will be further described with reference to the following examples and figures.
FIG. 1 is a flowchart illustrating an effective gesture recognition method according to a preferred embodiment of the present invention.
FIG. 2 is a flowchart illustrating an exemplary method for performing control operations corresponding to gesture changes according to an exemplary embodiment of the present invention.
FIG. 3 is a flowchart illustrating the training process of the neural network models in the effective gesture recognition method according to the present invention.
FIG. 4 is a block diagram of an effective gesture recognition device according to a preferred embodiment of the present invention.
Detailed Description
Referring to fig. 1 to 3, a detailed description is given of the effective gesture recognition method of the present invention, which includes the following steps:
s11, acquiring a current frame image acquired by a camera; and preprocessing the current frame image: firstly, normalizing the current frame image, and judging whether the gesture is detected in the previous frame image according to the gesture detection and recognition result of the previous frame image; however, if the image is the first frame image, only normalization processing is required.
S12, performing gesture detection and recognition on the current frame image according to a preset recognition algorithm to obtain the possible region of the gesture in the current frame image, the gesture category and the confidence of the recognition result, and judging whether to accept the recognition result according to the confidence. The confidence is a measure of the gesture region and gesture category: it is output by the model and represents the probability that the predicted gesture region and gesture category are correct, so the higher the confidence, the more credible the detected region and category. In practice, a fixed threshold is usually set for the confidence, and a gesture region and category above the threshold are considered a detected effective gesture region and category.
S13, sequentially carrying out gesture detection and recognition on all image frames of the video within a fixed time interval after the current frame to obtain the possible regions of the gestures in the images, the gesture categories and their confidences, and judging whether to accept the recognition results according to the confidences. In practice multiple possible gesture regions may be detected, and the confidence determines which region is finally adopted.
S14, judging whether the proportion of image frames containing the same gesture among the image frames in the fixed time interval is larger than a preset proportion threshold, and if so, determining that the gesture is an effective gesture. "The same gesture" means gestures predicted to be of the same category: for example, if the gesture detected in frame T is of category 1 and the gesture detected in frame T+1 is also of category 1, the two frames contain the same gesture.
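As an illustration of the thresholding in steps S12/S13 and the voting in step S14, the following minimal Python sketch processes the per-frame results of one fixed time interval; the function and threshold names and their values are assumptions for illustration, not taken from the patent:

```python
from collections import Counter

CONF_THRESHOLD = 0.5   # illustrative confidence threshold (steps S12/S13)
RATIO_THRESHOLD = 0.8  # illustrative proportion threshold (step S14)

def vote_valid_gesture(frame_results):
    """frame_results: list of (category, confidence) pairs, one per image
    frame in the fixed time interval. Returns the effective gesture
    category, or None if no category is stable enough."""
    # Steps S12/S13: accept a recognition result only if its confidence
    # reaches the preset threshold.
    accepted = [cat for cat, conf in frame_results if conf >= CONF_THRESHOLD]
    if not accepted:
        return None
    # Step S14: a gesture is effective when frames of one category make up
    # more than the preset proportion of all frames in the interval.
    category, count = Counter(accepted).most_common(1)[0]
    if count / len(frame_results) > RATIO_THRESHOLD:
        return category
    return None
```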
The detection and identification in step S12 and step S13 are specifically:
selecting a first neural network model or a second neural network model according to the gesture detection result in the previous frame of image, wherein the first neural network model is a pre-trained convolutional network single detection model and is used for directly predicting the possible area and the category of the gesture on the full image, and the second neural network model is a pre-trained convolutional network single detection model and is used for tracking the gesture according to the detection result of the previous frame;
if the current image is the first detected image or the gesture is not detected in the previous image, inputting the current image into a first neural network model for gesture detection and recognition, outputting coordinates of a possible region of the gesture in the current image, possible types of the gesture and a confidence coefficient of the result by the first neural network model, if the confidence coefficient is greater than or equal to a preset confidence coefficient threshold value, receiving the detection and recognition result predicted by the first neural network model, and if the confidence coefficient is less than the preset confidence coefficient threshold value, ignoring the result;
if the gesture is detected in the previous frame of image, mapping the position of the gesture in the previous frame of image to the current frame of image, expanding the mapping region on the current frame of image outwards according to a preset multiple, inputting the expanded mapping region to a second neural network model for gesture detection and recognition, outputting the coordinates of the possible region of the gesture in the current image, the possible types of the gesture and the confidence coefficient of the result by the second neural network model, if the confidence coefficient is greater than or equal to a preset confidence coefficient threshold value, receiving the prediction result of the second neural network model, and if the confidence coefficient is less than the preset confidence coefficient threshold value, ignoring the result.
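The alternation between the two models in these two branches can be sketched as follows; detect_full and track_region stand in for the first and second neural network models, expand_and_crop is a hypothetical helper implementing the mapping-and-expansion described above, and the expansion factor and confidence threshold are illustrative:

```python
import numpy as np

def expand_and_crop(frame, box, k):
    """Crop a square region around box=(x, y, w, h), expanded by factor k,
    zero-filling wherever the region leaves the frame."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    side = k * max(w, h)
    x1, y1 = int(cx - side / 2), int(cy - side / 2)
    x2, y2 = int(cx + side / 2), int(cy + side / 2)
    out = np.zeros((y2 - y1, x2 - x1, 3), dtype=frame.dtype)
    ix1, iy1 = max(x1, 0), max(y1, 0)
    ix2, iy2 = min(x2, frame.shape[1]), min(y2, frame.shape[0])
    out[iy1 - y1:iy2 - y1, ix1 - x1:ix2 - x1] = frame[iy1:iy2, ix1:ix2]
    return out, (x1, y1)

def process_frame(frame, prev_box, detect_full, track_region,
                  k=2.0, conf_threshold=0.5):
    """One step of the detect/track alternation. prev_box is the
    (x, y, w, h) gesture box from the previous frame, or None."""
    if prev_box is None:
        # First frame, or no gesture previously: full-image detector.
        box, category, conf = detect_full(frame)
    else:
        # Gesture found previously: run the lightweight tracking model
        # only on the expanded region around the previous position.
        crop, (ox, oy) = expand_and_crop(frame, prev_box, k)
        (x, y, w, h), category, conf = track_region(crop)
        box = (x + ox, y + oy, w, h)  # map back to full-image coordinates
    if conf >= conf_threshold:
        return box, category
    return None  # below the preset confidence threshold: ignore
```

In this division of labor the expensive full-image detector runs only when tracking has lost the hand, which is what keeps the per-frame cost low.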
The gesture recognition of the present invention actually involves two tasks: detection and recognition. Gesture detection locates the position of the gesture in the whole picture, i.e. predicts the possible region of the gesture; once the gesture region is located, the category of the gesture is judged, i.e. the gesture is recognized. Detecting the region where the gesture exists is a prerequisite for gesture recognition, and "gesture detection and recognition" is therefore described in both steps S12 and S13. The possible region of the gesture is given as four values (x, y, w, h), representing the vertex coordinates and the width and height respectively, as detailed below:
in the present invention, the training method of the first neural network model is:
(1) Acquiring a first type of training sample set and the labeling information of gestures, wherein the labeling information comprises two aspects: (a) frame information of all gestures to be recognized in the image, comprising the center-point x value of the gesture frame, the center-point y value of the gesture frame, the width of the gesture frame and the height of the gesture frame; (b) category codes of all gestures to be recognized in the image. The gesture labeling information is manually labeled;
(2) Performing data preprocessing on the first class training sample set: cutting and mirror image turning the first type of training samples according to a preset aspect ratio; converting the labeling information of the gesture according to the cutting and turning conditions, and performing random color enhancement on the cut picture;
the cutting area keeps the input aspect ratio of the first neural network model to ensure that the image is not deformed when being input to a network for training, and simultaneously label information can be correspondingly converted, and the random size can ensure that the cut image comprises gestures with different proportions, which is beneficial to ensuring that the neural network model can adapt to gestures with different distances and sizes when gesture detection is carried out;
(3) Training a first neural network model by using the preprocessed first-class sample set.
The training method of the second neural network model comprises the following steps:
(1) Acquiring a second type of training sample set and labeling information of the gesture;
(2) Performing data preprocessing on the second type training sample set: taking the position of the gesture frame, or that position after a random deviation, as the center, randomly expanding the second type of training samples outwards by 3 to 6 times for cutting and mirror flipping, converting the labeling information of the gesture according to the cutting and flipping, and performing random color enhancement on the cut picture;
if the cutting area exceeds the range of the original image, zero value filling is carried out, the training sample diversity is increased by random multiple cutting, the tracking model is beneficial to adapting to the fluctuation of the size of the gesture frame caused by the detection error of the previous frame, and therefore the stability of the model is improved;
(3) And training a second neural network model by using the preprocessed second class training sample set.
After the effective gesture is recognized, the instruction corresponding to the effective gesture can be executed, and the method comprises the following steps:
s21, counting and analyzing the effective gesture recognition results of all detection frames in a fixed time interval before the current frame, and judging whether continuous and stable effective gestures exist in the fixed time interval;
s22, judging whether a continuous stable effective gesture type is changed into another continuous stable effective gesture type in the fixed time interval;
and S23, when the gesture type is found to be changed, executing control operation corresponding to the gesture change.
Wherein, the judgment of whether the gesture generates the category change is as follows: judging all image frames in the fixed time interval, and if the detected gesture in a certain image frame is changed from the stable state of one type to the stable state of another type, determining that the gesture type is changed; wherein the steady state of the classes are: and the proportion of the image frames with the same gesture in all the image frames of the video in the fixed time interval is greater than a preset proportion threshold value.
The method or apparatus according to the invention described above is illustrated below:
example one
As shown in fig. 1, an embodiment of a gesture recognition method includes the following steps:
11. Acquire the image data of the current frame from the camera and convert it into a three-channel RGB image format.
12. Preprocess the acquired image. First, normalize the image; normalization is generally performed with the following formula:

x_i' = (x_i - min) / (max - min)

where min is the minimum of x_i (i = 1, 2, ..., n) and max is the maximum of x_i (i = 1, 2, ..., n).
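As a sketch, this min-max normalization can be written in NumPy as follows; the epsilon-style guard against a constant image is an added safety assumption, not part of the patent:

```python
import numpy as np

def normalize(image):
    """Min-max normalize pixel values to the range [0, 1]."""
    image = image.astype(np.float32)
    span = image.max() - image.min()
    # Guard against a constant image (max == min) -- an added assumption.
    return (image - image.min()) / (span if span > 0 else 1.0)
```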
Then analyze the gesture detection and recognition result of the previous frame image, judge whether an effective gesture was detected, and process accordingly. If no effective gesture was detected in the previous frame image, scale the normalized image to the input size of the first neural network model. If an effective gesture was detected and recognized in the previous frame image, map the position of the gesture in the previous frame onto the normalized image and, taking that position as the center, expand the gesture frame outwards to k times the mean of the original gesture frame's width and height, where k is a preset value; if the expanded frame exceeds the range of the original image, fill the excess with zero values; then cut out the expanded region and scale it to the input size of the second neural network model.
13. Input the preprocessed image into the corresponding neural network model for gesture detection and recognition. If no effective gesture was detected in the previous frame image, input the current preprocessed image into the first neural network model for gesture detection and recognition. If an effective gesture was recognized in the previous frame image, input the current preprocessed image into the second neural network model for gesture tracking and recognition.
14. Output the gesture recognition result for the current image. The model outputs a prediction of whether an effective gesture exists in the current image and the possible region of the gesture. The output is a one-dimensional vector of length 6, whose elements are: the center-point x value of the gesture box, the center-point y value of the gesture box, the width of the gesture box, the height of the gesture box, the category of the gesture, and the confidence of the prediction result.
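For illustration, the length-6 output vector can be unpacked as below; the field names are assumptions chosen to match the order listed above:

```python
from typing import NamedTuple

class GesturePrediction(NamedTuple):
    cx: float          # x value of the gesture-box center point
    cy: float          # y value of the gesture-box center point
    w: float           # width of the gesture box
    h: float           # height of the gesture box
    category: int      # predicted gesture class code
    confidence: float  # confidence of the prediction result

def parse_output(vector):
    """vector: the one-dimensional model output of length 6."""
    cx, cy, w, h, category, confidence = vector
    return GesturePrediction(cx, cy, w, h, int(category), confidence)
```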
In this embodiment, the gestures present in the image are detected and recognized by two neural network models, which output the possible region, gesture category and prediction confidence of each gesture. The first neural network model is responsible for detecting and recognizing gestures over the whole image, and the second neural network model is responsible for tracking and recognizing gestures around the gesture region of the previous frame. This ensures the stability and reliability of gesture recognition; meanwhile, the second neural network model greatly improves the gesture recognition speed, enabling real-time detection on embedded devices.
Example two
As shown in fig. 2, an embodiment of a gesture control method includes the following steps:
21. Count and analyze the gesture recognition results of all detection frames within a fixed time interval before the current frame, and judge whether a continuous and stable effective gesture exists within the fixed time interval.
A continuously stable effective gesture is defined as follows: within a specified number of consecutive frames, the proportion of frames in which the effective gesture is detected is larger than a specified threshold, the fluctuation range of the gesture region is small, and the gesture category does not change. The number of consecutive frames and the proportion threshold are specified by those skilled in the art according to the model performance and the actual product, and the fluctuation of the gesture region is measured by the relative positions of the effective gesture regions detected in two adjacent frames; a sketch of this stability check follows the steps below.
And 22, counting whether the gesture type continuously and stably changes from one type to another type within the fixed time interval, and performing corresponding control operation according to the change of the gesture.
If the statistical result is yes, the gesture change detected in the picture is effective, and corresponding control operation is executed according to the change of the gesture category;
if the statistical result is negative, the gesture change detected in the picture is invalid, at this moment, the control operation is not executed, and the gesture detection and recognition of the next frame are continued.
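A sketch of the checks in steps 21 and 22, assuming each frame yields either None or a ((x, y, w, h), category) pair; the window representation, thresholds, and the use of IoU between adjacent detections to measure area fluctuation are illustrative choices, not mandated by the patent:

```python
def box_iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes, used here as an
    illustrative measure of how much the gesture region fluctuates."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def stable_category(window, ratio_threshold=0.8, iou_threshold=0.3):
    """window: per-frame results (None or (box, category)) for the fixed
    time interval. Returns the category of a continuously stable
    effective gesture, or None."""
    hits = [r for r in window if r is not None]
    if not window or len(hits) / len(window) <= ratio_threshold:
        return None  # too few frames with a detected effective gesture
    categories = {cat for _, cat in hits}
    if len(categories) != 1:
        return None  # the category changed within the window
    # Small area fluctuation: adjacent detections must overlap enough.
    if any(box_iou(a, b) < iou_threshold
           for (a, _), (b, _) in zip(hits, hits[1:])):
        return None
    return categories.pop()

def category_changed(prev_window, curr_window):
    """Step 22: a control action fires when one stable category is
    followed by a different stable category."""
    a, b = stable_category(prev_window), stable_category(curr_window)
    return a is not None and b is not None and a != b
```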
In this embodiment, the intelligent device is controlled according to the results of gesture detection and recognition over consecutive frames. It should be noted that the gesture categories recognizable by the model, and likewise the effective gesture category changes, can be flexibly defined by those skilled in the art according to actual requirements and are not conditions limiting the present invention.
EXAMPLE III
As shown in fig. 3, an embodiment of a neural network model training process in an effective gesture recognition method is provided, which includes:
31. Acquire training images containing the required gestures together with gesture labeling information. The training images are all images containing the gesture categories to be recognized; images without gestures are not used as training images on their own. The gesture categories to be recognized can be flexibly specified by those skilled in the art according to actual requirements and are not limited to any particular category or categories. The gesture labeling information comprises two aspects: (1) frame information of all gestures to be recognized in the image, comprising the center-point x value of the gesture frame, the center-point y value of the gesture frame, the width of the gesture frame and the height of the gesture frame; and (2) category codes of all gesture classes to be recognized in the image. The gesture labeling information is manually labeled.
32. Preprocess the training samples and labeling information:
32-1. To obtain training samples for the first neural network model, cut the training images at random sizes; the cut region keeps the input aspect ratio of the first neural network model so that the image is not deformed when input to the network for training, and the labeling information is converted correspondingly. The random sizes ensure that the cut images contain gestures of different proportions, which helps the neural network model adapt to gestures at different distances and of different sizes during gesture detection.
32-2. To improve the robustness of the neural network model and ensure that it can correctly recognize the left and right hand, randomly mirror-flip the images obtained in step 32-1 and convert the labeling information correspondingly.
32-3. To improve the robustness of the neural network model and let it adapt to color differences caused by different illumination, scenes and cameras, apply random color enhancement, brightness enhancement, contrast enhancement and the like to the images obtained in step 32-2. This step includes, but is not limited to, the three enhancements above.
32-4. To obtain training samples for the second neural network model, cut the training images of step 31 as follows: taking the center point of a gesture frame in the image as the reference, add a random offset in the x and y directions, the offset not exceeding the gesture frame; taking the offset point as the center, perform a square cut whose side length is 3 to 6 times the larger of the gesture frame's width and height, the multiple being a random floating-point number between 3 and 6; if the cut region exceeds the range of the original image, fill it with zero values. Convert the labeling information correspondingly. Cutting at random multiples increases the diversity of the training samples and helps the tracking model adapt to fluctuations in gesture-frame size caused by detection errors in the previous frame, thereby improving the stability of the model; a sketch of this crop is given after step 32-6 below.
32-5. To improve the robustness of the second neural network model and enable it to correctly track and recognize the left and right hand, randomly mirror-flip the images obtained in step 32-4 and convert the labeling information correspondingly.
32-6. To improve the robustness of the second neural network model and let it adapt to color differences caused by different illumination, scenes and cameras, apply random color enhancement, brightness enhancement, contrast enhancement and the like to the images obtained in step 32-5. This step includes, but is not limited to, the three enhancements above.
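A sketch of the random square crop of step 32-4, assuming height x width x channel NumPy images and a center-based (cx, cy, w, h) gesture frame in pixels; mirror flipping and the corresponding label conversion are omitted for brevity:

```python
import random
import numpy as np

def crop_for_tracker(image, box):
    """Random square crop around box=(cx, cy, w, h) as in step 32-4."""
    cx, cy, w, h = box
    # Random offset of the center, bounded so it stays within the frame.
    cx += random.uniform(-w / 2, w / 2)
    cy += random.uniform(-h / 2, h / 2)
    # Side length: 3x to 6x the larger of width and height.
    side = max(w, h) * random.uniform(3.0, 6.0)
    x1, y1 = int(cx - side / 2), int(cy - side / 2)
    x2, y2 = int(x1 + side), int(y1 + side)
    # Zero-fill wherever the crop extends beyond the original image.
    out = np.zeros((y2 - y1, x2 - x1, 3), dtype=image.dtype)
    ix1, iy1 = max(x1, 0), max(y1, 0)
    ix2, iy2 = min(x2, image.shape[1]), min(y2, image.shape[0])
    out[iy1 - y1:iy2 - y1, ix1 - x1:ix2 - x1] = image[iy1:iy2, ix1:ix2]
    return out
```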
And 33, training a neural network model:
33-1. Train the first neural network model using the first-network training samples preprocessed in step 32. Resize the training samples to the input size of the first neural network model, input them into the network for forward propagation, and compute the loss from the model output. The loss consists of three parts: the gesture-box position loss, the gesture-box confidence loss and the gesture-category loss. The gesture-box position loss is computed with a mean-squared-error loss function, while the confidence loss and the category loss are computed with cross-entropy loss functions. According to the computed loss value, optimize the parameters of the network model with gradient descent and the backpropagation algorithm. Repeat the above steps and check whether the model has converged: if so, stop training to obtain the trained first neural network model; otherwise continue training until convergence. Since non-gestures are not treated as a separate class, a specific strategy is needed to distinguish positive and negative samples during training: when the IoU of a bounding box with a ground truth is larger than that of all other bounding boxes, its objectness target is set to 1; if a bounding box is not the one with the largest IoU but its IoU is still greater than 0.5, it is ignored (neither penalized nor rewarded). Only one best bounding box is assigned to each ground truth. If a bounding box does not correspond to any ground truth, it contributes nothing to the regression of box position and size or to the class prediction, and only its confidence is penalized.
33-2. Train the second neural network model using the second-network training samples preprocessed in step 32, in the same way: resize the training samples to the input size of the second neural network model, forward-propagate, and compute the same three-part loss (gesture-box position loss with mean squared error; confidence and category losses with cross entropy); optimize the network parameters with gradient descent and backpropagation; repeat until the model converges to obtain the trained second neural network model. The same positive/negative sample strategy is used: the bounding box with the largest IoU against a ground truth gets an objectness target of 1; boxes whose IoU is greater than 0.5 but not the largest are ignored; each ground truth is assigned only one best bounding box; and a box matching no ground truth is penalized only on its confidence.
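The three-part loss can be sketched in PyTorch as follows. The masks encode the positive/negative strategy described above (the best-IoU box gets an objectness target of 1; boxes with IoU above 0.5 that are not the best are ignored); the tensor names and shapes and the use of binary cross-entropy with logits for the confidence term are illustrative assumptions, not the patent's exact implementation:

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_box, true_box, pred_obj, obj_target,
                   pred_cls, cls_target, obj_mask, ignore_mask):
    """Loss = box-position MSE + confidence cross-entropy + class
    cross-entropy. obj_mask marks boxes matched to a ground truth
    (largest IoU); ignore_mask marks boxes with IoU > 0.5 that are
    neither penalized nor rewarded."""
    # Gesture-box position loss: mean squared error, positives only.
    box_loss = F.mse_loss(pred_box[obj_mask], true_box[obj_mask])
    # Confidence loss: cross-entropy, skipping the ignored boxes.
    keep = ~ignore_mask
    conf_loss = F.binary_cross_entropy_with_logits(
        pred_obj[keep], obj_target[keep])
    # Category loss: cross-entropy on matched boxes only.
    cls_loss = F.cross_entropy(pred_cls[obj_mask], cls_target[obj_mask])
    return box_loss + conf_loss + cls_loss
```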
In practical applications, the first neural network model and the second neural network model may both adopt variant structures of the YOLOv3 model; the YOLOv3 variants of the two models are summarized below.
The input size of the first neural network is 576 in width and 320 in height. Features are extracted with stride-1 convolution kernels of size 3 x 3 and 1 x 1; feature maps are downsampled with max-pooling layers; bilinear interpolation is used as the upsampling layer; and routing layers splice feature maps of different depths. The 19th and 25th layers of the model serve as the two output layers, predicting gesture boxes at two scales so that gestures at different distances can be detected more accurately.
The input size of the second neural network is 208 in width and 208 in height. Features are likewise extracted with stride-1 convolution kernels of size 3 x 3 and 1 x 1; feature maps are downsampled with max-pooling layers; bilinear interpolation is used as the upsampling layer; and routing layers splice feature maps of different depths. The 14th and 21st layers of the model serve as the two output layers, predicting gesture boxes at two scales so that the model behaves more stably when tracking the gesture box based on the previous frame.
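The building blocks named above (stride-1 3 x 3 and 1 x 1 convolutions, max-pool downsampling, bilinear upsampling, and route-layer concatenation feeding two output layers) can be sketched in PyTorch. This is a minimal illustration of the pattern only, not the exact 19/25- or 14/21-layer topology; the channel counts, batch normalization and LeakyReLU activations are assumptions borrowed from common YOLOv3 practice:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv(cin, cout, k):
    # Stride-1 convolution (3x3 or 1x1) used for feature extraction.
    return nn.Sequential(nn.Conv2d(cin, cout, k, 1, k // 2),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.1))

class TwoScaleHead(nn.Module):
    """Illustrative two-scale detector in the spirit described above."""
    def __init__(self, num_outputs=6):
        super().__init__()
        self.stem = nn.Sequential(conv(3, 16, 3), nn.MaxPool2d(2),
                                  conv(16, 32, 3), nn.MaxPool2d(2),
                                  conv(32, 64, 3), nn.MaxPool2d(2))
        self.deep = nn.Sequential(nn.MaxPool2d(2), conv(64, 128, 3))
        self.out_deep = nn.Conv2d(128, num_outputs, 1)     # coarse scale
        self.lateral = conv(128, 64, 1)
        self.out_shallow = nn.Conv2d(128, num_outputs, 1)  # finer scale

    def forward(self, x):
        shallow = self.stem(x)
        deep = self.deep(shallow)
        y1 = self.out_deep(deep)  # first output layer (coarse grid)
        # Bilinear interpolation as the upsampling layer, then a route
        # layer concatenating feature maps of different depths.
        up = F.interpolate(self.lateral(deep), scale_factor=2,
                           mode="bilinear", align_corners=False)
        y2 = self.out_shallow(torch.cat([up, shallow], dim=1))
        return y1, y2
```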
Example four
As shown in fig. 4, this embodiment of a recognition device for effective gestures is a virtual software device comprising: an image acquisition module, an image preprocessing module, a gesture detection and recognition module, a model selection module and a gesture response module.
The image acquisition module is used for acquiring the current frame image acquired by the camera.
The image preprocessing module is used for carrying out normalization processing on the current frame image and judging whether the gesture is detected in the previous frame image or not according to the gesture detection and recognition result of the previous frame image; however, if the image is the first frame image, only normalization processing is required.
The gesture detection and recognition module is used for carrying out gesture detection and recognition on the current frame image according to a preset recognition algorithm to obtain a possible region, a gesture category and a confidence coefficient of a gesture in the current frame image; sequentially carrying out gesture detection and recognition on all image frames of the video within a fixed time interval after the current frame to obtain possible regions of gestures in the image, gesture categories and confidence degrees of the gestures; and the gesture recognition module is also used for judging whether the proportion of the image frames with the same gesture in the image frames in the time interval is greater than a preset proportion threshold value or not, if so, the gesture is considered to be an effective gesture, and a judgment result is returned.
The model selection module is used for selecting a first neural network model or a second neural network model according to a gesture detection result in a previous frame of image, the first neural network model is a pre-trained convolution network single detection model and is used for directly predicting possible regions and types of gestures on a full graph, the second neural network model is a pre-trained convolution network single detection model, and the gestures are tracked according to the previous frame of detection result;
if the current frame is the first frame of image or the gesture is not detected in the previous frame of image, inputting the current frame of image into a first neural network model for gesture detection and recognition, outputting coordinates of a possible region of the gesture, possible types of the gesture and a confidence coefficient of the result in the current frame of image by the first neural network model, if the confidence coefficient is greater than a preset confidence coefficient threshold value, receiving the detection and recognition result predicted by the first neural network model, and if the confidence coefficient is less than the preset confidence coefficient threshold value, ignoring the result;
if the gesture is detected in the previous frame of image, mapping the position of the gesture in the previous frame of image into the current frame of image, expanding the mapping region on the current frame of image outwards according to a preset multiple, inputting the expanded mapping region into a second neural network model for gesture detection and recognition, outputting the coordinates of the possible region of the gesture in the current image, the possible types of the gesture and the confidence coefficient of the result by the second neural network model, if the confidence coefficient is greater than a preset confidence coefficient threshold value, receiving the prediction result of the second neural network model, and if the confidence coefficient is less than the preset confidence coefficient threshold value, ignoring the result;
the gesture response module is configured to determine a change of a gesture category within a preset time period, and execute a preset control operation, where the preset time period is a time period obtained by determining the change of the gesture category.
The gesture response module judges whether a category change has occurred as follows: judging all image frames within the preset time interval, and if the detected gesture changes from a stable state of one category to a stable state of another category at some image frame, determining that the gesture category has changed; wherein a stable state of a category means that the proportion of image frames containing the same gesture among all image frames of the video in the preset time interval is greater than a preset proportion threshold.
EXAMPLE five
As shown generally in fig. 1, an embodiment of an electronic device for recognizing valid gestures includes: a processor and a memory, wherein the processor is capable of executing the above-mentioned method for recognizing valid gestures (the specific process is as described above and is not repeated here); the memory is used for storing all the obtained detection images, the result of image preprocessing and the result of gesture detection and recognition, and storing an executable program for gesture response.
While specific embodiments of the invention have been described, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, as equivalent modifications and variations as will be made by those skilled in the art in light of the spirit of the invention are intended to be included within the scope of the appended claims.

Claims (7)

1. A method for recognizing valid gestures is characterized in that: the method comprises the following steps:
s11, acquiring a current frame image acquired by a camera, and preprocessing the acquired current frame image: firstly, normalizing a current frame image, and judging whether a gesture is detected in a previous frame image according to a gesture detection and recognition result of the previous frame image;
s12, performing gesture detection and recognition on the current frame image according to a preset recognition algorithm to obtain a gesture category of a gesture in the current frame image and a confidence coefficient of a recognition result, and judging whether to accept the recognition result according to the confidence coefficient;
s13, sequentially carrying out gesture detection and recognition on all image frames of the video within a fixed time interval after the current frame to obtain the gesture category of the gesture in the image and the confidence coefficient of the recognition result, and judging whether to accept the recognition result according to the confidence coefficient;
s14, judging whether the proportion of the image frames with the same gesture category in the image frames in the fixed time interval is larger than a preset proportion threshold value or not, if so, considering the gesture as an effective gesture, if not, identifying the next image frame, and returning to the step S13;
the detection and identification in step S12 and step S13 are specifically:
selecting a first neural network model or a second neural network model according to a gesture detection result in a previous frame of image, wherein the first neural network model is a pre-trained convolution network single detection model and is used for directly predicting possible regions and types of gestures on a full graph, and the second neural network model is a pre-trained convolution network single detection model and is used for tracking the gestures according to the gesture regions detected in the previous frame;
if the current frame is the first frame image or the gesture is not detected in the previous frame image, inputting the current frame image into a first neural network model for gesture detection and recognition, outputting coordinates of a possible region of the gesture, possible types of the gesture and a confidence coefficient of a recognition result in the current frame image by the first neural network model, if the confidence coefficient is greater than or equal to a preset confidence coefficient threshold value, receiving the detection and recognition result predicted by the first neural network model, and if the confidence coefficient is less than the preset confidence coefficient threshold value, ignoring the current frame image;
if the gesture is detected in the previous frame of image, mapping the position of the gesture in the previous frame of image to the current frame of image, expanding the mapping region on the current frame of image outwards according to a preset multiple, inputting the expanded mapping region to a second neural network model for gesture detection and recognition, outputting the coordinates of the possible region of the gesture in the current image, the possible types of the gesture and the confidence coefficient of the recognition result by the second neural network model, if the confidence coefficient is greater than a preset confidence coefficient threshold value, receiving the prediction result of the second neural network model, and if the confidence coefficient is less than the preset confidence coefficient threshold value, ignoring the prediction result.
2. A method of active gesture recognition according to claim 1, wherein: the training method of the first neural network model comprises the following steps:
acquiring a first type of training sample set and labeling information of gestures;
performing data preprocessing on the first class training sample set: cutting and mirror image turning the first type of training samples according to a preset aspect ratio;
converting the labeling information of the gesture according to the cutting and turning conditions, and performing random color enhancement on the cut picture;
and training a first neural network model by using the preprocessed first-class sample set.
3. A method of recognizing valid gestures according to claim 1, characterized by: the training method of the second neural network model comprises the following steps:
acquiring a second type of training sample set and labeling information of gestures;
performing data preprocessing on the second type training sample set: taking the position of the gesture frame, or that position after a random offset, as the center, randomly expanding the second type of training sample outwards by 3 to 6 times to perform cutting and mirror flipping, converting the labeling information of the gesture according to the cutting and flipping, and performing random color enhancement on the cut picture;
training a second neural network model using the preprocessed second class sample set.
4. A control method after recognition of an effective gesture is characterized by comprising the following steps: after the effective gesture is recognized by the effective gesture recognition method according to claim 1, the following steps are performed:
s21, counting and analyzing the effective gesture recognition results of all detection frames in a fixed time interval before the current frame, and judging whether continuous and stable effective gestures exist in the fixed time interval;
s22, judging whether a continuous stable effective gesture type is changed into another continuous stable effective gesture type in the fixed time interval;
and S23, when the gesture type is found to be changed, executing control operation corresponding to the gesture change.
5. The method of claim 4, wherein whether the gesture category has changed is judged as follows:
judging all image frames in the fixed time interval, and if the detected gesture changes from a stable state of one category to a stable state of another category at some image frame, determining that the gesture category has changed;
wherein a stable state of a category means that the proportion of image frames containing the same gesture among all image frames of the video in the fixed time interval is greater than a preset proportion threshold.
6. An apparatus for recognizing a valid gesture, characterized by comprising:
the image acquisition module is used for acquiring the current frame image captured by the camera;
the image preprocessing module is used for normalizing the current frame image and for judging, according to the gesture detection and recognition result of the previous frame image, whether a gesture was detected in the previous frame image;
the gesture detection and recognition module is used for performing gesture detection and recognition on the current frame image according to a preset recognition algorithm, obtaining the gesture category of the gesture in the current frame image and the confidence of the recognition result, and judging whether to accept the recognition result according to the confidence;
the module likewise performs gesture detection and recognition, in turn, on all image frames of the video within a fixed time interval after the current frame, obtaining the gesture category and the confidence of the recognition result for each frame and judging whether to accept each result according to its confidence;
the gesture detection and recognition module is also used for judging whether the proportion of image frames containing the same gesture among the image frames within the time interval is greater than a preset proportion threshold; if so, the gesture is considered an effective gesture and the judgment result is returned;
the model selection module is used for selecting the first neural network model or the second neural network model according to the gesture detection result of the previous frame image; the first neural network model is a pre-trained single-shot convolutional detection model used to directly predict candidate gesture regions and categories over the full image, and the second neural network model is a pre-trained single-shot convolutional detection model used to track the gesture based on the previous frame's detection result;
if no gesture was detected in the previous frame image, the current frame image is input into the first neural network model for gesture detection and recognition; the first neural network model outputs the coordinates of the candidate gesture region in the current frame image, the candidate gesture category, and the confidence of the recognition result; if the confidence is greater than or equal to a preset confidence threshold, the detection and recognition result predicted by the first neural network model is accepted, and if the confidence is below the preset confidence threshold, the result is ignored;
if a gesture was detected in the previous frame image, the position of the gesture in the previous frame image is mapped into the current frame image, the mapped region on the current frame image is expanded outward by a preset factor, and the expanded mapped region is input into the second neural network model for gesture detection and recognition; the second neural network model outputs the coordinates of the candidate gesture region in the current image, the candidate gesture category, and the confidence of the recognition result; if the confidence is greater than the preset confidence threshold, the prediction of the second neural network model is accepted, and if the confidence is below the preset confidence threshold, the result is ignored (a sketch of this dispatch follows).
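A rough sketch of this per-frame model dispatch, assuming model objects exposing detect_full(image) and detect_region(patch) methods that return a (box, label, confidence) triple; these interfaces, the 0.5 confidence threshold, and the 3x expansion factor are illustrative assumptions, not part of the patent.

    def detect_gesture(frame, prev_box, model_full, model_track,
                       conf_threshold=0.5, expand=3.0):
        # One frame of the claim-6 dispatch. `frame` is an HxWx3 array;
        # `prev_box` is the previous frame's gesture box or None.
        h, w = frame.shape[:2]
        if prev_box is None:
            # No gesture last frame: single-shot detection over the full image.
            box, label, conf = model_full.detect_full(frame)
        else:
            # Gesture last frame: reuse its box in this frame's coordinates
            # and expand the region outward by the preset factor.
            x1, y1, x2, y2 = prev_box
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            hw, hh = (x2 - x1) * expand / 2, (y2 - y1) * expand / 2
            rx1, ry1 = int(max(0, cx - hw)), int(max(0, cy - hh))
            rx2, ry2 = int(min(w, cx + hw)), int(min(h, cy + hh))
            box, label, conf = model_track.detect_region(frame[ry1:ry2, rx1:rx2])
            if box is not None:
                # Map region-local coordinates back to full-image coordinates.
                box = (box[0] + rx1, box[1] + ry1, box[2] + rx1, box[3] + ry1)
        if box is None or conf < conf_threshold:
            return None  # low-confidence result is ignored per the claim
        return box, label, conf

Returning None makes the next frame fall back to full-image detection, which matches the claim's rule that ignored results leave the pipeline without a previous-frame gesture.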
7. An electronic device for recognition of valid gestures, characterized by comprising a processor and a memory, wherein the processor is operable to perform the method of recognizing valid gestures according to any one of claims 1 to 3, and the memory is used for storing all acquired detection images, the results of image preprocessing, and the results of gesture detection and recognition, as well as an executable program for gesture response.
CN201910735669.8A 2019-08-09 2019-08-09 Effective gesture recognition method, control method and device and electronic equipment Active CN110458095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910735669.8A CN110458095B (en) 2019-08-09 2019-08-09 Effective gesture recognition method, control method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110458095A CN110458095A (en) 2019-11-15
CN110458095B true CN110458095B (en) 2022-11-18

Family

ID=68485693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910735669.8A Active CN110458095B (en) 2019-08-09 2019-08-09 Effective gesture recognition method, control method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110458095B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111158467A (en) * 2019-12-12 2020-05-15 青岛小鸟看看科技有限公司 Gesture interaction method and terminal
CN112262393A (en) * 2019-12-23 2021-01-22 商汤国际私人有限公司 Gesture recognition method and device, electronic equipment and storage medium
CN111382687A (en) * 2020-03-05 2020-07-07 平安科技(深圳)有限公司 Face detection method and system
CN111597969A (en) * 2020-05-14 2020-08-28 新疆爱华盈通信息技术有限公司 Elevator control method and system based on gesture recognition
CN111931677A (en) * 2020-08-19 2020-11-13 北京影谱科技股份有限公司 Face detection method and device and face expression detection method and device
CN112306235B (en) * 2020-09-25 2023-12-29 北京字节跳动网络技术有限公司 Gesture operation method, device, equipment and storage medium
CN114510142B (en) * 2020-10-29 2023-11-10 舜宇光学(浙江)研究院有限公司 Gesture recognition method based on two-dimensional image, gesture recognition system based on two-dimensional image and electronic equipment
CN112508016B (en) * 2020-12-15 2024-04-16 深圳万兴软件有限公司 Image processing method, device, computer equipment and storage medium
CN112860212A (en) * 2021-02-08 2021-05-28 海信视像科技股份有限公司 Volume adjusting method and display device
WO2022183321A1 (en) * 2021-03-01 2022-09-09 华为技术有限公司 Image detection method, apparatus, and electronic device
CN113076836B (en) * 2021-03-25 2022-04-01 东风汽车集团股份有限公司 Automobile gesture interaction method
CN113065458B (en) * 2021-03-29 2024-05-28 芯算一体(深圳)科技有限公司 Voting method and system based on gesture recognition and electronic equipment
CN113095292A (en) * 2021-05-06 2021-07-09 广州虎牙科技有限公司 Gesture recognition method and device, electronic equipment and readable storage medium
CN113326829B (en) * 2021-08-03 2021-11-23 北京世纪好未来教育科技有限公司 Method and device for recognizing gesture in video, readable storage medium and electronic equipment
CN113780083A (en) * 2021-08-10 2021-12-10 新线科技有限公司 Gesture recognition method, device, equipment and storage medium
WO2023077886A1 (en) * 2021-11-04 2023-05-11 海信视像科技股份有限公司 Display device and control method therefor
CN114546106A (en) * 2021-12-27 2022-05-27 深圳市鸿合创新信息技术有限责任公司 Method and device for identifying air gesture, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103376890A (en) * 2012-04-16 2013-10-30 富士通株式会社 Gesture remote control system based on vision
CN106247561A (en) * 2016-08-30 2016-12-21 广东美的制冷设备有限公司 A kind of air-conditioning and long-range control method thereof and device
CN108229277A (en) * 2017-03-31 2018-06-29 北京市商汤科技开发有限公司 Gesture identification, control and neural network training method, device and electronic equipment
CN109598198A (en) * 2018-10-31 2019-04-09 深圳市商汤科技有限公司 The method, apparatus of gesture moving direction, medium, program and equipment for identification
CN109814717A (en) * 2019-01-29 2019-05-28 珠海格力电器股份有限公司 Household equipment control method and device, control equipment and readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902588B (en) * 2019-01-29 2021-08-20 北京奇艺世纪科技有限公司 Gesture recognition method and device and computer readable storage medium

Also Published As

Publication number Publication date
CN110458095A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
CN110458095B (en) Effective gesture recognition method, control method and device and electronic equipment
CN108564097B (en) Multi-scale target detection method based on deep convolutional neural network
CN110610166B (en) Text region detection model training method and device, electronic equipment and storage medium
CN112508975A (en) Image identification method, device, equipment and storage medium
CN111008576B (en) Pedestrian detection and model training method, device and readable storage medium
CN113095152B (en) Regression-based lane line detection method and system
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN109902631B (en) Rapid face detection method based on image pyramid
CN111126209B (en) Lane line detection method and related equipment
CN112085789A (en) Pose estimation method, device, equipment and medium
JP2023527615A (en) Target object detection model training method, target object detection method, device, electronic device, storage medium and computer program
CN111259808A (en) Detection and identification method of traffic identification based on improved SSD algorithm
CN115335872A (en) Training method of target detection network, target detection method and device
CN112801236A (en) Image recognition model migration method, device, equipment and storage medium
CN114445853A (en) Visual gesture recognition system recognition method
CN111027526A (en) Method for improving vehicle target detection, identification and detection efficiency
CN115953744A (en) Vehicle identification tracking method based on deep learning
CN109241893B (en) Road selection method and device based on artificial intelligence technology and readable storage medium
CN111476226B (en) Text positioning method and device and model training method
CN116258931B (en) Visual finger representation understanding method and system based on ViT and sliding window attention fusion
CN115345932A (en) Laser SLAM loop detection method based on semantic information
CN115311244A (en) Method and device for determining lesion size, electronic equipment and storage medium
CN113033593B (en) Text detection training method and device based on deep learning
CN114092766A (en) Robot grabbing detection method based on characteristic attention mechanism
CN114067359A (en) Pedestrian detection method integrating human body key points and attention features of visible parts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant