CN110334584B - Gesture recognition method based on regional full convolution network - Google Patents


Info

Publication number
CN110334584B
CN110334584B
Authority
CN
China
Prior art keywords
network
candidate
frame
layer
regional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910419349.1A
Other languages
Chinese (zh)
Other versions
CN110334584A (en)
Inventor
杨锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910419349.1A
Publication of CN110334584A
Application granted
Publication of CN110334584B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The invention discloses a gesture recognition method based on a regional full convolution network. Features are extracted from an input gesture image by a full convolution network to obtain a group of feature maps and generate candidate frames; a position sensitive sub-network generates position sensitive score maps, and a pooling layer scores each gesture category, thereby locating and classifying the target gesture. The main characteristic of the invention is that the whole regional full convolution network is a shared fully convolutional structure trained end to end, which achieves a high recognition rate while avoiding complex computation; combined with the OHEM technique, the network model rejects negative samples more reliably, which facilitates practical application and is of significance to the field of human-computer interaction.

Description

Gesture recognition method based on regional full convolution network
Technical Field
The invention relates to the technical field of computer vision, machine learning and pattern recognition, in particular to a method for realizing end-to-end gesture recognition by utilizing a regional full convolution network.
Background
Currently, with the increasing popularity of VR (Virtual Reality) and AR (Augmented Reality), human-computer interaction technology is receiving more and more attention. Gestures, as the most direct and convenient mode of human-computer interaction, have attracted extensive research interest, and gesture recognition has gradually become an important research direction in computer vision. Accurate recognition of gestures by a computer is an important part of a gesture-based human-computer interaction system. However, because the human hand is a complex deformable object, gestures exhibit diversity, ambiguity and temporal variation, and they usually occur in complex scenes, with factors such as lighting that is too bright or too dark, several gestures present at once, and varying distances between the gesture and the device, so gesture recognition remains a considerable challenge.
Typical gesture recognition methods are mainly based on hidden Markov models, template matching, artificial neural networks and the like. These traditional methods share a drawback: features must be designed by hand and then extracted from the gesture for recognition, so the processing is complex and inefficient.
Disclosure of Invention
The invention aims to provide a gesture recognition method based on a regional full convolution network, so as to improve recognition efficiency and reduce calculation complexity.
In order to realize the task, the invention adopts the following technical scheme:
a gesture recognition method based on a regional full convolution network comprises the following steps:
step 1, establishing a full convolution network
Using the residual network ResNet-34 architecture as the backbone, change the stride of the ResNet-34 network from 32 pixels to 16 pixels, delete the average pooling layer and the fully connected layer of the ResNet-34 architecture, and then construct a full convolution network from the convolution layers of the ResNet-34 architecture to extract the features of the input image; the input image passes through the full convolution network to output a feature map, and each pixel point on the feature map generates a plurality of candidate frames for predicting the position of the coordinate frame;
step 2, establishing a regional candidate network
Establish a regional candidate network, which comprises the last convolution layer of the full convolution network with two branches arranged after it: one branch consists, in order, of a convolution layer, a first adjusting layer, a normalization layer and a second adjusting layer, and is used to score whether each candidate frame belongs to the foreground or the background; the other branch is a convolution layer used to predict the offset between the candidate frame and the position of the real coordinate frame; the first adjusting layer and the second adjusting layer change the dimensions of the image, and the normalization layer performs the normalization operation;
step 3, training the area candidate network
Candidate frames are screened for training the regional candidate network; the screening rule is as follows:
if the overlapping rate of the candidate frame and the real coordinate frame is more than or equal to 0.7, the candidate frame is considered as a foreground; if the overlapping rate of the candidate frame and the real coordinate frame is less than 0.3, the candidate frame is considered as the background; training by taking candidate frames corresponding to the foreground and the background as training data of the regional candidate network, wherein the candidate frame corresponding to the foreground is a positive sample, and the candidate frame corresponding to the background is a negative sample; the loss function for the regional candidate network training is:
L=cls_loss+λ*reg_loss
wherein λ is an adjustable parameter; to train the regional candidate network, a binary class label is assigned to each candidate frame to be trained; let p_i be the predicted probability that the i-th candidate frame belongs to the foreground and p_i* be its true label, then cls_loss is defined as:
cls_loss = -[p_i* · log(p_i) + (1 - p_i*) · log(1 - p_i)]
reg_loss is used to regress the deviation between the candidate frame and the real coordinate frame, and is defined as:
reg_loss = Σ_{i∈(x,y,w,h)} smooth_L1(t_i - t_i*)
smooth_L1(x) = 0.5·x², if |x| < 1; |x| - 0.5, otherwise
where i ∈ (x, y, w, h), t_i is the predicted offset between the candidate frame and the real coordinate frame for component i of [x, y, w, h], and t_i* is the corresponding true offset; x and y denote coordinates, and w and h denote width and height;
the regional candidate network utilizes the loss function L to carry out end-to-end training by a back propagation and random gradient descent method, and weight is initialized by zero mean Gaussian distribution with standard deviation of 0.01;
step 4, constructing a position sensitive sub-network
The position sensitive sub-network comprises a convolution layer connected after the last convolution layer of the full convolution network; after the input image has been processed by the full convolution network, this convolution layer performs a convolution operation on the output feature map to obtain the position sensitive score maps; the convolution layer generates k² position sensitive score maps for each gesture class, k²·(c+1) in total, where the k² score maps correspond to the relative positions described by a k × k spatial grid and c denotes the number of classes of objects to be recognized;
step 5, pooling of location sensitive candidate frames
Outputting the deviation amount of the candidate frame and the real coordinate frame by the trained regional candidate network, wherein the deviation amount comprises the position information of the candidate frame region; according to the position information, corresponding the candidate frame to the position sensitive score map obtained in the step 4, wherein the candidate frame is divided into k × k sub-regions, and each sub-region corresponds to a region on the score map; the location sensitive subnetwork further comprises a pooling layer for implementing the following functions:
respectively extracting the position sensitive score map corresponding to each category from the candidate frame, respectively calculating the mean value of the extracted score maps, then forming a matrix according to the positions, and summing all values in the matrix to obtain a value; after all the categories are processed in the same way, all the obtained values jointly form an output vector, and the output vector is normalized, so that the category of the current candidate area is estimated;
and 6, training the network by using a database of the gesture pictures, and storing the trained network model for gesture classification.
The invention has the following technical characteristics:
the whole area full convolution network is a shared full convolution structure, the whole structure is end-to-end learning, high-precision recognition rate is achieved, meanwhile, complex calculation is avoided, and in combination with an OHEM technology, a network model has higher rejection rate on negative samples, and practical application is facilitated. The intelligent behavior analysis and post-processing method in the man-machine interaction system and the like has certain practical value for the intelligent construction in the fields of auxiliary automobile control systems, sign language recognition, personal wearing systems and the like.
Drawings
FIG. 1 is a block diagram of a network in the method of the present invention;
FIG. 2 is a block diagram of a regional candidate network;
FIG. 3 is a schematic diagram of candidate boxes obtained from a feature map;
FIG. 4 is a schematic diagram of seven gestures to be trained in an embodiment of the present invention;
FIG. 5 shows the correspondence between the 9 positions of the position sensitivity score map in gesture 1;
FIG. 6 shows the result of the gesture recognition test according to the present invention.
Detailed Description
The invention provides a gesture recognition method based on a regional full convolution network, which comprises the following steps of:
step 1, establishing a full convolution network
In this scheme, the residual network ResNet-34 architecture is used as the backbone, the stride of the ResNet-34 network is changed from 32 pixels to 16 pixels, the average pooling layer and the fully connected layer of the ResNet-34 architecture are deleted, and a full convolution network is then constructed from the convolution layers of the ResNet-34 architecture to extract the features of the input image.
As shown in fig. 1, the full convolution network in this scheme comprises two parts: the first part is a convolution layer with a 7 × 7 convolution kernel that processes the input image; the second part is four groups of residual blocks of different depths built from 3 × 3 convolution kernels, the residual block being the key structure the residual network uses to extract features.
After the input image passes through the full convolution network, the last convolution layer outputs a feature map, and each pixel point on the feature map generates 9 candidate frames for predicting the position of the coordinate frame; for a three-dimensional feature map of dimension w × h × d (width × height × depth), a total of w × h × 9 candidate frames is therefore generated. Each candidate frame is a rectangular box; the frames take three shapes (aspect ratios), and combining these with three scales yields the 9 candidate frames per pixel point.
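For illustration, a minimal PyTorch sketch of such a backbone might look as follows; torchvision's resnet34, the way the last stage's stride is reduced to reach an overall stride of 16, and all names here are assumptions of this sketch, not details given by the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class FullyConvBackbone(nn.Module):
    """ResNet-34 with the average pooling and fully connected layers removed,
    keeping only the convolutional stages as a fully convolutional feature extractor."""
    def __init__(self):
        super().__init__()
        net = resnet34(weights=None)
        # Keep conv1 (7x7), bn, relu, maxpool and the four residual stages; drop avgpool/fc.
        self.features = nn.Sequential(
            net.conv1, net.bn1, net.relu, net.maxpool,
            net.layer1, net.layer2, net.layer3, net.layer4,
        )
        # Reduce the stride of the last stage from 2 to 1 so the overall stride
        # becomes 16 pixels instead of 32 (one common way to achieve this).
        for m in net.layer4.modules():
            if isinstance(m, nn.Conv2d) and m.stride == (2, 2):
                m.stride = (1, 1)

    def forward(self, x):
        return self.features(x)  # (N, 512, H/16, W/16) feature map

feat = FullyConvBackbone()(torch.randn(1, 3, 224, 224))
print(feat.shape)  # torch.Size([1, 512, 14, 14])
```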
Step 2, establishing a regional candidate network
A regional candidate network is established. The network comprises the last convolution layer of the full convolution network, after which two branches are arranged. One branch consists, in order, of a convolution layer, a first adjusting layer, a normalization layer and a second adjusting layer, and is used to score, as foreground or background, each candidate frame generated at each pixel point of the feature map output by that convolution layer; the other branch is a convolution layer used to predict the offset between the position of the candidate frame and the position of the real coordinate frame. The first adjusting layer and the second adjusting layer perform a Reshape operation, i.e. they change the dimensions of the image.
In this embodiment, the convolution layer of the first branch is a 1 × 1 convolution with 18 output channels; after the feature map output by the last convolution layer of the full convolution network passes through this convolution layer, the resulting feature map has dimensions (w, h, 9 × 2); it is then normalized by the first adjusting layer (Reshape) and the normalization layer, and after the second adjusting layer (Reshape) the predicted foreground and background probabilities of the candidate frames of the input image are obtained. The convolution layer of the second branch is a 1 × 1 convolution with 36 output channels, and the feature map obtained after this convolution layer can be represented as (w, h, 4 × 9), i.e. the offsets of the w × h × 9 candidate frames from the position of the real coordinate frame.
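A non-authoritative sketch of these two branches follows, assuming a 512-channel backbone feature map and 9 candidate frames per pixel point; the class names, the channel count and the exact reshape/softmax arrangement are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionProposalHead(nn.Module):
    """Two 1x1 convolution branches on top of the shared feature map:
    one scores each of the 9 candidate frames per pixel as foreground/background,
    the other regresses the 4 offsets of each candidate frame."""
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.cls_conv = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)  # -> (w, h, 9*2)
        self.reg_conv = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)  # -> (w, h, 4*9)
        self.num_anchors = num_anchors

    def forward(self, feat):
        n, _, h, w = feat.shape
        # First branch: reshape -> softmax -> reshape, mirroring the two adjusting
        # layers around the normalization layer in the description.
        scores = self.cls_conv(feat).view(n, 2, self.num_anchors, h, w)   # first adjusting layer
        probs = F.softmax(scores, dim=1)                                  # normalization layer
        probs = probs.view(n, 2 * self.num_anchors, h, w)                 # second adjusting layer
        # Second branch: per-frame offsets (dx, dy, dw, dh).
        offsets = self.reg_conv(feat)
        return probs, offsets

probs, offsets = RegionProposalHead()(torch.randn(1, 512, 14, 14))
print(probs.shape, offsets.shape)  # (1, 18, 14, 14) (1, 36, 14, 14)
```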
Step 3, training the regional candidate network
Candidate frames are screened for training the regional candidate network; the screening rule is as follows:
as shown in fig. 3, if the overlapping rate of the candidate frame and the real coordinate frame is greater than or equal to 0.7, the candidate frame is considered as a foreground; if the overlapping rate of the candidate frame and the real coordinate frame is less than 0.3, the candidate frame is considered as the background; taking the candidate frames corresponding to the foreground and the background as training data of the regional candidate network for training, wherein the candidate frame corresponding to the foreground is a positive sample and corresponds to the category of the target gesture region; the candidate frame corresponding to the background is a negative sample; while the other candidate boxes do not participate in the training.
The loss function of the regional candidate network training is divided into two parts:cls_lossandreg_loss
cls_loss is used to classify the candidate frame as foreground/background. In this scheme, in order to train the regional candidate network, a binary class label (foreground: 1, background: 0) is allocated to each candidate frame to be trained; let p_i be the predicted probability that the i-th candidate frame belongs to the foreground and p_i* be its true label (which can only be 0 or 1), then cls_loss is defined as the cross-entropy loss:
cls_loss = -[p_i* · log(p_i) + (1 - p_i*) · log(1 - p_i)]
reg_loss is used to regress the deviation between the candidate frame and the real coordinate frame; the regression task cannot use the above cross-entropy loss function, so reg_loss is defined as:
reg_loss = Σ_{i∈(x,y,w,h)} smooth_L1(t_i - t_i*)
smooth_L1(x) = 0.5·x², if |x| < 1; |x| - 0.5, otherwise
where i ∈ (x, y, w, h), t_i is the predicted offset between the candidate frame and the real coordinate frame for component i of [x, y, w, h], and t_i* is the corresponding true offset; x, y denote coordinates and w, h denote width and height.
Because the two losses have different orders of magnitude, an adjustable parameter λ is used to balance them, so that both can be taken into account uniformly when computing the total loss of the regional candidate network during training. The loss function L of the regional candidate network is defined as:
L=cls_loss+λ*reg_loss
the regional candidate network utilizes the loss function L to carry out end-to-end training by a back propagation and random gradient descent method, and weight is initialized by zero mean Gaussian distribution with standard deviation of 0.01; the part for training the required initialization parameters comprises the parameters of the full convolution network in the step 1 and the parameters of the convolution layer in the area candidate network.
Step 4, constructing a position sensitive sub-network
The position sensitive sub-network comprises a convolution layer conv_L connected after the last convolution layer of the full convolution network; after the input image has been processed by the full convolution network, this layer performs a convolution operation on the output feature map to obtain the position sensitive score maps. The convolution layer generates k² position sensitive score maps for each gesture class, k²·(c+1) in total (c gesture classes plus one background class), where the k² score maps correspond to the relative positions described by a k × k spatial grid.
The height and width of this convolution layer's output are the same as those of the last convolution layer of the full convolution network, but the number of channels is k²·(c+1), where k is the number of grid cells per side and c is the number of categories of objects to recognize, with one background category added. As shown in fig. 4, the recognition task of this scheme has seven gesture categories, so there are 8 categories in total, and each category has k² score maps. Taking gesture 1 as an example, each score map indicates which positions in the original input image contain a certain part of gesture 1, and it has a high response at positions containing the corresponding part of gesture 1. With k taken as 3, the original input image is divided into 9 different positions and there are 9 position sensitivity score maps.
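For concreteness, with k = 3 and c = 7 gesture classes plus background, this score-map layer is a convolution with k²·(c+1) = 72 output channels on top of the backbone feature map; the sketch below uses a 1 × 1 kernel and a 512-channel input, both of which are assumptions since the patent does not state them.

```python
import torch
import torch.nn as nn

k, c = 3, 7                       # 3 x 3 spatial grid, 7 gesture classes (+1 background)
backbone_channels = 512           # assumed channel count of the last backbone feature map
conv_L = nn.Conv2d(backbone_channels, k * k * (c + 1), kernel_size=1)

feat = torch.randn(1, backbone_channels, 14, 14)   # output of the full convolution network
score_maps = conv_L(feat)
print(score_maps.shape)           # torch.Size([1, 72, 14, 14]): 9 position-sensitive maps per class
```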
Step 5, pooling of location sensitive candidate frames
Step 3, the trained area candidate network outputs the deviation value of the candidate frame and the real coordinate frame, wherein the deviation value comprises four values of position information [ x, y, w, h ] of the candidate frame area; according to the position information, corresponding the candidate frame area to the position sensitive score map obtained in the step 4, wherein the candidate frame area is divided into k × k sub-areas, and each sub-area corresponds to an area on the score map; the location sensitive subnetwork further comprises a pooling layer for implementing the following functions:
respectively extracting position sensitive score maps corresponding to each category from the candidate frames, respectively solving the mean values of the extracted score maps, then forming a matrix according to the positions, and summing all values in the matrix to obtain a value S; after all the categories are processed in the same way, all the obtained values S jointly form an output vector, and the output vector is normalized, so that the category of the current candidate area is estimated.
In this embodiment, each category has 9 position sensitive score maps. Taking category gesture 1 as an example, the 9 score maps of category 1 are extracted for the candidate frame, as shown in fig. 5; the extracted score maps are each averaged, a 3 × 3 matrix is formed from these means according to their positions, and all values in the 3 × 3 matrix are summed to obtain a value. Repeating these steps for categories 2-8 finally yields a 1 × 8 vector; softmax normalization is applied to this vector to compute the softmax response for each category and estimate the category of the currently selected region, thereby outputting the prediction result.
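The following is a simplified, non-authoritative sketch of this position-sensitive pooling and voting for a single candidate frame; it assumes the frame is given in feature-map coordinates and simply averages each of the k × k sub-regions, which is one straightforward reading of the description above.

```python
import torch
import torch.nn.functional as F

def ps_roi_vote(score_maps, box, k=3, num_classes=8):
    """score_maps: (k*k*num_classes, H, W) position-sensitive score maps.
    box: (x1, y1, x2, y2) in feature-map coordinates.
    Returns the normalized per-class scores for this candidate frame."""
    x1, y1, x2, y2 = box
    scores = []
    for c in range(num_classes):
        votes = torch.zeros(k, k)
        for i in range(k):          # sub-region row
            for j in range(k):      # sub-region column
                ys = y1 + (y2 - y1) * i // k
                ye = y1 + (y2 - y1) * (i + 1) // k
                xs = x1 + (x2 - x1) * j // k
                xe = x1 + (x2 - x1) * (j + 1) // k
                # pick the score map assigned to position (i, j) of class c
                m = score_maps[c * k * k + i * k + j]
                votes[i, j] = m[ys:max(ye, ys + 1), xs:max(xe, xs + 1)].mean()
        scores.append(votes.sum())   # sum the k*k mean responses ("voting")
    return F.softmax(torch.stack(scores), dim=0)

probs = ps_roi_vote(torch.randn(72, 14, 14), (2, 3, 10, 12))
print(probs)  # 1 x 8 normalized class scores for this candidate frame
```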
The candidate frame classification loss function in the position sensitive sub-network is defined by a cross-entropy loss function as follows:
L(θ) = -Σ_{i=1}^{8} s_ci · log(ŝ_ci)
where s_ci is the true value for class i in the 1 × 8 output vector, ŝ_ci is the predicted value for class i in the 1 × 8 output vector, and θ is the parameter set of the convolution layers in the whole network (full convolution network, regional candidate network, position sensitive sub-network).
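Applied to the 1 × 8 vector produced by the pooling step, this is ordinary cross-entropy; a short sketch follows, where the one-hot true label is an assumption made for illustration.

```python
import torch

pred = torch.softmax(torch.randn(8), dim=0)    # predicted 1 x 8 vector (softmax-normalized)
true = torch.zeros(8)
true[0] = 1.0                                  # one-hot true label, e.g. class "gesture 1"
loss = -(true * torch.log(pred)).sum()         # cross-entropy over the 8 classes
print(float(loss))
```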
And 6, training the whole network by using a database of gesture pictures, and storing the trained network model for gesture classification.
In this embodiment, the CGD database, which contains thirty basic gesture actions, is used to train the network; picture sizes are normalized to 224 × 224. In addition, because the regional candidate network and the position sensitive sub-network share network parameters, the network only needs to be trained once to meet the requirements. The network model is built with a deep learning framework, and 7 representative gestures are selected from the CGD database, as shown in fig. 4, with 8000 training images and 500 test images. The number of training epochs is set to 500; after the model is saved, an end-to-end test is performed: a picture containing a gesture is input and the result is output, as shown in fig. 6. As can be seen from the figure, both the type of the gesture and the position of its coordinate frame are recognized, so the gesture recognition method performs well.
The method adopts the OHEM (Online Hard Example Mining) technique: the loss function values of all gesture regions are computed, the regions are sorted by loss value, and the B gesture regions with the largest losses are selected for back propagation.
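A minimal sketch of this hard-example selection, assuming the per-region losses have already been computed without reduction; the batch size B and the tensor names are placeholders for this example.

```python
import torch

def ohem_select(per_region_loss, B=128):
    """Keep only the B regions with the largest loss values; back propagation
    is then driven only by these hard examples."""
    B = min(B, per_region_loss.numel())
    topk_loss, topk_idx = torch.topk(per_region_loss, B)
    return topk_loss.mean(), topk_idx

per_region_loss = torch.rand(512, requires_grad=True)  # unreduced losses of 512 candidate regions
hard_loss, hard_idx = ohem_select(per_region_loss)
hard_loss.backward()   # gradients flow only through the selected hard regions
print(hard_loss.item(), hard_idx[:5])
```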

Claims (1)

1. A gesture recognition method based on a regional full convolution network is characterized by comprising the following steps:
step 1, establishing a full convolution network
Using a residual network ResNet-34 architecture as the backbone, changing the stride of the ResNet-34 network from 32 pixels to 16 pixels, deleting the average pooling layer and the fully connected layer of the ResNet-34 architecture, and then constructing a full convolution network from the convolution layers of the ResNet-34 architecture to extract the features of the input image; the input image passes through the full convolution network and then outputs a feature map, and each pixel point on the feature map generates a plurality of candidate frames for predicting the position of the coordinate frame;
step 2, establishing regional candidate network
Establishing a regional candidate network, wherein the network comprises the last convolution layer of the full convolution network with two branches arranged after it: one branch consists, in order, of a convolution layer, a first adjusting layer, a normalization layer and a second adjusting layer, and is used to score whether each candidate frame belongs to the foreground or the background; the other branch is a convolution layer used to predict the offset between the candidate frame and the position of the real coordinate frame; the first adjusting layer and the second adjusting layer are used for changing the dimensions of the image, and the normalization layer is used for performing the normalization operation;
step 3, training the regional candidate network
Candidate frames are screened for training the regional candidate network; the screening rule is as follows:
if the overlapping rate of the candidate frame and the real coordinate frame is more than or equal to 0.7, the candidate frame is regarded as a foreground; if the overlapping rate of the candidate frame and the real coordinate frame is less than 0.3, the candidate frame is considered as the background; training by taking candidate frames corresponding to the foreground and the background as training data of the regional candidate network, wherein the candidate frame corresponding to the foreground is a positive sample, and the candidate frame corresponding to the background is a negative sample; the loss function for the training of the regional candidate network is:
L=cls_loss+λ*reg_loss
wherein λ is an adjustable parameter; to train the regional candidate network, a binary class label is assigned to each candidate frame to be trained; let p_i be the predicted probability that the i-th candidate frame belongs to the foreground and p_i* be its true label, then cls_loss is defined as:
cls_loss = -[p_i* · log(p_i) + (1 - p_i*) · log(1 - p_i)]
reg_loss is used to regress the deviation between the candidate frame and the real coordinate frame, and is defined as:
reg_loss = Σ_{i∈(x,y,w,h)} smooth_L1(t_i - t_i*)
smooth_L1(x) = 0.5·x², if |x| < 1; |x| - 0.5, otherwise
where i ∈ (x, y, w, h), t_i is the predicted offset between the candidate frame and the real coordinate frame for component i of [x, y, w, h], and t_i* is the corresponding true offset; x, y denote coordinates, and w, h denote width and height;
the regional candidate network utilizes the loss function L to carry out end-to-end training by a back propagation and random gradient descent method, and weight is initialized by zero mean Gaussian distribution with standard deviation of 0.01;
step 4, constructing a position sensitive sub-network
The position sensitive sub-network comprises a convolution layer connected after the last convolution layer of the full convolution network, and the position sensitive score maps are obtained after this convolution layer performs a convolution operation on the feature map output by the full convolution network for the input image; the convolution layer generates k² position sensitive score maps for each gesture class, k²·(c+1) in total, where the k² score maps correspond to the relative positions described by a k × k spatial grid and c denotes the number of classes of objects to be recognized;
step 5, pooling of location sensitive candidate frames
Outputting the deviation amount of the candidate frame and the real coordinate frame by the trained regional candidate network, wherein the deviation amount comprises the position information of the candidate frame region; according to the position information, corresponding the candidate frame to the position sensitive score map obtained in the step 4, wherein the candidate frame is divided into k × k sub-regions, and each sub-region corresponds to a region on the score map; the location sensitive subnetwork further comprises a pooling layer for implementing the following functions:
respectively extracting the position sensitive score map corresponding to each category from the candidate frame, respectively calculating the mean value of the extracted score maps, then forming a matrix according to the positions, and summing all values in the matrix to obtain a value; after all classes are processed in the same way, all obtained values jointly form an output vector, and the output vector is normalized, so that the class of the current candidate area is estimated;
and 6, training the network by utilizing a database of the gesture pictures, and storing the trained network model for gesture classification.
CN201910419349.1A 2019-05-20 2019-05-20 Gesture recognition method based on regional full convolution network Active CN110334584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910419349.1A CN110334584B (en) 2019-05-20 2019-05-20 Gesture recognition method based on regional full convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910419349.1A CN110334584B (en) 2019-05-20 2019-05-20 Gesture recognition method based on regional full convolution network

Publications (2)

Publication Number Publication Date
CN110334584A CN110334584A (en) 2019-10-15
CN110334584B true CN110334584B (en) 2023-01-20

Family

ID=68139443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910419349.1A Active CN110334584B (en) 2019-05-20 2019-05-20 Gesture recognition method based on regional full convolution network

Country Status (1)

Country Link
CN (1) CN110334584B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026898A (en) * 2019-12-10 2020-04-17 云南大学 Weak supervision image emotion classification and positioning method based on cross space pooling strategy
CN111814626B (en) * 2020-06-29 2021-01-26 中南民族大学 Dynamic gesture recognition method and system based on self-attention mechanism
CN112699837A (en) * 2021-01-13 2021-04-23 新大陆数字技术股份有限公司 Gesture recognition method and device based on deep learning
CN113591764A (en) * 2021-08-09 2021-11-02 广州博冠信息科技有限公司 Gesture recognition method and device, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108010049A (en) * 2017-11-09 2018-05-08 华南理工大学 Split the method in human hand region in stop-motion animation using full convolutional neural networks
CN109299644A (en) * 2018-07-18 2019-02-01 广东工业大学 A kind of vehicle target detection method based on the full convolutional network in region

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709532B (en) * 2017-01-25 2020-03-10 京东方科技集团股份有限公司 Image processing method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108010049A (en) * 2017-11-09 2018-05-08 华南理工大学 Split the method in human hand region in stop-motion animation using full convolutional neural networks
CN109299644A (en) * 2018-07-18 2019-02-01 广东工业大学 A kind of vehicle target detection method based on the full convolutional network in region

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Detection of cracks in textured transparent plastics based on improved R-FCN; Guan Rizhao et al.; Computer Engineering and Applications; 2019-03-15; Vol. 55, No. 6; pp. 1-2 *

Also Published As

Publication number Publication date
CN110334584A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN111310861B (en) License plate recognition and positioning method based on deep neural network
CN110334584B (en) Gesture recognition method based on regional full convolution network
CN111489358B (en) Three-dimensional point cloud semantic segmentation method based on deep learning
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN111310773B (en) Efficient license plate positioning method of convolutional neural network
CN111428765B (en) Target detection method based on global convolution and local depth convolution fusion
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN112907602B (en) Three-dimensional scene point cloud segmentation method based on improved K-nearest neighbor algorithm
CN109285162A (en) A kind of image, semantic dividing method based on regional area conditional random field models
CN109492596B (en) Pedestrian detection method and system based on K-means clustering and regional recommendation network
CN109064389B (en) Deep learning method for generating realistic images by hand-drawn line drawings
CN111126459A (en) Method and device for identifying fine granularity of vehicle
CN113744311A (en) Twin neural network moving target tracking method based on full-connection attention module
CN111626200A (en) Multi-scale target detection network and traffic identification detection method based on Libra R-CNN
CN111311702B (en) Image generation and identification module and method based on BlockGAN
CN112347970A (en) Remote sensing image ground object identification method based on graph convolution neural network
CN107146219B (en) Image significance detection method based on manifold regularization support vector machine
CN105809716A (en) Superpixel and three-dimensional self-organizing background subtraction algorithm-combined foreground extraction method
CN107577983A (en) It is a kind of to circulate the method for finding region-of-interest identification multi-tag image
CN115223017B (en) Multi-scale feature fusion bridge detection method based on depth separable convolution
CN112396655A (en) Point cloud data-based ship target 6D pose estimation method
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN113988164B (en) Lightweight point cloud target detection method for representative point self-attention mechanism
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
KR20110037184A (en) Pipelining computer system combining neuro-fuzzy system and parallel processor, method and apparatus for recognizing objects using the computer system in images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant