CN110334584B - Gesture recognition method based on regional full convolution network - Google Patents
- Publication number
- CN110334584B (application CN201910419349.1A)
- Authority
- CN
- China
- Prior art keywords
- network
- candidate
- frame
- layer
- regional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroids
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Abstract
The invention discloses a gesture recognition method based on a regional full convolution network. Features are extracted from an input gesture image by a full convolution network to obtain a set of feature maps and generate candidate frames; a position-sensitive sub-network generates position-sensitive score maps, and a pooling layer scores each gesture category, so that the target gesture is both located and classified. The main characteristic of the invention is that the whole regional full convolution network is a shared fully convolutional structure trained end to end, which achieves a high recognition rate while avoiding heavy computation; combined with the OHEM technique, the network model rejects negative samples more reliably, which eases practical application and is of significance to the field of human-computer interaction.
Description
Technical Field
The invention relates to the technical field of computer vision, machine learning and pattern recognition, in particular to a method for realizing end-to-end gesture recognition by utilizing a regional full convolution network.
Background
Currently, with the increasing popularity of VR (Virtual Reality) and AR (Augmented Reality), human-computer interaction technology is receiving growing attention. Gestures, as the most direct and convenient mode of human-computer interaction, have attracted extensive research, and gesture recognition has gradually become an important research direction in computer vision. Accurately recognizing gestures is an essential part of any gesture-based human-computer interaction system. Because the human hand is a complex deformable object, gestures exhibit diversity, ambiguity and temporal variation, and they usually appear in complicated scenes, for example under over-bright or over-dark lighting, with multiple simultaneous gestures, or at varying distances between the hand and the device, so gesture recognition remains a substantial challenge.
Typical gesture recognition methods are mainly based on hidden Markov models, template matching, artificial neural networks and the like. These traditional methods share a drawback: features must be designed by hand and then extracted from the gesture for recognition, so the processing pipeline is complex and inefficient.
Disclosure of Invention
The invention aims to provide a gesture recognition method based on a regional full convolution network, so as to improve recognition efficiency and reduce calculation complexity.
In order to realize the task, the invention adopts the following technical scheme:
a gesture recognition method based on a regional full convolution network comprises the following steps:
Step 1, establishing a full convolution network
Using the residual network ResNet-34 architecture as a framework, changing the stride of the ResNet-34 network from 32 pixels to 16 pixels, deleting the average pooling layer and the fully connected layer of the ResNet-34 architecture, and then constructing a full convolution network from the convolution layers of ResNet-34 so as to extract features from the input image; the input image passes through the full convolution network and outputs a feature map, and each pixel point on the feature map generates a plurality of candidate frames for predicting the position of the coordinate frame;
step 2, establishing a regional candidate network
Establishing a regional candidate network, wherein the network comprises the last convolutional layer of the full convolution network, behind which two branches are arranged. One branch consists, in order, of a convolution layer, a first adjusting layer, a normalization layer and a second adjusting layer, and scores each candidate frame as foreground or background; the other branch is a convolution layer used to predict the offset between the candidate frame and the position of the real coordinate frame. The first and second adjusting layers change the dimensionality of the image, and the normalization layer performs the normalization operation;
Step 3, training the regional candidate network
Candidate frames are screened for training the regional candidate network according to the following rules:
if the overlapping rate of the candidate frame and the real coordinate frame is more than or equal to 0.7, the candidate frame is considered as a foreground; if the overlapping rate of the candidate frame and the real coordinate frame is less than 0.3, the candidate frame is considered as the background; training by taking candidate frames corresponding to the foreground and the background as training data of the regional candidate network, wherein the candidate frame corresponding to the foreground is a positive sample, and the candidate frame corresponding to the background is a negative sample; the loss function for the regional candidate network training is:
L=cls_loss+λ*reg_loss
wherein λ is an adjustable parameter; to train the regional candidate network, a binary class label is assigned to each candidate frame to be trained. Let p_i be the predicted probability that the i-th candidate frame belongs to the foreground and p_i* be its true label; then cls_loss is defined as:
cls_loss = -Σ_i [ p_i* · log(p_i) + (1 - p_i*) · log(1 - p_i) ]
reg_loss is used to regress the deviation between the candidate frame and the real coordinate frame, and is defined as:
reg_loss = Σ_{i ∈ {x, y, w, h}} smoothL1(t_i - t_i*), with smoothL1(x) = 0.5x² if |x| < 1, and |x| - 0.5 otherwise
where i ∈ {x, y, w, h}, t_i is the predicted offset between the candidate frame and the real coordinate frame in component i of [x, y, w, h], and t_i* is the true value of that offset; x and y denote coordinates, and w and h denote width and height;
the regional candidate network utilizes the loss function L to carry out end-to-end training by a back propagation and random gradient descent method, and weight is initialized by zero mean Gaussian distribution with standard deviation of 0.01;
step 4, constructing a position sensitive sub-network
The position-sensitive sub-network comprises a convolution layer connected after the last convolution layer of the full convolution network. After the input image is processed by the full convolution network, the output feature map is convolved by this layer to obtain the position-sensitive score maps; the convolution layer generates k²(c+1) position-sensitive score maps per gesture class set, where the k² score maps describe the relative positions of a k×k spatial grid and c is the number of classes of recognized objects;
step 5, pooling of location sensitive candidate frames
Outputting the deviation amount of the candidate frame and the real coordinate frame by the trained regional candidate network, wherein the deviation amount comprises the position information of the candidate frame region; according to the position information, corresponding the candidate frame to the position sensitive score map obtained in the step 4, wherein the candidate frame is divided into k × k sub-regions, and each sub-region corresponds to a region on the score map; the location sensitive subnetwork further comprises a pooling layer for implementing the following functions:
respectively extracting the position sensitive score map corresponding to each category from the candidate frame, respectively calculating the mean value of the extracted score maps, then forming a matrix according to the positions, and summing all values in the matrix to obtain a value; after all the categories are processed in the same way, all the obtained values jointly form an output vector, and the output vector is normalized, so that the category of the current candidate area is estimated;
and 6, training the network by using a database of the gesture pictures, and storing the trained network model for gesture classification.
The invention has the following technical characteristics:
the whole area full convolution network is a shared full convolution structure, the whole structure is end-to-end learning, high-precision recognition rate is achieved, meanwhile, complex calculation is avoided, and in combination with an OHEM technology, a network model has higher rejection rate on negative samples, and practical application is facilitated. The intelligent behavior analysis and post-processing method in the man-machine interaction system and the like has certain practical value for the intelligent construction in the fields of auxiliary automobile control systems, sign language recognition, personal wearing systems and the like.
Drawings
FIG. 1 is a block diagram of a network in the method of the present invention;
FIG. 2 is a block diagram of a regional candidate network;
FIG. 3 is a schematic diagram of candidate boxes obtained from a feature map;
FIG. 4 is a schematic diagram of seven gestures to be trained in an embodiment of the present invention;
FIG. 5 shows the correspondence between the 9 positions of the position sensitivity score map in gesture 1;
FIG. 6 shows the result of the gesture recognition test according to the present invention.
Detailed Description
The invention provides a gesture recognition method based on a regional full convolution network, which comprises the following steps of:
In this scheme, the residual network ResNet-34 architecture is used as a framework, the stride of the ResNet-34 network is changed from 32 pixels to 16 pixels, the average pooling layer and the fully connected layer of the ResNet-34 architecture are deleted, and a full convolution network is then constructed from the convolution layers of ResNet-34 to extract features from the input image.
As shown in fig. 1, the full convolution network in this scheme includes two parts: the first part is a convolution layer with a 7 × 7 kernel that processes the input image, and the second part is four groups of residual blocks of different depths built from 3 × 3 convolution kernels; residual blocks are the key structure the residual network uses to extract features.
After an input image passes through the full convolution network, the last convolution layer outputs a feature map, and each pixel point on the feature map generates 9 candidate frames for predicting the position of the coordinate frame; thus a total of w × h × 9 candidate frames are generated for a three-dimensional feature map of dimension w × h × d (width × height × depth). The candidate frames are rectangular and come in three shapes, with aspect ratios of 1:1, 1:2 and 2:1.
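The candidate-frame (anchor) layout above, 9 boxes per feature-map pixel at a 16-pixel stride, can be sketched as follows. The concrete scale values are illustrative assumptions; only the stride, the count and the aspect ratios come from the description:

```python
import numpy as np

def generate_anchors(feat_w, feat_h, stride=16,
                     scales=(64, 128, 256), ratios=(1.0, 0.5, 2.0)):
    """Return (feat_w * feat_h * 9, 4) candidate boxes as (x1, y1, x2, y2).

    Each feature-map pixel maps back to a stride x stride patch of the
    input image and spawns len(scales) * len(ratios) = 9 boxes.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # center of the receptive patch in input-image coordinates
            cx, cy = x * stride + stride / 2, y * stride + stride / 2
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)  # w/h == r
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.asarray(anchors)
```

For a w × h feature map this yields exactly the w × h × 9 candidate frames described above.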
Step 2, establishing a regional candidate network
Establishing a regional candidate network, wherein the network comprises the last convolution layer of the full convolution network, behind which two branches are arranged. One branch consists, in order, of a convolution layer, a first adjusting layer, a normalization layer and a second adjusting layer, and is used to score, as foreground or background, the candidate frames generated by each pixel point on the feature map output by the convolution layer; the other branch is a convolution layer used to predict the offset between the position of the candidate frame and the position of the real coordinate frame. The first and second adjusting layers perform the Reshape operation, i.e. they change the dimensions of the image.
In this embodiment, the convolution layer of the first branch is a 1 × 1 convolution with 18 output channels. After the feature map output by the last convolution layer of the full convolution network passes through this layer, the resulting feature map has dimension (w, h, 9 × 2); it is then reshaped by the first adjustment layer, normalized by the normalization layer, and reshaped again by the second adjustment layer to yield the predicted foreground/background probabilities of the candidate frames. The convolution layer of the second branch is a 1 × 1 convolution with 36 output channels, and the feature map obtained after it can be represented as (w, h, 4 × 9), i.e. the offsets of the w × h × 9 candidate frames from the real coordinate frame position.
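A numpy sketch of the two branches, reading the 18 channels as 9 anchors × 2 foreground/background scores and the 36 channels as 9 anchors × 4 offsets. The random weights and the 14 × 14 × 512 feature-map size are placeholders, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(feat, weights):
    """A 1x1 convolution is a per-pixel linear map over channels.
    feat: (h, w, d), weights: (d, out) -> (h, w, out)."""
    return feat @ weights

h, w, d = 14, 14, 512                       # backbone feature map (assumed)
feat = rng.standard_normal((h, w, d))

cls_logits = conv1x1(feat, rng.standard_normal((d, 18)))  # 9 anchors x 2
reg_offset = conv1x1(feat, rng.standard_normal((d, 36)))  # 9 anchors x 4

# Reshape + normalization + Reshape sequence: softmax over the two
# foreground/background scores of each anchor.
scores = cls_logits.reshape(h, w, 9, 2)
scores = np.exp(scores - scores.max(-1, keepdims=True))
scores /= scores.sum(-1, keepdims=True)     # fg/bg probability per anchor
```

The softmax here plays the role of the normalization layer sandwiched between the two Reshape (adjusting) layers.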
The screening candidate box is used for training the area candidate network, and the screening rule is as follows:
As shown in fig. 3, if the overlap rate of a candidate frame with the real coordinate frame is greater than or equal to 0.7, the candidate frame is considered foreground; if the overlap rate is less than 0.3, the candidate frame is considered background. The candidate frames corresponding to foreground and background are used as training data for the regional candidate network: a foreground candidate frame is a positive sample and corresponds to the category of the target gesture region, a background candidate frame is a negative sample, and the remaining candidate frames do not participate in training.
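The screening rule can be written directly from the stated thresholds. The IoU ("overlap rate") helper below is the standard intersection-over-union definition, assumed rather than quoted from the patent:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_candidate(candidate, gt_box):
    """1 = foreground (positive), 0 = background (negative),
    None = overlap in [0.3, 0.7), excluded from training."""
    overlap = iou(candidate, gt_box)
    if overlap >= 0.7:
        return 1
    if overlap < 0.3:
        return 0
    return None
```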
The loss function of the regional candidate network training is divided into two parts: cls_loss and reg_loss.
cls_loss is used to classify a candidate frame as foreground/background. In this scheme, to train the regional candidate network, a binary class label (foreground: 1, background: 0) is assigned to each candidate frame to be trained. Let p_i be the predicted probability that the i-th candidate frame belongs to the foreground and p_i* be its true label (which can only be 0 or 1); then cls_loss is defined by the cross-entropy loss function:
cls_loss = -Σ_i [ p_i* · log(p_i) + (1 - p_i*) · log(1 - p_i) ]
reg_loss is used to regress the deviation between the candidate frame and the real coordinate frame; this regression task cannot use the cross-entropy loss function above, so the reg_loss function is defined as:
reg_loss = Σ_{i ∈ {x, y, w, h}} smoothL1(t_i - t_i*), with smoothL1(x) = 0.5x² if |x| < 1, and |x| - 0.5 otherwise
where i ∈ {x, y, w, h}, t_i is the predicted offset between the candidate frame and the real coordinate frame in component i of [x, y, w, h], and t_i* is the true value of that offset; x and y denote coordinates, and w and h denote width and height.
Because the two losses differ in order of magnitude, an adjustable parameter λ is used to balance them, so that both are weighted consistently in the total loss of the regional candidate network during training. The loss function L of the regional candidate network is defined as:
L=cls_loss+λ*reg_loss
The regional candidate network is trained end to end with the loss function L by back propagation and stochastic gradient descent, with weights initialized from a zero-mean Gaussian distribution with standard deviation 0.01; the parameters requiring initialization comprise the parameters of the full convolution network of step 1 and the parameters of the convolution layers in the regional candidate network.
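A minimal numpy version of the combined loss L = cls_loss + λ·reg_loss. Cross-entropy for cls_loss and a smooth-L1 penalty applied only to foreground candidates for reg_loss are standard-practice assumptions, since the equations are not reproduced legibly in this text:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear in the tails."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x * x, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=1.0):
    """L = cls_loss + lam * reg_loss for one minibatch of candidates.

    p:      (n,) predicted foreground probabilities
    p_star: (n,) ground-truth labels in {0, 1}
    t:      (n, 4) predicted offsets (x, y, w, h)
    t_star: (n, 4) ground-truth offsets
    lam:    balance parameter (its value is a tunable assumption)
    """
    eps = 1e-12
    cls_loss = -np.mean(p_star * np.log(p + eps)
                        + (1 - p_star) * np.log(1 - p + eps))
    # regression counts only for foreground (positive) candidates
    reg = smooth_l1(t - t_star).sum(axis=1)
    reg_loss = np.sum(p_star * reg) / max(p_star.sum(), 1)
    return cls_loss + lam * reg_loss
```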
Step 4, constructing a position sensitive sub-network
The position-sensitive sub-network comprises a convolution layer conv_L connected after the last convolution layer of the full convolution network; after the input image has been processed by the full convolution network, the output feature map is convolved by this layer to obtain the position-sensitive score maps. The convolution layer generates k²(c+1) position-sensitive score maps (c gesture classes plus 1 background class), where the k² score maps describe the relative positions of a k×k spatial grid.
The height and width of this convolution layer's output are the same as those of the last convolution layer of the full convolution network, but the number of channels is k²(c+1), where k is the number of grid cells per side and c is the number of categories of recognized objects, plus one background category. As shown in fig. 4, the recognition task of this scheme has seven gesture categories, so there are 8 categories in total, each with k² score maps. Taking gesture 1 as an example, each score map indicates which positions in the original input image contain a certain part of gesture 1, and responds strongly at positions containing the corresponding part. With k taken as 3, the original input image is divided into 9 different positions and has 9 position-sensitive score maps.
Step 5, pooling of location sensitive candidate frames
For each category, the corresponding position-sensitive score maps are extracted from the candidate frame region, the mean of each extracted score map is computed, the means are arranged into a matrix according to their positions, and all values in the matrix are summed to obtain a value S; after all categories are processed in the same way, the values S together form an output vector, which is normalized to estimate the category of the current candidate region.
In this embodiment, each category has 9 position-sensitive score maps. Taking gesture category 1 as an example, its 9 score maps are extracted from the candidate frame as shown in fig. 5; the extracted maps are averaged respectively, the averages are arranged into a 3 × 3 matrix according to position, and all values in the 3 × 3 matrix are summed to obtain a single value. Repeating these steps for categories 2-8 finally yields a 1 × 8 vector, which is normalized with softmax; the softmax response of each category is computed to estimate the category of the currently selected region, thereby outputting the prediction result.
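The position-sensitive pooling of steps 4-5 can be sketched as follows for k = 3 and 8 categories (7 gestures plus background). The H × W score-map size and the integer bin-edge rounding are illustrative assumptions:

```python
import numpy as np

def psroi_pool(score_maps, roi, k=3, num_classes=8):
    """Position-sensitive RoI pooling (sketch).

    score_maps: (H, W, k*k*num_classes) position-sensitive score maps
    roi:        (x1, y1, x2, y2) candidate box in score-map coordinates
    Returns a (num_classes,) vector: for each class, the k x k grid of
    bin averages is summed into a single value S.
    """
    x1, y1, x2, y2 = roi
    xs = np.linspace(x1, x2, k + 1).astype(int)   # bin edges along x
    ys = np.linspace(y1, y2, k + 1).astype(int)   # bin edges along y
    scores = np.zeros(num_classes)
    for c in range(num_classes):
        for gy in range(k):
            for gx in range(k):
                # bin (gy, gx) reads only its own score map for class c
                m = score_maps[:, :, c * k * k + gy * k + gx]
                cell = m[ys[gy]:max(ys[gy + 1], ys[gy] + 1),
                         xs[gx]:max(xs[gx + 1], xs[gx] + 1)]
                scores[c] += cell.mean()
    return scores

def softmax(v):
    """Normalize the 1 x num_classes vector into class probabilities."""
    e = np.exp(v - v.max())
    return e / e.sum()
```

The class with the largest softmax response is then taken as the category of the current candidate region.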
The candidate frame classification loss function in the position-sensitive sub-network is defined by the cross-entropy loss function as follows:
cls_loss(θ) = -Σ_{i=1..8} s_i · log(ŝ_i(θ))
where s_i is the true output belonging to class i in the 1 × 8 output vector, ŝ_i is the predicted output belonging to class i in the 1 × 8 output vector, and θ is the parameter set of the convolution layers of the whole network (full convolution network, regional candidate network, and position-sensitive sub-network).
And 6, training the whole network by using a database of gesture pictures, and storing the trained network model for gesture classification.
In this embodiment, the CGD database is used to train the network. The database contains thirty basic gesture actions, and picture sizes are normalized to 224 × 224. Since the regional candidate network and the position-sensitive sub-network share network parameters, the network only needs to be trained once; the network model is built with a deep learning framework, and 7 representative gestures are selected from the CGD database, as shown in fig. 4, with 8000 training images and 500 test images. The training period is set to 500; after the model is saved, an end-to-end test is performed, and when a picture containing a gesture is input, a result is output, as shown in fig. 6. As can be seen from the figure, both the type of the gesture and the position of its coordinate frame are recognized, so the method performs well.
The method adopts the OHEM (Online Hard Example Mining) technique: the loss function values of all gesture regions are computed, all regions are sorted by loss value, and the B regions with the largest losses are selected for back propagation.
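The OHEM selection step reduces to a sort-and-truncate over per-region losses. B and the loss values below are placeholders for illustration:

```python
import numpy as np

def ohem_select(losses, b):
    """Keep only the b hardest regions (largest loss) for backprop."""
    order = np.argsort(losses)[::-1]   # indices sorted by descending loss
    return order[:b]

losses = np.array([0.1, 2.3, 0.7, 1.5])  # per-region losses (placeholder)
hard = ohem_select(losses, 2)            # indices of the 2 hardest regions
```

Only the selected regions contribute gradients, which concentrates training on hard negatives and improves the model's rejection of negative samples.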
Claims (1)
1. A gesture recognition method based on a regional full convolution network is characterized by comprising the following steps:
step 1, establishing a full convolution network
Using the residual network ResNet-34 architecture as a framework, changing the stride of the ResNet-34 network from 32 pixels to 16 pixels, deleting the average pooling layer and the fully connected layer of the ResNet-34 architecture, and then constructing a full convolution network from the convolution layers of ResNet-34 so as to extract features from the input image; the input image passes through the full convolution network and then outputs a feature map, and each pixel point on the feature map generates a plurality of candidate frames for predicting the position of the coordinate frame;
step 2, establishing regional candidate network
Establishing a regional candidate network, wherein the network comprises the last convolutional layer of the full convolution network, behind which two branches are arranged. One branch consists, in order, of a convolution layer, a first adjusting layer, a normalization layer and a second adjusting layer, and scores each candidate frame as foreground or background; the other branch is a convolution layer used to predict the offset between the candidate frame and the position of the real coordinate frame. The first and second adjusting layers change the dimensionality of the image, and the normalization layer performs the normalization operation;
step 3, training the regional candidate network
The screening candidate box is used for training the regional candidate network, and the screening rule is as follows:
if the overlapping rate of the candidate frame and the real coordinate frame is more than or equal to 0.7, the candidate frame is regarded as a foreground; if the overlapping rate of the candidate frame and the real coordinate frame is less than 0.3, the candidate frame is considered as the background; training by taking candidate frames corresponding to the foreground and the background as training data of the regional candidate network, wherein the candidate frame corresponding to the foreground is a positive sample, and the candidate frame corresponding to the background is a negative sample; the loss function for the training of the regional candidate network is:
L=cls_loss+λ*reg_loss
wherein λ is an adjustable parameter; to train the regional candidate network, a binary class label is assigned to each candidate frame to be trained. Let p_i be the predicted probability that the i-th candidate frame belongs to the foreground and p_i* be its true label; then cls_loss is defined as:
cls_loss = -Σ_i [ p_i* · log(p_i) + (1 - p_i*) · log(1 - p_i) ]
reg_loss is used to regress the deviation between the candidate frame and the real coordinate frame, and is defined as:
reg_loss = Σ_{i ∈ {x, y, w, h}} smoothL1(t_i - t_i*), with smoothL1(x) = 0.5x² if |x| < 1, and |x| - 0.5 otherwise
where i ∈ {x, y, w, h}, t_i is the predicted offset between the candidate frame and the real coordinate frame in component i of [x, y, w, h], and t_i* is the true value of that offset; x and y denote coordinates, and w and h denote width and height;
the regional candidate network utilizes the loss function L to carry out end-to-end training by a back propagation and random gradient descent method, and weight is initialized by zero mean Gaussian distribution with standard deviation of 0.01;
step 4, constructing a position sensitive sub-network
The position-sensitive sub-network comprises a convolution layer connected after the last convolution layer of the full convolution network, and the position-sensitive score maps are obtained by convolving the output feature map after the input image is processed by the full convolution network; the convolution layer generates k²(c+1) position-sensitive score maps, where the k² score maps describe the relative positions of a k×k spatial grid and c is the number of classes of recognized objects;
step 5, pooling of location sensitive candidate frames
Outputting the deviation amount of the candidate frame and the real coordinate frame by the trained regional candidate network, wherein the deviation amount comprises the position information of the candidate frame region; according to the position information, corresponding the candidate frame to the position sensitive score map obtained in the step 4, wherein the candidate frame is divided into k × k sub-regions, and each sub-region corresponds to a region on the score map; the location sensitive subnetwork further comprises a pooling layer for implementing the following functions:
respectively extracting the position sensitive score map corresponding to each category from the candidate frame, respectively calculating the mean value of the extracted score maps, then forming a matrix according to the positions, and summing all values in the matrix to obtain a value; after all classes are processed in the same way, all obtained values jointly form an output vector, and the output vector is normalized, so that the class of the current candidate area is estimated;
and 6, training the network by utilizing a database of the gesture pictures, and storing the trained network model for gesture classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910419349.1A CN110334584B (en) | 2019-05-20 | 2019-05-20 | Gesture recognition method based on regional full convolution network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910419349.1A CN110334584B (en) | 2019-05-20 | 2019-05-20 | Gesture recognition method based on regional full convolution network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110334584A CN110334584A (en) | 2019-10-15 |
CN110334584B true CN110334584B (en) | 2023-01-20 |
Family
ID=68139443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910419349.1A Active CN110334584B (en) | 2019-05-20 | 2019-05-20 | Gesture recognition method based on regional full convolution network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334584B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111026898A (en) * | 2019-12-10 | 2020-04-17 | 云南大学 | Weak supervision image emotion classification and positioning method based on cross space pooling strategy |
CN111814626B (en) * | 2020-06-29 | 2021-01-26 | 中南民族大学 | Dynamic gesture recognition method and system based on self-attention mechanism |
CN112699837A (en) * | 2021-01-13 | 2021-04-23 | 新大陆数字技术股份有限公司 | Gesture recognition method and device based on deep learning |
CN113591764A (en) * | 2021-08-09 | 2021-11-02 | 广州博冠信息科技有限公司 | Gesture recognition method and device, storage medium and electronic equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108010049A (en) * | 2017-11-09 | 2018-05-08 | 华南理工大学 | Split the method in human hand region in stop-motion animation using full convolutional neural networks |
CN109299644A (en) * | 2018-07-18 | 2019-02-01 | 广东工业大学 | A kind of vehicle target detection method based on the full convolutional network in region |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709532B (en) * | 2017-01-25 | 2020-03-10 | 京东方科技集团股份有限公司 | Image processing method and device |
- 2019-05-20: CN application CN201910419349.1A filed; granted as patent CN110334584B (status: active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108010049A (en) * | 2017-11-09 | 2018-05-08 | 华南理工大学 | Split the method in human hand region in stop-motion animation using full convolutional neural networks |
CN109299644A (en) * | 2018-07-18 | 2019-02-01 | 广东工业大学 | A kind of vehicle target detection method based on the full convolutional network in region |
Non-Patent Citations (1)
Title |
---|
Crack detection for textured transparent plastics based on improved R-FCN; Guan Rizhao et al.; Computer Engineering and Applications (《计算机工程与应用》); 2019-03-15; Vol. 55, No. 6; pp. 1-2 *
Also Published As
Publication number | Publication date |
---|---|
CN110334584A (en) | 2019-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111310861B (en) | License plate recognition and positioning method based on deep neural network | |
CN110334584B (en) | Gesture recognition method based on regional full convolution network | |
CN111489358B (en) | Three-dimensional point cloud semantic segmentation method based on deep learning | |
CN109800689B (en) | Target tracking method based on space-time feature fusion learning | |
CN111310773B (en) | Efficient license plate positioning method of convolutional neural network | |
CN111428765B (en) | Target detection method based on global convolution and local depth convolution fusion | |
CN106845430A (en) | Pedestrian detection and tracking based on acceleration region convolutional neural networks | |
CN112907602B (en) | Three-dimensional scene point cloud segmentation method based on improved K-nearest neighbor algorithm | |
CN109285162A (en) | A kind of image, semantic dividing method based on regional area conditional random field models | |
CN109492596B (en) | Pedestrian detection method and system based on K-means clustering and regional recommendation network | |
CN109064389B (en) | Deep learning method for generating realistic images by hand-drawn line drawings | |
CN111126459A (en) | Method and device for identifying fine granularity of vehicle | |
CN113744311A (en) | Twin neural network moving target tracking method based on full-connection attention module | |
CN111626200A (en) | Multi-scale target detection network and traffic identification detection method based on Libra R-CNN | |
CN111311702B (en) | Image generation and identification module and method based on BlockGAN | |
CN112347970A (en) | Remote sensing image ground object identification method based on graph convolution neural network | |
CN107146219B (en) | Image significance detection method based on manifold regularization support vector machine | |
CN105809716A (en) | Superpixel and three-dimensional self-organizing background subtraction algorithm-combined foreground extraction method | |
CN107577983A (en) | It is a kind of to circulate the method for finding region-of-interest identification multi-tag image | |
CN115223017B (en) | Multi-scale feature fusion bridge detection method based on depth separable convolution | |
CN112396655A (en) | Point cloud data-based ship target 6D pose estimation method | |
CN116596966A (en) | Segmentation and tracking method based on attention and feature fusion | |
CN113988164B (en) | Lightweight point cloud target detection method for representative point self-attention mechanism | |
CN113032613B (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
KR20110037184A (en) | Pipelining computer system combining neuro-fuzzy system and parallel processor, method and apparatus for recognizing objects using the computer system in images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||