CN108491880A - Object classification and pose estimation method based on neural network - Google Patents
Object classification and pose estimation method based on neural network
- Publication number
- CN108491880A (application CN201810243399.4A)
- Authority
- CN
- China
- Prior art keywords
- layers
- input
- CAD model
- layer
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
Abstract
The invention discloses an object classification and pose estimation method based on a neural network, which mainly addresses the low accuracy of the prior art when performing object detection and pose estimation with convolutional neural networks. The implementation is: 1) obtain the multi-view images of each CAD model in the data set; 2) build the mathematical model of joint detection from the multi-view images of the CAD models; 3) build a convolutional neural network and train it with the multi-view images of the CAD models; 4) input the multi-view images of each CAD model in the test set to the neural network and output the class label and pose label predicted by the network. The invention fuses the shallow and deep feature maps of the neural network, so that the fused feature maps retain both rich pose information and good classification information, improving the accuracy of classification and pose estimation. It can be used for intelligent robotic-arm and robot grasping.
Description
Technical field
The invention belongs to the field of artificial intelligence and relates to an object classification and pose estimation method that can be used for intelligent robotic-arm and robot grasping.
Background technology
A convolutional neural network (CNN) is a feed-forward neural network composed of convolutional layers, fully connected layers, pooling layers and activation layers. Compared with traditional fully connected neural networks, convolutional neural networks use local connections and weight sharing, so that the neurons on the same feature map share identical weights; this greatly reduces the number of network parameters and the complexity of the network. The activation function has also gradually evolved from the sigmoid to the one-sidedly suppressing ReLU, and these continuing improvements bring the artificial neuron closer to the activation characteristics of biological neurons. In addition, a CNN avoids complicated image pre-processing, including complicated feature extraction and data reconstruction, and can take the original image directly as input. Gradient descent and the chain rule of differentiation allow the network to iterate between forward propagation and back-propagation, continuously improving detection accuracy. Among the many deep-learning frameworks, Caffe is a relatively common one, widely used in video and image processing. Caffe's modular design, its separation of expression from implementation, its convenient switching between GPU and CPU, and the Python and Matlab interfaces it provides make it easy to adjust network structures and train networks with Caffe.
In recent years, deep learning has made remarkable progress in image classification, object detection, semantic segmentation, instance segmentation and other tasks. A general vision system needs to solve two problems: object classification and object pose estimation, where pose estimation refers to the pose of the object relative to the camera. Object pose estimation is crucial in many applications, such as robot grasping. However, object classification and pose estimation conflict with each other: a classification system must classify an object correctly no matter what pose it is in, so it learns features that are independent of the viewpoint, whereas for pose estimation the system needs to learn features that preserve the geometry and appearance of the object in order to distinguish its pose. In a convolutional neural network, the shallow feature maps tend to be more general and less class-specific, but they contain more features that distinguish different poses; the deep feature maps are more abstract, with more distinct class features, but the information about the specific pose becomes indistinct through the high level of abstraction. Existing detection methods generally select one intermediate layer whose features perform reasonably well in both classification and pose estimation; this is a compromise, and it cannot make the accuracy of object detection and of pose estimation reach their best at the same time.
MVCNN is an object classification and pose estimation method proposed by Hang Su et al. in 2015. It converts the sampled 3D data into multi-view 2D pictures, performing dimensionality reduction on the data while preserving detection accuracy. Although this simplifies the processing procedure, it requires extracting features from the pictures of all views of the object and then merging the information from every view. In real scenes, occlusion and truncation make it difficult to collect multi-view pictures of an object from all predefined views, so the method does not meet the demands of real scenes.
Invention content
It is an object of the present invention, in view of the above shortcomings of the prior art, to propose an object classification and pose estimation method based on a neural network, so as to improve the accuracy of object detection and pose estimation, speed up detection, and meet the demands of real scenes.
The technical idea of the present invention is: to improve the accuracy of object detection and pose estimation by fusing the shallow and deep features of the convolutional neural network, and to speed up detection by iterating over images of only part of the views of the detected object. The realization scheme is as follows:
(1) Obtain the training set and the test set, and generate the images corresponding to each CAD model:
Take 3429 CAD models from the ModelNet10 data set as the training set and 1469 CAD models as the test set;
For the CAD model of each sample in the ModelNet10 data set, apply two pre-processing strategies in turn: the first places 12 predefined viewpoints evenly on the view circle around the CAD model and acquires an image of the CAD model at each of these 12 predefined viewpoints; the second places the CAD model at the center of a regular dodecahedron, sets the 20 vertices of the regular dodecahedron as predefined viewpoints, and acquires an image of the CAD model at each of these 20 predefined viewpoints;
(2) Build the mathematical model of joint detection from the multi-view images obtained by pre-processing each CAD model in the data set:
(2a) Treat the pose label of each CAD model as a hidden variable, denoted {v_i};
(2b) Define the M images {x_i} of a CAD model from different views, together with the class label y ∈ {1, .., N} of the CAD model, as a training sample, where N is the total number of CAD model classes and each multi-view image x_i corresponds to one view label v_i ∈ {1, .., M};
(2c) Based on the above definition of the training samples, abstract the object recognition and pose estimation tasks as the following optimization problem:
where R is the neural network weight parameter, ŷ is the class label predicted by the neural network, and P(ŷ = y) is the probability that the Softmax layer of the convolutional neural network CNN assigns to class y;
(3) Build and train the convolutional neural network CNN:
(3a) On the basis of the existing AlexNet network, add an Eltwise1 layer, an fc_a1 layer, an fc_a2 layer and an Eltwise2 layer to obtain a 16-layer convolutional neural network CNN, wherein:
the Eltwise1 layer fuses the feature maps of the Conv3 and Conv4 layers of the AlexNet network element by element at corresponding positions;
the fc_a1 layer maps the feature maps of the Eltwise1 layer to a feature vector;
the fc_a2 layer maps the feature maps of the Pool5 layer of the AlexNet network to a feature vector;
the Eltwise2 layer fuses the feature maps of the fc_a1, fc_a2 and Eltwise1 layers element by element at corresponding positions;
(3b) Input the multi-view images {x_i} of each CAD model in the training set into the convolutional network and iterate the forward computation and back-propagation of the CNN to train the network and optimize the network parameters R until the network loss function J(θ) ≤ 0.0001, obtaining the trained neural network CNN;
(4) Test the network
Input the multi-view images {x_i} of each CAD model in the ModelNet10 test set into the trained neural network and count the accuracy of object classification and pose estimation.
Compared with the prior art, the present invention has the following advantages:
1. Because the invention fuses, element by element at corresponding positions, feature maps of different depths in the convolutional neural network, the new feature maps obtained by the fusion contain both the rich pose information of the shallow feature maps and the abstract class information of the deep feature maps, which improves detection accuracy.
2. Because the invention generates the corresponding multi-view images of each 3D CAD model in the data set, i.e. converts the 3D sample data into 2D multi-view images, it performs dimensionality reduction on the data, which reduces the complexity of the data and the amount of computation for feature extraction and speeds up detection.
Description of the drawings
Fig. 1 is the implementation flow chart of the present invention;
Fig. 2 shows the two predefined-viewpoint strategies of the present invention;
Fig. 3 is the structure of the convolutional neural network CNN built in the present invention.
Detailed description of the embodiments
The examples and effects of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, the present invention is realized as follows:
Step 1, obtain the multi-view images of the CAD models.
For the CAD model of each sample in the ModelNet10 data set, two pre-processing strategies are applied in turn.
As shown in Fig. 2(a), the first pre-processing strategy places 12 predefined viewpoints evenly on the view circle around the CAD model: first fix an axis as the rotation axis, then place one observation point every 30 degrees on the view circle around the object, so that on the 360-degree view circle each CAD model yields images from 12 different views;
As shown in Fig. 2(b), the second pre-processing strategy places the CAD model at the center of a regular dodecahedron, sets the 20 vertices of the regular dodecahedron as predefined viewpoints, and acquires the image of the CAD model at each of these 20 predefined viewpoints.
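The two viewpoint strategies above can be sketched in a few lines of numpy (an illustrative sketch only; the camera radius and the choice of the vertical axis as the rotation axis are assumptions not fixed by the patent):

```python
import numpy as np

def ring_viewpoints(radius=2.0, n=12):
    """Strategy 1: n viewpoints spaced evenly (every 360/n = 30 degrees
    for n = 12) on a view circle around a fixed rotation axis."""
    angles = np.deg2rad(np.arange(n) * 360.0 / n)
    return np.stack([radius * np.cos(angles),
                     radius * np.sin(angles),
                     np.zeros(n)], axis=1)

def dodecahedron_viewpoints():
    """Strategy 2: the 20 vertices of a regular dodecahedron
    centred on the CAD model."""
    phi = (1 + np.sqrt(5)) / 2  # golden ratio
    verts = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
    for a in (-1 / phi, 1 / phi):
        for b in (-phi, phi):
            verts += [(0, a, b), (a, b, 0), (b, 0, a)]
    return np.array(verts)

ring = ring_viewpoints()           # 12 views, 30 degrees apart
dodec = dodecahedron_viewpoints()  # 20 views, all equidistant from the model
```

Rendering the CAD model from each viewpoint then yields the 12-view or 20-view image set used in the following steps.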
Step 2, build the mathematical model of joint detection from the multi-view images obtained by pre-processing each CAD model in the data set.
(2a) Treat the pose label of each CAD model as a hidden variable, denoted {v_i};
(2b) Define the M images {x_i} of a CAD model from different views, together with the class label y ∈ {1, .., N} of the CAD model, as a training sample, where N is the total number of CAD model classes, x_i is a multi-view image, and each multi-view image x_i corresponds to one view label v_i ∈ {1, .., M};
(2c) Based on the above definition of the training samples, abstract the object recognition and pose estimation tasks as the following optimization problem:
where R is the neural network weight parameter, ŷ is the class label predicted by the neural network, and P(ŷ = y) is the probability that the Softmax layer of the convolutional neural network CNN assigns to class y;
Denoting P(ŷ = y) as p^(i)_{k,j}, the optimization problem is expressed in the following form:
where (i) indexes the input image x_i, k is the class label of image x_i, and j indicates that image x_i is observed from the j-th predefined viewpoint.
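One concrete reading of this latent-view formulation, consistent with step (3a16) below, is that the network scores every (class k, view j) pair and the pose label is the view that maximizes the predicted class's probability. A minimal numpy sketch (illustrative only; the class/view counts and the marginalization over views are assumptions, not the patent's exact objective):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_class_and_pose(scores, n_classes, n_views):
    """scores: raw network outputs, one score per (class k, view j) pair,
    flattened to length n_classes * n_views."""
    p = softmax(scores).reshape(n_classes, n_views)  # p[k, j] ~ p^(i)_{k,j}
    k = int(p.sum(axis=1).argmax())  # class label: best class over all views
    j = int(p[k].argmax())           # pose label: most probable view for it
    return k, j

# A dominant score at (class 3, view 5) should be recovered as (3, 5).
scores = np.zeros(10 * 12)
scores[3 * 12 + 5] = 100.0
k, j = predict_class_and_pose(scores, n_classes=10, n_views=12)
```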
Step 3, build the convolutional neural network CNN.
(3a) Build the 16-layer convolutional neural network CNN shown in Fig. 3. The 16 layers are, in order: the first convolutional layer Conv1, the first pooling layer Pool1, the second convolutional layer Conv2, the second pooling layer Pool2, the third convolutional layer Conv3, the fourth convolutional layer Conv4, the first feature-fusion layer Eltwise1, the fifth convolutional layer Conv5, the fifth pooling layer Pool5, the first fully connected layer fc_a1, the second fully connected layer fc_a2, the third fully connected layer fc6, the fourth fully connected layer fc7, the second feature-fusion layer Eltwise2, the fifth fully connected layer fc8 and the classification layer Softmax. The feature extraction in each layer proceeds as follows:
(3a1) Input the 227*227-pixel image to the first convolutional layer Conv1 and apply to it a convolution with kernel size 11*11 pixels and stride 4 pixels; with 96 convolution kernels in total, this yields 96 feature maps of 55*55 pixels;
(3a2) Input the 96 feature maps output by the first convolutional layer Conv1 to the first pooling layer Pool1 and apply to them a max-pooling operation with a pooling block of 3*3 pixels and a stride of 2 pixels, yielding 96 feature maps of 27*27 pixels;
(3a3) Input the 96 feature maps output by the first pooling layer Pool1 to the second convolutional layer Conv2 and apply to them a convolution with kernel size 5*5 pixels and stride 1; with 256 convolution kernels in total, this yields 256 feature maps of 27*27 pixels;
(3a4) Input the 256 feature maps output by the second convolutional layer Conv2 to the second pooling layer Pool2 and apply to them a max-pooling operation with a pooling block of 3*3 pixels and a stride of 2 pixels, yielding 256 feature maps of 13*13 pixels;
(3a5) Input the 256 feature maps output by the second pooling layer Pool2 to the third convolutional layer Conv3 and apply to them a convolution with kernel size 3*3 pixels and stride 1 pixel; with 384 convolution kernels in total, this yields 384 feature maps of 13*13 pixels;
(3a6) Input the 384 feature maps output by the third convolutional layer Conv3 to the fourth convolutional layer Conv4 and apply to them a convolution with kernel size 3*3 pixels and stride 1 pixel; with 384 convolution kernels in total, this yields 384 feature maps of 13*13 pixels;
(3a7) Input the feature maps of the third convolutional layer Conv3 and the fourth convolutional layer Conv4 to the first fusion layer Eltwise1 for feature-map fusion, yielding 384 feature maps of 13*13 pixels;
(3a8) Input the 384 feature maps output by the fourth convolutional layer Conv4 to the fifth convolutional layer Conv5 and apply to them a convolution with kernel size 3*3 pixels and stride 1 pixel, i.e. with 256 convolution kernels, yielding 256 feature maps of 13*13 pixels;
(3a9) Input the 256 feature maps output by the fifth convolutional layer Conv5 to the fifth pooling layer Pool5 and apply to them a max-pooling operation with a pooling block of 3*3 pixels and a stride of 2 pixels, yielding 256 feature maps of 6*6 pixels;
(3a10) Input the 384 feature maps output by the first fusion layer Eltwise1 to the first fully connected layer fc_a1, which maps the feature maps to a feature vector of size 1*1*4096;
(3a11) Input the 256 feature maps output by the fifth pooling layer Pool5 to the second fully connected layer fc_a2, which maps the feature maps to a feature vector of size 1*1*4096;
(3a12) Input the 256 feature maps output by the fifth pooling layer Pool5 to the third fully connected layer fc6, which maps the feature maps to a feature vector of size 1*1*4096;
(3a13) Input the 1*1*4096 feature vector output by the third fully connected layer fc6 to the fourth fully connected layer fc7, which continues the feature extraction and yields a feature vector of size 1*1*4096;
(3a14) Input the feature vectors of the first fully connected layer fc_a1, the second fully connected layer fc_a2 and the fourth fully connected layer fc7 to the second fusion layer Eltwise2 for feature-vector fusion, yielding a feature vector of size 1*1*4096;
(3a15) Input the 1*1*4096 feature vector output by the second fusion layer Eltwise2 to the fifth fully connected layer fc8, which maps the feature vector to a feature vector of size 1*1*11*M, where M is the number of multi-view images and the symbol "*" denotes multiplication;
(3a16) Input the 1*1*11*M feature vector to the classification layer Softmax to obtain the class label of image x_i, and select the view label v_i that maximizes the class probability as its pose label;
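The spatial sizes quoted in steps (3a1)-(3a9) follow the standard output-size formula out = floor((in + 2*pad - kernel)/stride) + 1. A quick check (the paddings - 2 for Conv2 and 1 for the 3*3 convolutions, as in AlexNet - are assumptions needed to reproduce the quoted sizes):

```python
def out_size(in_size, kernel, stride, pad=0):
    """Side length of a conv/pool output: floor((in + 2*pad - k)/s) + 1."""
    return (in_size + 2 * pad - kernel) // stride + 1

s = out_size(227, 11, 4)       # Conv1: 55
s = out_size(s, 3, 2)          # Pool1: 27
s = out_size(s, 5, 1, pad=2)   # Conv2: 27
s = out_size(s, 3, 2)          # Pool2: 13
s = out_size(s, 3, 1, pad=1)   # Conv3: 13
s = out_size(s, 3, 1, pad=1)   # Conv4: 13 (Eltwise1 fuses Conv3 + Conv4)
s = out_size(s, 3, 1, pad=1)   # Conv5: 13
s = out_size(s, 3, 2)          # Pool5: 6
```

Note that the element-wise fusion in Eltwise1 is only valid because Conv3 and Conv4 both output 384 feature maps of 13*13 pixels, i.e. tensors of identical shape.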
Step 4, train the convolutional neural network CNN.
(3b1) In the forward-propagation stage, take a training sample from the training set, input the multi-view images {x_i} of the training sample to the input layer of the convolutional neural network CNN, perform feature extraction and feature mapping, and output the final result through the Softmax layer;
(3b2) In the back-propagation stage, compute the difference between the actual output of the convolutional neural network CNN and the ideal output of the training sample, and adjust the weight parameters R of the convolutional neural network by back-propagation using the error-minimization method;
(3b3) Repeat the operations of (3b1) and (3b2) until the convolutional neural network CNN loss function J(θ) ≤ 0.0001, obtaining the trained neural network.
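The (3b1)-(3b3) loop has the following shape (a toy numpy sketch: a linear softmax model with cross-entropy loss stands in for the full CNN, and the separable toy data, learning rate and iteration cap are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.zeros((3, 8))
means[[0, 1, 2], [0, 1, 2]] = 5.0              # well-separated class centres
y = rng.integers(0, 3, size=30)                # toy class labels
X = means[y] + 0.1 * rng.normal(size=(30, 8))  # toy "images" (features)
W = np.zeros((8, 3))                           # weight parameters R

def forward(X, W):
    z = X @ W
    z = z - z.max(axis=1, keepdims=True)       # stable softmax
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

losses = []
for step in range(2000):                       # (3b3): iterate until J is small
    P = forward(X, W)                          # (3b1): forward propagation
    loss = -np.log(P[np.arange(len(y)), y]).mean()
    losses.append(loss)
    if loss <= 1e-4:                           # stop once J(theta) <= 0.0001
        break
    G = P.copy()                               # (3b2): back-propagation
    G[np.arange(len(y)), y] -= 1.0             # dJ/dz for softmax + cross-entropy
    W -= 0.5 * (X.T @ G) / len(y)              # adjust R to minimise the error
```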
Step 5, test the network.
Input the multi-view images {x_i} of each CAD model in the ModelNet10 test set into the trained neural network and output the class label and pose label predicted by the network;
Count the percentages of CAD models in the test set whose class label or pose label is wrong, relative to the total number of CAD models in the test set, to obtain the object classification and pose estimation accuracies.
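With the error counts reported in the simulation section below (77 wrong class labels and 609 wrong pose labels out of 1469 test models), the accuracy statistic is simply 1 - errors/total; this reproduces the 94.76% classification figure (the pose figure rounds to 58.54, close to the 58.52 reported in Table 1):

```python
def accuracy_pct(errors, total):
    """Percentage of correctly labelled CAD models: (1 - errors/total) * 100."""
    return round((1 - errors / total) * 100, 2)

cls_acc = accuracy_pct(77, 1469)    # classification accuracy, percent
pose_acc = accuracy_pct(609, 1469)  # pose estimation accuracy, percent
```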
The effects of the present invention are further described below with reference to simulation.
1. Simulation conditions
The computer operating system used in the simulation experiment is 64-bit Ubuntu; the CPU is an Intel Core i3 at 4.2 GHz with 16.00 GB of memory; the GPU is a GeForce GTX 1070; and the deep-learning framework used is Caffe2.
2. Experiment content and results
In the experiment, the ModelNet10 data set is used for network training and testing. The ModelNet10 data set contains 4898 CAD models in 10 classes, of which 3429 CAD models form the training set and 1469 the test set; for each CAD model in the data set, its multi-view images are generated;
The multi-view images of the samples in the test set are input into the trained convolutional network. The number of CAD models whose class label is predicted wrongly is 77, and the number whose pose label is wrong is 609. From these statistics the classification and pose estimation accuracies of the network are obtained and compared with several existing detection methods, as shown in the following table:
Table 1
Method | Classification accuracy (%) | Pose estimation accuracy (%) |
The present invention | 94.76 | 58.52 |
RotationNet | 94.38 | 58.33 |
MVCNN | 92.10 | - |
FusionNet | 90.80 | - |
Here RotationNet is a rotation-iteration algorithm, MVCNN is a multi-view fusion algorithm, and FusionNet is a feature-fusion algorithm; they are several relatively advanced existing object recognition and pose estimation methods.
As can be seen from Table 1, the method proposed in the present invention of fusing feature maps from layers of different depths of the network can improve the accuracy of classification and pose estimation.
Claims (5)
1. An object classification and pose estimation method based on a neural network, comprising:
(1) obtaining a training set and a test set and generating the images corresponding to each CAD model:
taking 3429 CAD models from the ModelNet10 data set as the training set and 1469 CAD models as the test set;
for the CAD model of each sample in the ModelNet10 data set, applying two pre-processing strategies in turn: the first placing 12 predefined viewpoints evenly on the view circle around the CAD model and acquiring an image of the CAD model at each of these 12 predefined viewpoints; the second placing the CAD model at the center of a regular dodecahedron, setting the 20 vertices of the regular dodecahedron as predefined viewpoints, and acquiring an image of the CAD model at each of these 20 predefined viewpoints;
(2) building the mathematical model of joint detection from the multi-view images obtained by pre-processing each CAD model in the data set:
(2a) treating the pose label of each CAD model as a hidden variable, denoted {v_i};
(2b) defining the M images {x_i} of a CAD model from different views, together with the class label y ∈ {1, .., N} of the CAD model, as a training sample, where N is the total number of CAD model classes and each multi-view image x_i corresponds to one view label v_i ∈ {1, .., M};
(2c) based on the above definition of the training samples, abstracting the object recognition and pose estimation tasks as the following optimization problem:
where R is the neural network weight parameter, ŷ is the class label predicted by the neural network, and P(ŷ = y) is the probability that the Softmax layer of the convolutional neural network CNN assigns to class y;
(3) building and training the convolutional neural network CNN:
(3a) on the basis of the existing AlexNet network, adding an Eltwise1 layer, an fc_a1 layer, an fc_a2 layer and an Eltwise2 layer to obtain a 16-layer convolutional neural network CNN, wherein:
the Eltwise1 layer fuses the feature maps of the Conv3 and Conv4 layers of the AlexNet network element by element at corresponding positions;
the fc_a1 layer maps the feature maps of the Eltwise1 layer to a feature vector;
the fc_a2 layer maps the feature maps of the Pool5 layer of the AlexNet network to a feature vector;
the Eltwise2 layer fuses the feature maps of the fc_a1, fc_a2 and Eltwise1 layers element by element at corresponding positions;
(3b) inputting the multi-view images {x_i} of each CAD model in the training set into the convolutional network and iterating the forward computation and back-propagation of the CNN to train the network and optimize the network parameters R until the network loss function J(θ) ≤ 0.0001, obtaining the trained neural network CNN;
(4) testing the network
by inputting the multi-view images {x_i} of each CAD model in the ModelNet10 test set into the trained neural network and counting the accuracy of object classification and pose estimation.
2. The method according to claim 1, wherein the first pre-processing strategy of step (1), placing 12 predefined viewpoints evenly on the view circle around the CAD model, first fixes an axis as the rotation axis and then places one observation point every 30 degrees on the view circle around the object, i.e. on the 360-degree view circle, so as to obtain the images of each CAD model from 12 different views.
3. The method according to claim 1, wherein the optimization problem in step (2c) is realized as follows:
denoting the Softmax probability P(ŷ = y) as p^(i)_{k,j}, the optimization problem is expressed in the following form:
where (i) indexes the input image x_i, k is the class label of image x_i, j indicates that image x_i is observed from the j-th predefined viewpoint, and R is the neural network weight parameter.
4. according to the method described in claim 1, convolutional neural networks CNN of the structure containing 16 layers wherein in step (3a), step
It is rapid as follows:
The image of 227*227 pixel sizes is input to the first convolutional layer Conv1 by (3a1), and it is 11* that convolution kernel size is carried out to it
11 pixels and the convolution operation that step-length is 4 pixels obtain the characteristic pattern of 96 55*55 pixel sizes in total with 96 convolution kernels;
96 characteristic patterns of the first convolutional layer Conv1 outputs are input to the first pond layer Pool1 by (3a2), and maximum is carried out to it
Pondization operates, and the size of pond block is 3*3 pixels, and step-length is 2 pixels, obtains the characteristic pattern of 96 27*27 pixel sizes;
First pond layer Pool1,96 characteristic patterns exported are input to the second convolutional layer Conv2 by (3a3), and convolution is carried out to it
The convolution operation that core size is 5*5 pixels and step-length is 1 obtains 256 27*27 pixel sizes in total with 256 convolution kernels
Characteristic pattern;
256 characteristic patterns of the second convolutional layer Conv2 outputs are input to the second pond layer Pool2 by (3a4), and maximum is carried out to it
Pondization operates, and the size of pond block is 3*3 pixels, and step-length is 2 pixels, obtains the characteristic pattern of 256 13*13 pixel sizes;
Second pond layer Pool2,256 characteristic patterns exported are input to third convolutional layer Conv3 by (3a5), and convolution is carried out to it
The convolution operation that core size is 3*3 pixels and step-length is 1 pixel shares 384 convolution kernels, it is big to obtain 384 13*13 pixels
Small characteristic pattern;
384 characteristic patterns that third convolutional layer Conv3 is exported are input to Volume Four lamination Con4 by (3a6), and convolution is carried out to it
The convolution operation that core size is 3*3 pixels and step-length is 1 pixel shares 384 convolution kernels, it is big to obtain 384 13*13 pixels
Small characteristic pattern;
The characteristic pattern of third convolutional layer Conv3 and Volume Four lamination Conv4 are input to the first Eltwise1 layers of progress spy by (3a7)
Sign figure fusion, obtains the characteristic pattern of 384 13*13 pixel sizes;
384 characteristic patterns that Volume Four lamination Conv4 is exported are input to the 5th convolutional layer Conv5 by (3a8), and convolution is carried out to it
The convolution operation that core size is 3*3 pixels and step-length is 1 pixel obtains 256 13*13 pixel sizes that is, with 256 convolution kernels
Characteristic pattern;
256 characteristic patterns of the 5th convolutional layer Conv5 outputs are input to the 5th pond layer Pool5 by (3a9), and maximum is carried out to it
Pondization operates, and pond block size is 3*3 pixel sizes, and step-length is 2 pixels, obtains the characteristic pattern of 256 6*6 pixel sizes;
384 characteristic patterns of the first Eltwise1 layers of output are input to the first full articulamentum fc_a1 by (3a10), and characteristic pattern is reflected
It penetrates as the feature vector of 1*1*4096 pixel sizes;
256 characteristic patterns of the layer Pool5 layers of output in the 5th pond are input to the second full articulamentum fc_a2 by (3a11), by feature
Figure is mapped as the feature vector of 1*1*4096 pixel sizes;
256 characteristic patterns of the layer Pool5 layers of output in the 5th pond are input to the full articulamentum fc6 of third by (3a12), by characteristic pattern
It is mapped as the feature vector of 1*1*4096 pixel sizes;
The feature vector of the 1*1*4096 pixel sizes of the full articulamentum fc6 layers of output of third is input to the 4th full connection by (3a13)
Layer fc7 continues feature extraction, obtains the feature vector of 1*1*4096 pixel sizes;
The feature vector that first full articulamentum fc_a1, the second full connection fc_a2 and the 4th are connected fc7 layers by (3a14) entirely inputs
To the 2nd Eltwise2 layers, the fusion of feature vector is carried out, the feature vector of 1*1*4096 pixel sizes is obtained;
The characteristic pattern of the 2nd Eltwise2 layers of 1*1*4096 pixel sizes exported is input to the 5th full articulamentum by (3a15)
Fc8, by the feature vector that maps feature vectors are 1*1*11*M pixel sizes, wherein M is multi-view image number, symbol " * " table
Show multiplication;
(3a16) the feature vector of size 1*1*11*M is input to the classification layer Softmax, which obtains the class label of image x_i, and the view label v_i with the maximum class probability is selected as its pose label.
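The read-out in steps (3a15)-(3a16) can be pictured as one 11-way score row per view, with the pose label taken from the view whose best class score is largest. The NumPy sketch below is a guess at that layout, not the patented implementation: M = 12 views is an arbitrary choice, and treating consecutive groups of 11 entries as per-view class scores is an assumption.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

M, n_classes = 12, 11                          # M views (assumed), 11 classes
rng = np.random.default_rng(1)
fc8_out = rng.standard_normal(n_classes * M)   # stand-in for the 1*1*11*M output of fc8

probs = softmax(fc8_out.reshape(M, n_classes))          # one distribution per view
view, cls = np.unravel_index(probs.argmax(), probs.shape)
# cls plays the role of the class label of image x_i, and view the view
# label v_i with the maximum class probability, used as the pose label
```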
5. The method according to claim 1, wherein the convolutional neural network CNN in step (3b) is trained as follows:
(3b1) in the forward-propagation stage, a training sample is taken from the training set, the multi-view images of the training sample are input to the input layer of the convolutional neural network CNN, and after feature extraction and feature mapping the final result is output by the Softmax layer;
(3b2) in the back-propagation stage, the difference between the actual output of the convolutional neural network CNN and the ideal output of the training sample is computed, and the weight parameters R of the convolutional neural network are adjusted by back-propagation so as to minimize the error;
(3b3) the operations of (3b1) and (3b2) are repeated until the loss function of the convolutional neural network CNN satisfies J(θ) ≤ 0.0001.
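The loop in (3b1)-(3b3) — forward pass, error between actual and ideal output, weight adjustment, repeat until J(θ) ≤ 0.0001 — has the shape of an ordinary gradient-descent loop. The toy NumPy sketch below shows only that control flow on a linear least-squares problem; the model, data, and learning rate are all stand-ins, not the CNN of the claim.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))      # stand-in training inputs
y = X @ rng.standard_normal(8)        # stand-in ideal outputs

R = np.zeros(8)                       # weight parameters, called R in (3b2)
lr = 0.1
loss = float("inf")
while loss > 1e-4:                    # (3b3): repeat until J(theta) <= 0.0001
    err = X @ R - y                   # (3b1)/(3b2): actual minus ideal output
    loss = 0.5 * np.mean(err ** 2)    # loss function J(theta)
    R -= lr * (X.T @ err) / len(y)    # back-propagate the error, adjust R
```

On this consistent linear system the loss contracts geometrically, so the threshold is reached in a bounded number of iterations.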
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810243399.4A CN108491880B (en) | 2018-03-23 | 2018-03-23 | Object classification and pose estimation method based on neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810243399.4A CN108491880B (en) | 2018-03-23 | 2018-03-23 | Object classification and pose estimation method based on neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108491880A true CN108491880A (en) | 2018-09-04 |
CN108491880B CN108491880B (en) | 2021-09-03 |
Family
ID=63319473
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810243399.4A Active CN108491880B (en) | 2018-03-23 | 2018-03-23 | Object classification and pose estimation method based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108491880B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102375831A (en) * | 2010-08-13 | 2012-03-14 | 富士通株式会社 | Three-dimensional model search device and method thereof and model base generation device and method thereof |
US20160327653A1 (en) * | 2014-02-03 | 2016-11-10 | Board Of Regents, The University Of Texas System | System and method for fusion of camera and global navigation satellite system (gnss) carrier-phase measurements for globally-referenced mobile device pose determination |
WO2017015390A1 (en) * | 2015-07-20 | 2017-01-26 | University Of Maryland, College Park | Deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition |
CN106372648A (en) * | 2016-10-20 | 2017-02-01 | 中国海洋大学 | Multi-feature-fusion-convolutional-neural-network-based plankton image classification method |
CN106845510A (en) * | 2016-11-07 | 2017-06-13 | 中国传媒大学 | Traditional Chinese visual culture symbol recognition based on deep hierarchical feature fusion |
CN106845515A (en) * | 2016-12-06 | 2017-06-13 | 上海交通大学 | Robot target identification and pose reconstruction method based on virtual-sample deep learning |
CN107169421A (en) * | 2017-04-20 | 2017-09-15 | 华南理工大学 | Driving-scene object detection method based on deep convolutional neural networks |
CN107330463A (en) * | 2017-06-29 | 2017-11-07 | 南京信息工程大学 | Vehicle model recognition method based on CNN multi-feature fusion and multi-kernel sparse representation |
CN107527068A (en) * | 2017-08-07 | 2017-12-29 | 南京信息工程大学 | Vehicle model recognition method based on CNN and domain-adaptive learning |
CN107657249A (en) * | 2017-10-26 | 2018-02-02 | 珠海习悦信息技术有限公司 | Multi-scale-feature pedestrian re-identification method, apparatus, storage medium and processor |
CN107808146A (en) * | 2017-11-17 | 2018-03-16 | 北京师范大学 | Multi-modal emotion recognition and classification method |
Non-Patent Citations (4)
Title |
---|
ELHOSEINY M等: "A comparative analysis and study of multiview cnn models for joint object categorization and pose estimation", 《INTERNATIONAL CONFERENCE ON MACHINE LEARNING》 * |
RAJEEV RANJAN等: "HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition", 《10.1109/TPAMI.2017.2781233》 * |
TOMAS PFISTER等: "Flowing ConvNets for Human Pose Estimation in Videos", 《2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 * |
郭树旭等: "基于全卷积神经网络的肝脏CT影像分割研究", 《计算机工程与应用》 * |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109902675A (en) * | 2018-09-17 | 2019-06-18 | 华为技术有限公司 | Object pose acquisition method, and scene reconstruction method and apparatus |
CN109902675B (en) * | 2018-09-17 | 2021-05-04 | 华为技术有限公司 | Object pose acquisition method and scene reconstruction method and device |
CN109493417B (en) * | 2018-10-31 | 2023-04-07 | 深圳大学 | Three-dimensional object reconstruction method, device, equipment and storage medium |
CN109493417A (en) * | 2018-10-31 | 2019-03-19 | 深圳大学 | Three-dimension object method for reconstructing, device, equipment and storage medium |
CN111191492A (en) * | 2018-11-15 | 2020-05-22 | 北京三星通信技术研究有限公司 | Information estimation, model retrieval and model alignment methods and apparatus |
CN111191492B (en) * | 2018-11-15 | 2024-07-02 | 北京三星通信技术研究有限公司 | Information estimation, model retrieval and model alignment methods and devices |
CN109598339A (en) * | 2018-12-07 | 2019-04-09 | 电子科技大学 | Vehicle attitude detection method based on grid convolutional network |
CN109903332A (en) * | 2019-01-08 | 2019-06-18 | 杭州电子科技大学 | Object pose estimation method based on deep learning |
CN109934864B (en) * | 2019-03-14 | 2023-01-20 | 东北大学 | Residual error network deep learning method for mechanical arm grabbing pose estimation |
CN109934864A (en) * | 2019-03-14 | 2019-06-25 | 东北大学 | Residual error network deep learning method for mechanical arm grabbing pose estimation |
CN109978907A (en) * | 2019-03-22 | 2019-07-05 | 南京邮电大学 | Student sitting-posture detection method for home scenes |
CN111860039A (en) * | 2019-04-26 | 2020-10-30 | 四川大学 | Street space quality quantification method based on cross-connection CNN + SVR |
CN110322510A (en) * | 2019-06-27 | 2019-10-11 | 电子科技大学 | 6D pose estimation method using contour information |
CN112396077A (en) * | 2019-08-15 | 2021-02-23 | 瑞昱半导体股份有限公司 | Fully-connected convolutional neural network image processing method and circuit system |
CN110728187A (en) * | 2019-09-09 | 2020-01-24 | 武汉大学 | Remote sensing image scene classification method based on fault tolerance deep learning |
CN110728187B (en) * | 2019-09-09 | 2022-03-04 | 武汉大学 | Remote sensing image scene classification method based on fault tolerance deep learning |
CN110728192A (en) * | 2019-09-16 | 2020-01-24 | 河海大学 | High-resolution remote sensing image classification method based on novel characteristic pyramid depth network |
CN110728192B (en) * | 2019-09-16 | 2022-08-19 | 河海大学 | High-resolution remote sensing image classification method based on novel characteristic pyramid depth network |
CN110728222A (en) * | 2019-09-30 | 2020-01-24 | 清华大学深圳国际研究生院 | Pose estimation method for target object in mechanical arm grabbing system |
CN111126441A (en) * | 2019-11-25 | 2020-05-08 | 西安工程大学 | Construction method of classification detection network model |
CN111259735A (en) * | 2020-01-08 | 2020-06-09 | 西安电子科技大学 | Single-person attitude estimation method based on multi-stage prediction feature enhanced convolutional neural network |
CN111325166B (en) * | 2020-02-26 | 2023-07-07 | 南京工业大学 | Sitting posture identification method based on projection reconstruction and MIMO neural network |
CN111325166A (en) * | 2020-02-26 | 2020-06-23 | 南京工业大学 | Sitting posture identification method based on projection reconstruction and multi-input multi-output neural network |
CN113436266B (en) * | 2020-03-23 | 2024-05-14 | 丰田自动车株式会社 | Image processing system, image processing method, method of training neural network, and recording medium for performing the method |
CN113436266A (en) * | 2020-03-23 | 2021-09-24 | 丰田自动车株式会社 | Image processing system, image processing method, method of training neural network, and recording medium for executing the method |
WO2022022063A1 (en) * | 2020-07-27 | 2022-02-03 | 腾讯科技(深圳)有限公司 | Three-dimensional human pose estimation method and related device |
CN112163477A (en) * | 2020-09-16 | 2021-01-01 | 厦门市特种设备检验检测院 | Escalator pedestrian pose target detection method and system based on FasterR-CNN |
CN112163477B (en) * | 2020-09-16 | 2023-09-22 | 厦门市特种设备检验检测院 | Escalator pedestrian pose target detection method and system based on Faster R-CNN |
CN112381879A (en) * | 2020-11-16 | 2021-02-19 | 华南理工大学 | Object posture estimation method, system and medium based on image and three-dimensional model |
WO2022100379A1 (en) * | 2020-11-16 | 2022-05-19 | 华南理工大学 | Object attitude estimation method and system based on image and three-dimensional model, and medium |
CN112528941A (en) * | 2020-12-23 | 2021-03-19 | 泰州市朗嘉馨网络科技有限公司 | Automatic parameter setting system based on neural network |
CN112528941B (en) * | 2020-12-23 | 2021-11-19 | 芜湖神图驭器智能科技有限公司 | Automatic parameter setting system based on neural network |
CN112634367A (en) * | 2020-12-25 | 2021-04-09 | 天津大学 | Anti-occlusion object pose estimation method based on deep neural network |
CN112857215B (en) * | 2021-01-08 | 2022-02-08 | 河北工业大学 | Monocular 6D pose estimation method based on regular icosahedron |
CN112857215A (en) * | 2021-01-08 | 2021-05-28 | 河北工业大学 | Monocular 6D pose estimation method based on regular icosahedron |
CN113129370B (en) * | 2021-03-04 | 2022-08-19 | 同济大学 | Semi-supervised object pose estimation method combining generated data and label-free data |
CN113129370A (en) * | 2021-03-04 | 2021-07-16 | 同济大学 | Semi-supervised object pose estimation method combining generated data and label-free data |
CN113705480A (en) * | 2021-08-31 | 2021-11-26 | 新东方教育科技集团有限公司 | Gesture recognition method, device and medium based on gesture recognition neural network |
CN114742212A (en) * | 2022-06-13 | 2022-07-12 | 南昌大学 | Electronic digital information resampling rate estimation method |
Also Published As
Publication number | Publication date |
---|---|
CN108491880B (en) | 2021-09-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108491880A (en) | Object classification and pose estimation method based on neural network | |
Xu et al. | Light-YOLOv3: fast method for detecting green mangoes in complex scenes using picking robots | |
Hu et al. | SAC-Net: Spatial attenuation context for salient object detection | |
Uçar et al. | A new facial expression recognition based on curvelet transform and online sequential extreme learning machine initialized with spherical clustering | |
CN108304826A (en) | Facial expression recognition method based on convolutional neural networks | |
CN108510012A (en) | Rapid target detection method based on multi-scale feature maps | |
CN110032925B (en) | Gesture image segmentation and recognition method based on an improved capsule network and algorithm | |
CN104700076B (en) | Facial image virtual sample generation method | |
CN107609460A (en) | Human behavior recognition method fusing spatio-temporal dual-network streams and an attention mechanism | |
CN107316307A (en) | Automatic segmentation method for traditional Chinese medicine tongue images based on deep convolutional neural networks | |
CN112862792B (en) | Wheat powdery mildew spore segmentation method for small sample image dataset | |
CN110222718B (en) | Image processing method and device | |
Xu et al. | Spherical DNNs and Their Applications in 360$^\circ $∘ Images and Videos | |
CN113436227A (en) | Twin network target tracking method based on inverted residual error | |
Yan et al. | Monocular depth estimation with guidance of surface normal map | |
CN115393596B (en) | Garment image segmentation method based on artificial intelligence | |
Xu et al. | Face expression recognition based on convolutional neural network | |
Hu et al. | A spatio-temporal integrated model based on local and global features for video expression recognition | |
Li et al. | Fast recognition of pig faces based on improved Yolov3 | |
Yue et al. | DRGCNN: Dynamic region graph convolutional neural network for point clouds | |
Sun et al. | Overview of capsule neural networks | |
Zhang | Research on Image Recognition Based on Neural Network | |
Chauhan et al. | Empirical Study on convergence of Capsule Networks with various hyperparameters | |
Ocegueda-Hernandez et al. | A lightweight convolutional neural network for pose estimation of a planar model | |
Li | Research on target feature extraction and location positioning with machine learning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||