CN108491880A - Object classification and pose estimation method based on neural network - Google Patents

Object classification and pose estimation method based on neural network

Info

Publication number
CN108491880A
Authority
CN
China
Prior art keywords
layers
input
CAD model
layer
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810243399.4A
Other languages
Chinese (zh)
Other versions
CN108491880B (en)
Inventor
张向东
张泽宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201810243399.4A priority Critical patent/CN108491880B/en
Publication of CN108491880A publication Critical patent/CN108491880A/en
Application granted granted Critical
Publication of CN108491880B publication Critical patent/CN108491880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object classification and pose estimation method based on a neural network, which mainly solves the problem of low accuracy when the prior art performs object detection and attitude estimation with convolutional neural networks. The implementation is: 1) obtain the multi-view images of each CAD model in the data set; 2) build the mathematical model of joint detection from the multi-view images of the CAD models; 3) build a convolutional neural network and train it with the multi-view images of the CAD models; 4) input the multi-view images of each CAD model in the test set into the network and output the class label and pose label predicted by the network. The invention combines shallow and deep feature maps of the neural network, so that the fused feature map retains both rich pose information and good classification information, improving the accuracy of classification and pose estimation. It can be used for intelligent robotic arms and robot grasping.

Description

Object classification and pose estimation method based on neural network
Technical field
The invention belongs to the field of artificial intelligence and relates to an object classification and pose estimation method, which can be used for intelligent robotic arm and robot grasping.
Background art
A convolutional neural network (CNN) is a feedforward neural network composed of convolutional layers, fully connected layers, pooling layers and activation layers. Compared with traditional fully connected networks, the local connectivity and weight sharing of a CNN make the neurons on the same feature map share identical weights, which greatly reduces the number of network parameters and the complexity of the network. Activation functions have also gradually evolved from the sigmoid to the single-sided-inhibiting ReLU, and this continuous improvement brings the neurons closer to the activation characteristics of biological neurons. In addition, a CNN avoids complex image preprocessing, including elaborate feature extraction and data reconstruction, and can take the original image directly as input. Gradient descent and the chain rule of differentiation allow the network to alternate well between forward propagation and back propagation, continuously improving detection accuracy. Among the many deep learning frameworks, Caffe is a relatively common one, widely used for video and image processing. Caffe is modular, separates model expression from implementation, makes switching between GPU and CPU convenient, and provides Python and Matlab interfaces, so the network structure can easily be adjusted and the network trained with it.
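As an illustration of the Caffe workflow described above, the following minimal Python sketch shows the CPU/GPU switch and a single forward pass through a trained network; the file names and the 'prob' output blob name are placeholders, not part of the invention:

```python
import numpy as np
import caffe

# One of the conveniences noted above: switching between GPU and CPU execution.
caffe.set_mode_gpu()
caffe.set_device(0)       # use caffe.set_mode_cpu() on machines without a GPU

# Load a trained network; both file names here are hypothetical placeholders.
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Run one forward pass on a random 227*227 RGB image (batch size 1).
net.blobs['data'].reshape(1, 3, 227, 227)
net.blobs['data'].data[...] = np.random.rand(1, 3, 227, 227)
out = net.forward()
print(out['prob'].shape)  # class probabilities from the Softmax layer
```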
In recent years, deep learning has made remarkable progress in image classification, object detection, semantic segmentation, instance segmentation and related tasks. A general vision system needs to solve two problems: object classification and object attitude (pose) estimation, where pose estimation refers to the posture of the object relative to the camera. Object pose estimation is crucial in many applications, such as robot grasping. But object classification and pose estimation are in conflict: a classification system must classify an object correctly regardless of its pose, so it learns viewpoint-independent features, whereas for pose estimation the system needs to learn features that preserve the geometry and appearance of the object in order to distinguish its poses. In a convolutional neural network, the shallow feature maps tend to be more general and class-ambiguous but contain more features distinguishing different poses, while the deep feature maps are more abstract, with more distinct class features, but their specific pose information is blurred by the high level of abstraction. Existing detection methods generally select an intermediate feature layer whose features perform reasonably well for both classification and pose estimation; this is a compromise and cannot make the accuracy of object detection and of attitude estimation reach their best simultaneously.
MVCNN, an object classification and pose estimation method proposed by Hang Su et al. in 2015, converts sampled 3D data into 2D multi-view pictures, reducing the dimensionality of the data while maintaining detection accuracy. Although this simplifies processing, it requires extracting features from pictures of all views of the object and then merging the information of all the view pictures. In actual scenes, occlusion and obstruction make it difficult to collect multi-view pictures of an object from all predefined viewpoints, so the approach does not meet the demands of actual scenes.
Summary of the invention
The object of the present invention is to overcome the above shortcomings of the prior art and propose an object classification and pose estimation method based on a neural network, so as to improve the accuracy of object detection and pose estimation, speed up detection, and meet the demands of actual scenes.
The technical idea of the present invention is: to improve object detection and pose estimation accuracy by combining shallow and deep features of the convolutional neural network, and to speed up detection by iterating over images of only part of the views of the detected object. The implementation scheme includes the following:
(1) Obtain the training set and test set, and set the corresponding images of the CAD models:

Take 3429 CAD models from the ModelNet10 data set as the training set and 1469 CAD models as the test set;

For the CAD model of each sample in the ModelNet10 data set, apply two preprocessing strategies in turn: the first evenly sets 12 predefined viewpoints on the viewing circle where the CAD model is located and collects the corresponding image of the CAD model at each of these 12 predefined viewpoints; the second places the CAD model at the center of a regular dodecahedron, sets the 20 vertices of the regular dodecahedron as predefined viewpoints, and collects the corresponding image of the CAD model at each of these 20 predefined viewpoints;
(2) Build the mathematical model of joint detection from the preprocessed multi-view images of each CAD model in the data set:

(2a) Take the pose label of each CAD model as a hidden variable, denoted {v_i};

(2b) Define the M different-view images {x_i | i = 1, …, M} of a CAD model together with its class label y ∈ {1, …, N} as a training sample, where N is the total number of CAD model classes; each view image x_i corresponds to a view label v_i ∈ {1, …, M};

(2c) With the training samples defined as above, abstract the object recognition and pose estimation task as the following optimization problem:

  max_R Σ_{i=1}^{M} log P(ŷ = y | x_i; R)

where R is the neural network weight parameter, ŷ is the class label predicted by the network, and P(ŷ = y | x_i; R) is the probability that the class label output by the Softmax layer of the convolutional neural network CNN is y;
(3) Build and train the convolutional neural network CNN:

(3a) On the basis of the existing AlexNet network, add an Eltwise1 layer, an fc_a1 layer, an fc_a2 layer and an Eltwise2 layer, obtaining a convolutional neural network CNN with 16 layers, in which:

the Eltwise1 layer fuses the feature maps of the Conv3 and Conv4 layers of the AlexNet network at corresponding positions;

the fc_a1 layer maps the feature maps of the Eltwise1 layer to a feature vector;

the fc_a2 layer maps the feature maps of the Pool5 layer of the AlexNet network to a feature vector;

the Eltwise2 layer fuses the feature vectors of the fc_a1, fc_a2 and fc7 layers at corresponding positions;
(3b) Input the multi-view images {x_i} of each CAD model in the training set into the convolutional network, iterate the forward computation and back propagation of the convolutional neural network CNN to train the network and optimize the network parameter R until the loss function of the network satisfies J(θ) ≤ 0.0001, obtaining the trained neural network CNN;
(4) Test the network:

Input the multi-view images {x_i} of each CAD model in the ModelNet10 test set into the trained neural network and count the accuracy of object classification and attitude estimation.
Compared with the prior art, the present invention has the following advantages:

1. Because the present invention fuses, element by corresponding element, feature maps of different depths in the convolutional neural network, the new feature map obtained by fusion contains both the rich pose information of the shallow feature maps and the abstract class information of the deep feature maps, thus improving the detection accuracy.

2. Because the present invention generates corresponding multi-view images for each 3D CAD model in the data set, i.e., converts the 3D sample data into 2D multi-view images, the data are reduced in dimension; this lowers the complexity of the data, reduces the computation of feature extraction, and speeds up detection.
Brief description of the drawings
Fig. 1 is the implementation flow chart of the present invention;
Fig. 2 is a schematic diagram of the two predefined-viewpoint strategies in the present invention;
Fig. 3 is a structure diagram of the convolutional neural network CNN built in the present invention.
Detailed description
The examples and effects of the present invention are described in further detail below with reference to the accompanying drawings.

Referring to Fig. 1, the present invention is implemented as follows:
Step 1, obtain the multi-view images of the CAD models.

For the CAD model of each sample in the ModelNet10 data set, apply two preprocessing strategies in turn.

As shown in Fig. 2(a), the first preprocessing strategy evenly sets 12 predefined viewpoints on the viewing circle where the CAD model is located: first fix an axis as the rotation axis, then set one observation point every 30 degrees on the viewing circle around the object, so that over the 360-degree viewing circle each CAD model yields images from 12 different views;

As shown in Fig. 2(b), the second preprocessing strategy places the CAD model at the center of a regular dodecahedron and sets the 20 vertices of the regular dodecahedron as predefined viewpoints, collecting the corresponding image of the CAD model at each of these 20 predefined viewpoints.
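A minimal numpy sketch of the two viewpoint strategies follows; the camera radius and the choice of the z-axis as rotation axis are illustrative assumptions, since the text does not fix them:

```python
import numpy as np

def circle_viewpoints(n=12, radius=1.0, elevation=0.0):
    """Strategy 1: n viewpoints spaced 360/n degrees apart on a viewing circle
    around a fixed rotation axis (here the z-axis)."""
    angles = np.deg2rad(np.arange(n) * 360.0 / n)   # every 30 degrees for n=12
    return np.stack([radius * np.cos(angles),
                     radius * np.sin(angles),
                     np.full(n, elevation)], axis=1)

def dodecahedron_viewpoints(radius=1.0):
    """Strategy 2: the 20 vertices of a regular dodecahedron centered on the model."""
    phi = (1 + np.sqrt(5)) / 2                      # golden ratio
    b = 1.0 / phi
    verts = []
    for x in (-1, 1):                               # 8 cube vertices
        for y in (-1, 1):
            for z in (-1, 1):
                verts.append((x, y, z))
    for u in (-b, b):                               # 12 remaining vertices
        for v in (-phi, phi):
            verts += [(0, u, v), (u, v, 0), (v, 0, u)]
    verts = np.array(verts, dtype=float)
    return radius * verts / np.linalg.norm(verts[0])  # all vertices have equal norm

print(circle_viewpoints().shape)        # (12, 3)
print(dodecahedron_viewpoints().shape)  # (20, 3)
```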
Step 2, build the mathematical model of joint detection from the preprocessed multi-view images of each CAD model in the data set.

(2a) Take the pose label of each CAD model as a hidden variable, denoted {v_i};

(2b) Define the M different-view images {x_i | i = 1, …, M} of a CAD model together with its class label y ∈ {1, …, N} as a training sample, where N is the total number of CAD model classes and x_i is a view image; each view image x_i corresponds to a view label v_i ∈ {1, …, M};

(2c) With the training samples defined as above, abstract the object recognition and pose estimation task as the following optimization problem:

  max_R Σ_{i=1}^{M} log P(ŷ = y | x_i; R)

where R is the neural network weight parameter, ŷ is the class label predicted by the network, and P(ŷ = y | x_i; R) is the probability that the class label output by the Softmax layer of the convolutional neural network CNN is y;

Denoting P(ŷ = k, v_i = j | x_i; R) by p_{k,j}^{(i)}, the optimization problem is expressed in the following form:

  max_R Σ_{i=1}^{M} log max_{j ∈ {1, …, M}} p_{k,j}^{(i)}

where (i) indexes the input image x_i, k denotes the class label of image x_i, and j indicates that image x_i is observed from the j-th predefined viewpoint.
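Under this formulation, prediction amounts to maximizing the joint class/viewpoint probabilities p_{k,j}^{(i)} while treating the viewpoint as hidden. A small numpy sketch of that decision rule follows; summing the per-image class scores before the argmax is an illustrative pooling choice, not something the text specifies:

```python
import numpy as np

def predict_class_and_pose(p):
    """p: array of shape (M_images, N_classes, M_viewpoints) holding the joint
    Softmax probabilities p[i, k, j] for each input view image x_i.
    Returns one predicted class label and one pose (viewpoint) label per image."""
    # Marginalize the hidden viewpoint with a max over j, then pool the
    # per-image scores and pick the best class k.
    per_image_class_scores = p.max(axis=2)              # (M_images, N_classes)
    y_hat = int(np.argmax(per_image_class_scores.sum(axis=0)))
    # Pose label: the viewpoint that maximizes the predicted class probability.
    v_hat = p[:, y_hat, :].argmax(axis=1)               # (M_images,)
    return y_hat, v_hat

# toy usage: 5 view images, 10 classes, 12 viewpoints
probs = np.random.dirichlet(np.ones(10 * 12), size=5).reshape(5, 10, 12)
print(predict_class_and_pose(probs))
```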
Step 3, build the convolutional neural network CNN.

(3a) Build the convolutional neural network CNN with 16 layers shown in Fig. 3. The 16 layers are, in order: the first convolutional layer Conv1, the first pooling layer Pool1, the second convolutional layer Conv2, the second pooling layer Pool2, the third convolutional layer Conv3, the fourth convolutional layer Conv4, the first feature fusion layer Eltwise1, the fifth convolutional layer Conv5, the fifth pooling layer Pool5, the first fully connected layer fc_a1, the second fully connected layer fc_a2, the third fully connected layer fc6, the fourth fully connected layer fc7, the second feature fusion layer Eltwise2, the fifth fully connected layer fc8, and the classification layer Softmax. The feature extraction details of each layer are as follows:

(3a1) The input image of 227*227 pixels is fed to the first convolutional layer Conv1, which applies 96 convolution kernels of size 11*11 pixels with a stride of 4 pixels, yielding 96 feature maps of 55*55 pixels.

(3a2) The 96 feature maps output by the first convolutional layer Conv1 are fed to the first pooling layer Pool1, which applies max pooling with a pooling block of 3*3 pixels and a stride of 2 pixels, yielding 96 feature maps of 27*27 pixels.

(3a3) The 96 feature maps output by the first pooling layer Pool1 are fed to the second convolutional layer Conv2, which applies 256 convolution kernels of size 5*5 pixels with a stride of 1, yielding 256 feature maps of 27*27 pixels.

(3a4) The 256 feature maps output by the second convolutional layer Conv2 are fed to the second pooling layer Pool2, which applies max pooling with a pooling block of 3*3 pixels and a stride of 2 pixels, yielding 256 feature maps of 13*13 pixels.

(3a5) The 256 feature maps output by the second pooling layer Pool2 are fed to the third convolutional layer Conv3, which applies 384 convolution kernels of size 3*3 pixels with a stride of 1 pixel, yielding 384 feature maps of 13*13 pixels.

(3a6) The 384 feature maps output by the third convolutional layer Conv3 are fed to the fourth convolutional layer Conv4, which applies 384 convolution kernels of size 3*3 pixels with a stride of 1 pixel, yielding 384 feature maps of 13*13 pixels.

(3a7) The feature maps of the third convolutional layer Conv3 and the fourth convolutional layer Conv4 are fed to the first fusion layer Eltwise1 for element-wise feature-map fusion, yielding 384 feature maps of 13*13 pixels.

(3a8) The 384 feature maps output by the fourth convolutional layer Conv4 are fed to the fifth convolutional layer Conv5, which applies 256 convolution kernels of size 3*3 pixels with a stride of 1 pixel, yielding 256 feature maps of 13*13 pixels.

(3a9) The 256 feature maps output by the fifth convolutional layer Conv5 are fed to the fifth pooling layer Pool5, which applies max pooling with a pooling block of 3*3 pixels and a stride of 2 pixels, yielding 256 feature maps of 6*6 pixels.

(3a10) The 384 feature maps output by the first fusion layer Eltwise1 are fed to the first fully connected layer fc_a1, which maps them to a feature vector of size 1*1*4096.

(3a11) The 256 feature maps output by the fifth pooling layer Pool5 are fed to the second fully connected layer fc_a2, which maps them to a feature vector of size 1*1*4096.

(3a12) The 256 feature maps output by the fifth pooling layer Pool5 are fed to the third fully connected layer fc6, which maps them to a feature vector of size 1*1*4096.

(3a13) The 1*1*4096 feature vector output by the third fully connected layer fc6 is fed to the fourth fully connected layer fc7, which continues feature extraction and yields a feature vector of size 1*1*4096.

(3a14) The feature vectors of the first fully connected layer fc_a1, the second fully connected layer fc_a2 and the fourth fully connected layer fc7 are fed to the second fusion layer Eltwise2, which fuses them element-wise into a feature vector of size 1*1*4096.

(3a15) The 1*1*4096 feature vector output by the second fusion layer Eltwise2 is fed to the fifth fully connected layer fc8, which maps it to a feature vector of size 1*1*11*M, where M is the number of view images and the symbol "*" denotes multiplication.

(3a16) The 1*1*11*M feature vector is fed to the classification layer Softmax, which yields the class label of image x_i; the view label v_i that maximizes the class probability is selected as its pose label.
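A sketch of the four added layers in Caffe's Python net specification is given below. It assumes a NetSpec n that already holds the standard AlexNet layers conv3, conv4, pool5, fc6 and fc7, uses element-wise SUM as the fusion operation, and omits ReLU/LRN/Dropout layers for brevity; all of these are assumptions about details the text leaves open:

```python
from caffe import layers as L, params as P

def fusion_head(n, num_views):
    """Sketch of the layers this method adds to AlexNet (names follow the text);
    n is a caffe.NetSpec already holding conv3, conv4, pool5, fc6 and fc7."""
    # Eltwise1: element-wise fusion of the Conv3 and Conv4 feature maps
    # (both 384 x 13 x 13, so corresponding positions can be combined).
    n.eltwise1 = L.Eltwise(n.conv3, n.conv4, operation=P.Eltwise.SUM)
    # fc_a1 / fc_a2: map the fused maps and the Pool5 maps to 4096-d vectors.
    n.fc_a1 = L.InnerProduct(n.eltwise1, num_output=4096)
    n.fc_a2 = L.InnerProduct(n.pool5, num_output=4096)
    # Eltwise2: element-wise fusion of fc_a1, fc_a2 and fc7 (all 4096-d).
    n.eltwise2 = L.Eltwise(n.fc_a1, n.fc_a2, n.fc7, operation=P.Eltwise.SUM)
    # fc8 + Softmax over the joint class/viewpoint labels (1*1*11*M in the text).
    n.fc8 = L.InnerProduct(n.eltwise2, num_output=11 * num_views)
    n.prob = L.Softmax(n.fc8)
    return n
```

SUM is only one possible reading of "fusion at corresponding positions"; PROD and MAX are the other element-wise operations Caffe's Eltwise layer offers.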
Step 4, train the convolutional neural network CNN.

(3b1) In the forward propagation stage, take a training sample from the training set, input its multi-view images {x_i} to the input layer of the convolutional neural network CNN, perform feature extraction and feature mapping, and output the final result through the Softmax layer;

(3b2) In the back-propagation stage, compute the difference between the actual output of the convolutional neural network CNN and the ideal output of the training sample, and back-propagate to adjust the weight parameter R of the network by the method of error minimization;

(3b3) Repeat the operations of (3b1) and (3b2) until the loss function of the convolutional neural network CNN satisfies J(θ) ≤ 0.0001, obtaining the trained neural network.
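The training loop of step 4 maps naturally onto Caffe's solver interface. A minimal sketch, assuming a hypothetical solver.prototxt pointing at the 16-layer network and a loss blob named 'loss' (both assumptions):

```python
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')   # hypothetical solver definition

loss = float('inf')
while loss > 0.0001:                          # stopping criterion from step (3b3)
    solver.step(1)                            # one forward/backward pass + update
    loss = float(solver.net.blobs['loss'].data)

solver.net.save('trained_cnn.caffemodel')     # the trained network CNN
```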
Step 5, test the network.

Input the multi-view images {x_i} of each CAD model in the ModelNet10 test set into the trained neural network and output the class label and pose label predicted by the network;

Count, respectively, the number of CAD models in the test set with a wrong class label and with a wrong pose label as a percentage of all CAD models in the test set, obtaining the object classification and attitude estimation accuracy.
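The statistics of this step reduce to simple error counting; a minimal numpy sketch with randomly generated labels as stand-in data:

```python
import numpy as np

def accuracy_stats(pred_y, true_y, pred_v, true_v):
    """Step 5 statistics: the percentage of test CAD models with a wrong class
    label / wrong pose label, converted to classification and pose accuracy."""
    n = len(true_y)
    class_acc = 100.0 * (1.0 - np.count_nonzero(pred_y != true_y) / n)
    pose_acc = 100.0 * (1.0 - np.count_nonzero(pred_v != true_v) / n)
    return class_acc, pose_acc

# toy usage with random labels over the 1469 test models
rng = np.random.default_rng(0)
ty, py = rng.integers(0, 10, 1469), rng.integers(0, 10, 1469)
tv, pv = rng.integers(0, 12, 1469), rng.integers(0, 12, 1469)
print(accuracy_stats(py, ty, pv, tv))
```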
The effects of the present invention are further described below with reference to simulations.

1. Simulation conditions

The simulation experiments of the present invention were run on a 64-bit Ubuntu system with an Intel Core i3 4.2 GHz CPU, 16.00 GB of memory and a GeForce GTX 1070 GPU; the deep learning framework used is Caffe2.

2. Experiment content and results

In the experiment, the ModelNet10 data set is used for training and testing the network. The ModelNet10 data set contains 4898 CAD models in 10 categories, of which 3429 CAD models are in the training set and 1469 in the test set; multi-view images are generated for each CAD model in the data set;

The multi-view images of the samples in the test set are input to the trained convolutional network; the number of CAD models whose class label is mispredicted by the network is 77, and the number of CAD models with a wrong pose label is 609. The classification and attitude estimation accuracy of the network is computed and compared with several existing detection methods, as shown in the following table:
Table 1

Method                  Classification accuracy (%)   Pose estimation accuracy (%)
The present invention   94.76                         58.52
RotationNet             94.38                         58.33
MVCNN                   92.10                         -
FusionNet               90.80                         -
Here RotationNet is a rotation-iteration algorithm, MVCNN is a multi-view fusion algorithm, and FusionNet is a feature fusion algorithm; these are several relatively advanced existing object recognition and pose estimation methods.
As can be seen from Table 1, the method proposed in the present invention of fusing feature maps from layers of different depths of the network can improve the accuracy of classification and attitude estimation.

Claims (5)

1. A method for object classification and pose estimation based on a neural network, comprising:
(1) obtaining the training set and test set, and setting the corresponding images of the CAD models:

taking 3429 CAD models from the ModelNet10 data set as the training set and 1469 CAD models as the test set;

for the CAD model of each sample in the ModelNet10 data set, applying two preprocessing strategies in turn: the first evenly sets 12 predefined viewpoints on the viewing circle where the CAD model is located and collects the corresponding image of the CAD model at each of these 12 predefined viewpoints; the second places the CAD model at the center of a regular dodecahedron, sets the 20 vertices of the regular dodecahedron as predefined viewpoints, and collects the corresponding image of the CAD model at each of these 20 predefined viewpoints;
(2) building the mathematical model of joint detection from the preprocessed multi-view images of each CAD model in the data set:

(2a) taking the pose label of each CAD model as a hidden variable, denoted {v_i};

(2b) defining the M different-view images {x_i | i = 1, …, M} of a CAD model together with its class label y ∈ {1, …, N} as a training sample, where N is the total number of CAD model classes; each view image x_i corresponds to a view label v_i ∈ {1, …, M};

(2c) with the training samples defined as above, abstracting the object recognition and pose estimation task as the following optimization problem:

  max_R Σ_{i=1}^{M} log P(ŷ = y | x_i; R)

where R is the neural network weight parameter, ŷ is the class label predicted by the network, and P(ŷ = y | x_i; R) is the probability that the class label output by the Softmax layer of the convolutional neural network CNN is y;
(3) building and training the convolutional neural network CNN:

(3a) on the basis of the existing AlexNet network, adding an Eltwise1 layer, an fc_a1 layer, an fc_a2 layer and an Eltwise2 layer, obtaining a convolutional neural network CNN with 16 layers, wherein:

the Eltwise1 layer fuses the feature maps of the Conv3 and Conv4 layers of the AlexNet network at corresponding positions;

the fc_a1 layer maps the feature maps of the Eltwise1 layer to a feature vector;

the fc_a2 layer maps the feature maps of the Pool5 layer of the AlexNet network to a feature vector;

the Eltwise2 layer fuses the feature vectors of the fc_a1, fc_a2 and fc7 layers at corresponding positions;

(3b) inputting the multi-view images {x_i} of each CAD model in the training set into the convolutional network, iterating the forward computation and back propagation of the convolutional neural network CNN to train the network and optimize the network parameter R until the loss function of the network satisfies J(θ) ≤ 0.0001, obtaining the trained convolutional neural network CNN;
(4) testing the network:

inputting the multi-view images {x_i} of each CAD model in the ModelNet10 test set into the trained neural network, and counting the accuracy of object classification and attitude estimation.
2. The method according to claim 1, wherein the first preprocessing strategy of step (1) evenly sets 12 predefined viewpoints on the viewing circle where the CAD model is located by first fixing an axis as the rotation axis and then setting one observation point every 30 degrees on the viewing circle around the object, i.e., over the 360-degree viewing circle, obtaining images of 12 different views for each CAD model.
3. The method according to claim 1, wherein the optimization problem in step (2c) is realized as follows:

denoting P(ŷ = k, v_i = j | x_i; R) by p_{k,j}^{(i)}, the optimization problem is expressed in the following form:

  max_R Σ_{i=1}^{M} log max_{j ∈ {1, …, M}} p_{k,j}^{(i)}

where (i) indexes the input image x_i, k denotes the class label of image x_i, j indicates that image x_i is observed from the j-th predefined viewpoint, and R is the neural network weight parameter.
4. The method according to claim 1, wherein the convolutional neural network CNN with 16 layers is built in step (3a) as follows:

(3a1) the input image of 227*227 pixels is fed to the first convolutional layer Conv1, which applies 96 convolution kernels of size 11*11 pixels with a stride of 4 pixels, yielding 96 feature maps of 55*55 pixels;

(3a2) the 96 feature maps output by the first convolutional layer Conv1 are fed to the first pooling layer Pool1, which applies max pooling with a pooling block of 3*3 pixels and a stride of 2 pixels, yielding 96 feature maps of 27*27 pixels;

(3a3) the 96 feature maps output by the first pooling layer Pool1 are fed to the second convolutional layer Conv2, which applies 256 convolution kernels of size 5*5 pixels with a stride of 1, yielding 256 feature maps of 27*27 pixels;

(3a4) the 256 feature maps output by the second convolutional layer Conv2 are fed to the second pooling layer Pool2, which applies max pooling with a pooling block of 3*3 pixels and a stride of 2 pixels, yielding 256 feature maps of 13*13 pixels;

(3a5) the 256 feature maps output by the second pooling layer Pool2 are fed to the third convolutional layer Conv3, which applies 384 convolution kernels of size 3*3 pixels with a stride of 1 pixel, yielding 384 feature maps of 13*13 pixels;

(3a6) the 384 feature maps output by the third convolutional layer Conv3 are fed to the fourth convolutional layer Conv4, which applies 384 convolution kernels of size 3*3 pixels with a stride of 1 pixel, yielding 384 feature maps of 13*13 pixels;

(3a7) the feature maps of the third convolutional layer Conv3 and the fourth convolutional layer Conv4 are fed to the first fusion layer Eltwise1 for element-wise feature-map fusion, yielding 384 feature maps of 13*13 pixels;

(3a8) the 384 feature maps output by the fourth convolutional layer Conv4 are fed to the fifth convolutional layer Conv5, which applies 256 convolution kernels of size 3*3 pixels with a stride of 1 pixel, yielding 256 feature maps of 13*13 pixels;

(3a9) the 256 feature maps output by the fifth convolutional layer Conv5 are fed to the fifth pooling layer Pool5, which applies max pooling with a pooling block of 3*3 pixels and a stride of 2 pixels, yielding 256 feature maps of 6*6 pixels;

(3a10) the 384 feature maps output by the first fusion layer Eltwise1 are fed to the first fully connected layer fc_a1, which maps them to a feature vector of size 1*1*4096;

(3a11) the 256 feature maps output by the fifth pooling layer Pool5 are fed to the second fully connected layer fc_a2, which maps them to a feature vector of size 1*1*4096;

(3a12) the 256 feature maps output by the fifth pooling layer Pool5 are fed to the third fully connected layer fc6, which maps them to a feature vector of size 1*1*4096;

(3a13) the 1*1*4096 feature vector output by the third fully connected layer fc6 is fed to the fourth fully connected layer fc7, which continues feature extraction and yields a feature vector of size 1*1*4096;

(3a14) the feature vectors of the first fully connected layer fc_a1, the second fully connected layer fc_a2 and the fourth fully connected layer fc7 are fed to the second fusion layer Eltwise2, which fuses them element-wise into a feature vector of size 1*1*4096;

(3a15) the 1*1*4096 feature vector output by the second fusion layer Eltwise2 is fed to the fifth fully connected layer fc8, which maps it to a feature vector of size 1*1*11*M, where M is the number of view images and the symbol "*" denotes multiplication;

(3a16) the 1*1*11*M feature vector is fed to the classification layer Softmax, which yields the class label of image x_i, and the view label v_i that maximizes the class probability is selected as its pose label.
5. The method according to claim 1, wherein the convolutional neural network CNN in step (3b) is trained as follows:

(3b1) in the forward propagation stage, take a training sample from the training set, input its multi-view images {x_i} to the input layer of the convolutional neural network CNN, perform feature extraction and feature mapping, and output the final result through the Softmax layer;

(3b2) in the back-propagation stage, compute the difference between the actual output of the convolutional neural network CNN and the ideal output of the training sample, and back-propagate to adjust the weight parameter R of the network by the method of error minimization;

(3b3) repeat the operations of (3b1) and (3b2) until the loss function of the convolutional neural network CNN satisfies J(θ) ≤ 0.0001.
CN201810243399.4A 2018-03-23 2018-03-23 Object classification and pose estimation method based on neural network Active CN108491880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810243399.4A CN108491880B (en) 2018-03-23 2018-03-23 Object classification and pose estimation method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810243399.4A CN108491880B (en) 2018-03-23 2018-03-23 Object classification and pose estimation method based on neural network

Publications (2)

Publication Number Publication Date
CN108491880A true CN108491880A (en) 2018-09-04
CN108491880B CN108491880B (en) 2021-09-03

Family

ID=63319473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810243399.4A Active CN108491880B (en) 2018-03-23 2018-03-23 Object classification and pose estimation method based on neural network

Country Status (1)

Country Link
CN (1) CN108491880B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102375831A (en) * 2010-08-13 2012-03-14 富士通株式会社 Three-dimensional model search device and method thereof and model base generation device and method thereof
US20160327653A1 (en) * 2014-02-03 2016-11-10 Board Of Regents, The University Of Texas System System and method for fusion of camera and global navigation satellite system (gnss) carrier-phase measurements for globally-referenced mobile device pose determination
WO2017015390A1 (en) * 2015-07-20 2017-01-26 University Of Maryland, College Park Deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition
CN106372648A (en) * 2016-10-20 2017-02-01 中国海洋大学 Multi-feature-fusion-convolutional-neural-network-based plankton image classification method
CN106845510A (en) * 2016-11-07 2017-06-13 中国传媒大学 Chinese tradition visual culture Symbol Recognition based on depth level Fusion Features
CN106845515A (en) * 2016-12-06 2017-06-13 上海交通大学 Robot target identification and pose reconstructing method based on virtual sample deep learning
CN107169421A (en) * 2017-04-20 2017-09-15 华南理工大学 A kind of car steering scene objects detection method based on depth convolutional neural networks
CN107330463A (en) * 2017-06-29 2017-11-07 南京信息工程大学 Model recognizing method based on CNN multiple features combinings and many nuclear sparse expressions
CN107527068A (en) * 2017-08-07 2017-12-29 南京信息工程大学 Model recognizing method based on CNN and domain adaptive learning
CN107657249A (en) * 2017-10-26 2018-02-02 珠海习悦信息技术有限公司 Method, apparatus, storage medium and the processor that Analysis On Multi-scale Features pedestrian identifies again
CN107808146A (en) * 2017-11-17 2018-03-16 北京师范大学 A kind of multi-modal emotion recognition sorting technique

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ELHOSEINY M et al.: "A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation", International Conference on Machine Learning *
RAJEEV RANJAN et al.: "HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition", 10.1109/TPAMI.2017.2781233 *
TOMAS PFISTER et al.: "Flowing ConvNets for Human Pose Estimation in Videos", 2015 IEEE International Conference on Computer Vision (ICCV) *
GUO Shuxu et al.: "Research on Liver CT Image Segmentation Based on Fully Convolutional Neural Networks", Computer Engineering and Applications *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902675A (en) * 2018-09-17 2019-06-18 华为技术有限公司 The method and apparatus of the pose acquisition methods of object, scene reconstruction
CN109902675B (en) * 2018-09-17 2021-05-04 华为技术有限公司 Object pose acquisition method and scene reconstruction method and device
CN109493417B (en) * 2018-10-31 2023-04-07 深圳大学 Three-dimensional object reconstruction method, device, equipment and storage medium
CN109493417A (en) * 2018-10-31 2019-03-19 深圳大学 Three-dimension object method for reconstructing, device, equipment and storage medium
CN111191492A (en) * 2018-11-15 2020-05-22 北京三星通信技术研究有限公司 Information estimation, model retrieval and model alignment methods and apparatus
CN111191492B (en) * 2018-11-15 2024-07-02 北京三星通信技术研究有限公司 Information estimation, model retrieval and model alignment methods and devices
CN109598339A (en) * 2018-12-07 2019-04-09 电子科技大学 A kind of vehicle attitude detection method based on grid convolutional network
CN109903332A (en) * 2019-01-08 2019-06-18 杭州电子科技大学 A kind of object's pose estimation method based on deep learning
CN109934864B (en) * 2019-03-14 2023-01-20 东北大学 Residual error network deep learning method for mechanical arm grabbing pose estimation
CN109934864A (en) * 2019-03-14 2019-06-25 东北大学 Residual error network depth learning method towards mechanical arm crawl pose estimation
CN109978907A (en) * 2019-03-22 2019-07-05 南京邮电大学 A kind of sitting posture of student detection method towards household scene
CN111860039A (en) * 2019-04-26 2020-10-30 四川大学 Cross-connection CNN + SVR-based street space quality quantification method
CN110322510A (en) * 2019-06-27 2019-10-11 电子科技大学 A kind of 6D position and orientation estimation method using profile information
CN112396077A (en) * 2019-08-15 2021-02-23 瑞昱半导体股份有限公司 Fully-connected convolutional neural network image processing method and circuit system
CN110728187A (en) * 2019-09-09 2020-01-24 武汉大学 Remote sensing image scene classification method based on fault tolerance deep learning
CN110728187B (en) * 2019-09-09 2022-03-04 武汉大学 Remote sensing image scene classification method based on fault tolerance deep learning
CN110728192A (en) * 2019-09-16 2020-01-24 河海大学 High-resolution remote sensing image classification method based on novel characteristic pyramid depth network
CN110728192B (en) * 2019-09-16 2022-08-19 河海大学 High-resolution remote sensing image classification method based on novel characteristic pyramid depth network
CN110728222A (en) * 2019-09-30 2020-01-24 清华大学深圳国际研究生院 Pose estimation method for target object in mechanical arm grabbing system
CN111126441A (en) * 2019-11-25 2020-05-08 西安工程大学 Construction method of classification detection network model
CN111259735A (en) * 2020-01-08 2020-06-09 西安电子科技大学 Single-person attitude estimation method based on multi-stage prediction feature enhanced convolutional neural network
CN111325166B (en) * 2020-02-26 2023-07-07 南京工业大学 Sitting posture identification method based on projection reconstruction and MIMO neural network
CN111325166A (en) * 2020-02-26 2020-06-23 南京工业大学 Sitting posture identification method based on projection reconstruction and multi-input multi-output neural network
CN113436266B (en) * 2020-03-23 2024-05-14 丰田自动车株式会社 Image processing system, image processing method, method of training neural network, and recording medium for performing the method
CN113436266A (en) * 2020-03-23 2021-09-24 丰田自动车株式会社 Image processing system, image processing method, method of training neural network, and recording medium for executing the method
WO2022022063A1 (en) * 2020-07-27 2022-02-03 腾讯科技(深圳)有限公司 Three-dimensional human pose estimation method and related device
CN112163477A (en) * 2020-09-16 2021-01-01 厦门市特种设备检验检测院 Escalator pedestrian pose target detection method and system based on FasterR-CNN
CN112163477B (en) * 2020-09-16 2023-09-22 厦门市特种设备检验检测院 Escalator pedestrian pose target detection method and system based on Faster R-CNN
CN112381879A (en) * 2020-11-16 2021-02-19 华南理工大学 Object posture estimation method, system and medium based on image and three-dimensional model
WO2022100379A1 (en) * 2020-11-16 2022-05-19 华南理工大学 Object attitude estimation method and system based on image and three-dimensional model, and medium
CN112528941A (en) * 2020-12-23 2021-03-19 泰州市朗嘉馨网络科技有限公司 Automatic parameter setting system based on neural network
CN112528941B (en) * 2020-12-23 2021-11-19 芜湖神图驭器智能科技有限公司 Automatic parameter setting system based on neural network
CN112634367A (en) * 2020-12-25 2021-04-09 天津大学 Anti-occlusion object pose estimation method based on deep neural network
CN112857215B (en) * 2021-01-08 2022-02-08 河北工业大学 Monocular 6D pose estimation method based on regular icosahedron
CN112857215A (en) * 2021-01-08 2021-05-28 河北工业大学 Monocular 6D pose estimation method based on regular icosahedron
CN113129370B (en) * 2021-03-04 2022-08-19 同济大学 Semi-supervised object pose estimation method combining generated data and label-free data
CN113129370A (en) * 2021-03-04 2021-07-16 同济大学 Semi-supervised object pose estimation method combining generated data and label-free data
CN113705480A (en) * 2021-08-31 2021-11-26 新东方教育科技集团有限公司 Gesture recognition method, device and medium based on gesture recognition neural network
CN114742212A (en) * 2022-06-13 2022-07-12 南昌大学 Electronic digital information resampling rate estimation method

Also Published As

Publication number Publication date
CN108491880B (en) 2021-09-03

Similar Documents

Publication Publication Date Title
CN108491880A (en) Object classification based on neural network and position and orientation estimation method
Xu et al. Light-YOLOv3: fast method for detecting green mangoes in complex scenes using picking robots
Hu et al. SAC-Net: Spatial attenuation context for salient object detection
Uçar et al. A new facial expression recognition based on curvelet transform and online sequential extreme learning machine initialized with spherical clustering
CN108304826A (en) Facial expression recognizing method based on convolutional neural networks
CN108510012A (en) A kind of target rapid detection method based on Analysis On Multi-scale Features figure
CN110032925B (en) Gesture image segmentation and recognition method based on improved capsule network and algorithm
CN104700076B (en) Facial image virtual sample generation method
CN107609460A (en) A kind of Human bodys' response method for merging space-time dual-network stream and attention mechanism
CN107316307A (en) A kind of Chinese medicine tongue image automatic segmentation method based on depth convolutional neural networks
CN112862792B (en) Wheat powdery mildew spore segmentation method for small sample image dataset
CN110222718B (en) Image processing method and device
Xu et al. Spherical DNNs and Their Applications in 360° Images and Videos
CN113436227A (en) Twin network target tracking method based on inverted residual error
Yan et al. Monocular depth estimation with guidance of surface normal map
CN115393596B (en) Garment image segmentation method based on artificial intelligence
Xu et al. Face expression recognition based on convolutional neural network
Hu et al. A spatio-temporal integrated model based on local and global features for video expression recognition
Li et al. Fast recognition of pig faces based on improved Yolov3
Yue et al. DRGCNN: Dynamic region graph convolutional neural network for point clouds
Sun et al. Overview of capsule neural networks
Zhang Research on Image Recognition Based on Neural Network
Chauhan et al. Empirical Study on convergence of Capsule Networks with various hyperparameters
Ocegueda-Hernandez et al. A lightweight convolutional neural network for pose estimation of a planar model
Li Research on target feature extraction and location positioning with machine learning algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant