CN108596256B - Object recognition classifier construction method based on RGB-D - Google Patents

Object recognition classifier construction method based on RGB-D

Info

Publication number
CN108596256B
CN108596256B
Authority
CN
China
Prior art keywords
rgb
network
depth
layer
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810383002.1A
Other languages
Chinese (zh)
Other versions
CN108596256A (en)
Inventor
胡勇
周锋
迟小羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Research Institute Of Beihang University
Original Assignee
Qingdao Research Institute Of Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Research Institute Of Beihang University filed Critical Qingdao Research Institute Of Beihang University
Priority to CN201810383002.1A priority Critical patent/CN108596256B/en
Publication of CN108596256A publication Critical patent/CN108596256A/en
Application granted
Publication of CN108596256B publication Critical patent/CN108596256B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a new RGB-D-based object recognition classifier construction method, which mainly addresses two problems: the small scale of existing RGB-D databases and the low recognition accuracy of trained RGB-D classifiers on rare objects in those databases. The method comprises the following steps: collecting RGB modal pictures of an object and depth modal pictures at the same pose; sequentially extracting the features of the RGB modal pictures and of the corresponding depth modal pictures; manually analysing the collected RGB modal pictures and depth modal pictures in sequence and adding labels; and constructing an object classifier by combining the RGB modal features and the depth modal features. The method can be applied to object recognition: by sampling the RGB and depth modal data of a current object, the category of the object can be identified effectively.

Description

Object recognition classifier construction method based on RGB-D
Technical Field
The invention belongs to the technical field of computer applications, and particularly relates to an RGB-D-based object recognition classifier construction method.
Background
Since the ENIAC computer began operating in Philadelphia on 14 February 1946, forward-looking researchers and users have been asking whether computers can think and solve problems independently and autonomously as humans do; this question marks the early stage of artificial intelligence. To determine whether a machine has intelligence, the computer science and cryptography pioneer Alan Turing proposed the concept of the "Turing test" in the paper "Computing Machinery and Intelligence": a computer passes the test if, after answering a series of questions posed by human testers within 5 minutes, more than 30% of the testers believe the answers were given by a human rather than by a computer. The ultimate goal of artificial intelligence is to free human beings from complex, dangerous, repetitive and monotonous work, improve people's lives and promote human development. Biological research shows that more than 80% of the external information received by human beings comes from their eyes, so machine vision occupies a particularly important place in computer research. Object recognition is one of the most basic and important tasks in machine vision.
For object recognition, existing technologies can be roughly divided into three categories: (1) object recognition based on RGB, in which feature information is extracted from RGB modal data and the extracted RGB features are input into a specific classifier to recognize the object; (2) object recognition based on depth, in which feature information is extracted from depth modal data and the extracted depth features are input into a specific classifier to recognize the object; (3) object recognition based on combining RGB and depth modal information, in which either the RGB data and depth data are fused into 4-channel picture data before feature extraction, or features are extracted separately from the RGB modal data and the depth modal data and then combined in a classifier to recognize the object.
The patent with application number CN201510402298.3 discloses an RGB-D image classification method and system which mainly use the currently popular deep-learning convolutional neural network (CNN) to extract RGB and depth features, splice them together manually, and train an SVM by means of dictionary learning. A CNN is a data-driven method, i.e. it needs a large amount of labelled training data, yet existing labelled RGB-D classification data sets are very small compared with labelled RGB data sets, so they cannot adequately support the CNN proposed in that invention and easily cause a very serious over-fitting problem. Meanwhile, many real-world situations are rare: for example, some apples bought and sold in fruit stores have large areas blocked by the many trademark stickers attached to them, a situation that is essentially absent from the RGB-D data sets people collect. This long-tail situation means that the RGB-D classifiers people construct handle such classification cases poorly.
In view of the above, an important object of the present invention is to provide a new object recognition method based on RGB-D modal data, so as to solve the problems that existing RGB-D databases are small in scale and that trained RGB-D classifiers have low accuracy in recognizing rare objects in those databases.
Disclosure of Invention
The invention provides a new RGB-D-based object recognition classifier construction method, aimed at the problems that existing RGB-D databases are not sufficient to support the training of a deep neural network and easily cause over-fitting, and that large-scale databases exhibit a severe long-tail distribution. The scheme is as follows:
a construction method of an object recognition classifier based on RGB-D comprises the following steps:
step one, constructing an RGB-D object recognition database, in which the RGB modality data and the depth modality data are recorded as two separate sets;
Step two, identifying and classifying the collected RGB-D pictures, manually calibrating the category of each picture, c*E {1, 2.., C }, where C represents the total number of categories of pictures we captured;
step three, transforming the acquired pictures by using the four transformation operations T = {t, s, r, c}, and creating an agent (proxy) class for each picture to obtain an RGB modal proxy-class training set and a depth modal proxy-class training set, where t represents vertical and horizontal translation of the picture, s represents the size transformation operation of the picture, r represents the rotation operation of the picture, and c represents the color transformation operation of the picture;
step four, network training process: training an RGB network for object recognition by utilizing the proxy classes created from the collected RGB modal data; the pictures input into the RGB training network are preprocessed by selectively occluding the most discriminative region of each picture, and the processed pictures are input into the network to train the RGB network;
step five, network training process: training a depth network for object recognition by utilizing the proxy classes created from the collected depth modal data; the depth modal data are preprocessed with the same operations as the RGB modal data, and the processed pictures are input into the depth training network to train the depth network;
step six, a network training process, namely fusing the RGB network and the depth network together by a classifier fusion method to form an RGB-D object recognition network;
step seven, network reasoning process: extracting the features of RGB modal data by utilizing the RGB network in the RGB-D object recognition network;
step eight, extracting features of depth modal data by using a depth network in the RGB-D object recognition network;
step nine, fusing the extracted RGB features and depth features together through fusion at the classifier layer, and recording the fused feature as f_rgbd;
step ten, sending the fused feature f_rgbd to the classifier classifier_rgbd to perform identification of the object.
Further, the process of extracting features of the RGB modal data by using the RGB network in step seven is as follows: the collected pictures are first normalized and then fed into a 5-layer convolutional network; the feature map obtained after convolution is followed by a pooling layer, and the resulting feature map is input into a three-layer fully-connected network to obtain the feature map f_rgb.
Further, the process of extracting features of the depth modal data by using the depth network in step eight is as follows: the collected pictures are first normalized to the same size and then fed into a 5-layer convolutional network; a heat-map region of the feature map input into the network is obtained through a 5-layer multilayer perceptron, one third of the area in the heat map is randomly selected and shielded, the shielded result is input into a pooling layer to obtain a feature map, and the obtained feature map is input into a two-layer fully-connected network to obtain the feature map f_depth.
Further, in step nine, the extracted features of the two modalities are fused to construct the fused feature f_rgbd, and the fusion method is as follows: the obtained f_rgb and f_depth are spliced together along the channel dimension; if the layer is a convolutional layer, the formula
feature_{l+1} = (feature_l - kernel_l) / stride_l + 1
is used, where l denotes the l-th layer of the network, feature_l denotes the feature map of the l-th layer, kernel_l denotes the convolution kernel size, and stride_l denotes the stride of the convolution kernel; if the layer is a pooling layer, the formula
feature_{l+1} = (feature_l - kernel) / stride_l + 1
is used, where kernel represents the pooling kernel size of the pooling layer.
Further, in step ten, the classification by the classifier is calculated as follows: the fused feature of a test sample is extracted and input into the classifier; the trained classifier returns C numerical values for the input RGB-D object image through SoftMax, and the class corresponding to the largest value is taken as the predicted class of the object.
Compared with the prior art, the invention has the advantages and positive effects that:
the invention provides a novel RGB-D-based object classifier construction method, which can analyze the class of a collected object according to collected multi-mode information and take the characteristics among the multi-mode information into consideration during training. Most object classifiers utilize RGB texture information to train the classifier by extracting features from RGB images, but in reality, the problem that the classification of two cups is difficult to solve only by texture exists, for example, two mug cups with similar colors are difficult to distinguish by texture, but because the distance relationship exists between the two cups, the two cups can be distinguished by the depth relationship between the two cups. The method provided by the invention combines the RGB characteristic information and depth characteristic information to train the object classifier, so that the method is more suitable for actual conditions.
Detailed Description
The invention provides an RGB-D object recognition classifier construction method. On the basis of the currently very popular deep-learning convolutional neural network (CNN), an adversarial learning module is introduced: samples that are difficult to classify are constructed artificially (the method used here is to artificially occlude the most discriminative region in an image; for example, when identifying whether the animal in an image is a dog, the most discriminative region is the dog's head, so the head is artificially occluded). These artificially manufactured hard examples increase the difficulty of classifier training, so that the trained classifier is more discriminative and more robust. In addition, the proposed method is end-to-end: no staged optimization is needed, and it can be optimized directly from beginning to end.
In order that the above objects, features and advantages of the present invention can be more clearly understood, the present invention will be further described with reference to the following examples.
The embodiment provides a classifier construction method based on RGB-D object recognition, which comprises the following steps:
the method comprises the following steps: constructing an RGB-D object recognition database;
the method comprises the steps of collecting indoor general office articles by using a Microsoft depth sensor Kinect V1, placing the articles on a rotary platform, collecting the articles once every 5 degrees to form an RGB-D object identification database
Figure BDA0001641455200000051
Wherein the RGB modality data is recorded as
Figure BDA0001641455200000052
depth modal data note
Figure BDA0001641455200000053
Step two, identifying and classifying the collected RGB-D pictures by manpower, calibrating the category of each picture by manpower, c*E {1, 2.., C }, where C represents the total number of categories of pictures we captured;
and step three, transforming the acquired picture by using four transformation operations of T ═ T, s, r and c, wherein T operation represents that the picture is vertically and horizontally translated, s represents a scale factor, the size of the picture is multiplied by the scale factor to achieve the transformation operation of the size of the picture, r represents that the picture is rotated, and c represents a color transformation operation. Through the four operations, an agent class is created for each picture, and an RGB modal agent class training set is obtained
Figure BDA0001641455200000054
And depth modal proxy class training set
Figure BDA0001641455200000055
The process of creating an agent class for each captured picture in this step is as follows: the first operation t is a horizontal and vertical translation with magnitude t ∈ (-0.2, 0.2); the second operation s is a picture size transformation with scale factor s ∈ (0.5, 1); the third operation r is an image rotation by a random value in (-20, 20) degrees; the fourth operation c is a color transformation: the RGB picture is converted into the HSV color space, the S and V components are transformed as pow(x, a')·b' + c', where pow represents exponentiation, x is the current S or V component value, a' is a random number in (0.25, 4), b' is a random number in (0.7, 2.1) and c' is a random number in (-0.25, 0.25); the H component is transformed as y·d' + e', where y is the current H component, d' is a value in (0.7, 1.4) and e' is a value in (-0.1, 0.1).
This step mainly addresses the tendency of small-sample data sets to over-fit: by applying the four transformations proposed by the invention to the existing RGB-D data set, an agent class can be generated for each picture, and all pictures in an agent class share one label, so the data set is expanded without increasing labelling cost. The main consideration in this step is the influence of the number of pictures in each agent class on network training: if the number is too small, over-fitting is still likely to occur; if the number is too large, the pictures become too similar, causing information redundancy between the data and hindering the extraction of effective picture features.
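For illustration, the following is a minimal Python/OpenCV sketch of how one agent (proxy) class could be generated for a single RGB picture; the function name make_proxy_class and the number of copies per class are assumptions made for this sketch, since the text above fixes only the parameter ranges of the t, s, r and c operations.

import cv2
import numpy as np

def make_proxy_class(img_rgb, n_copies=16, rng=None):
    # Generate one agent (proxy) class: n_copies transformed versions of img_rgb,
    # all sharing the label of the original picture.
    rng = rng or np.random.default_rng()
    h, w = img_rgb.shape[:2]
    proxies = []
    for _ in range(n_copies):
        # t: horizontal and vertical translation, magnitude in (-0.2, 0.2) of the image size
        tx, ty = rng.uniform(-0.2, 0.2, 2) * (w, h)
        out = cv2.warpAffine(img_rgb, np.float32([[1, 0, tx], [0, 1, ty]]), (w, h))
        # s: scale factor in (0.5, 1)
        s = rng.uniform(0.5, 1.0)
        out = cv2.resize(out, None, fx=s, fy=s)
        # r: rotation by a random angle in (-20, 20) degrees
        hh, ww = out.shape[:2]
        rot = cv2.getRotationMatrix2D((ww / 2.0, hh / 2.0), rng.uniform(-20, 20), 1.0)
        out = cv2.warpAffine(out, rot, (ww, hh))
        # c: color transform in HSV space: S,V -> pow(x, a')*b' + c', H -> y*d' + e'
        hsv = cv2.cvtColor(out, cv2.COLOR_RGB2HSV).astype(np.float32)
        hsv[..., 0] /= 179.0          # H channel of a uint8 HSV image is in [0, 179]
        hsv[..., 1:] /= 255.0
        a, b, c = rng.uniform(0.25, 4), rng.uniform(0.7, 2.1), rng.uniform(-0.25, 0.25)
        d, e = rng.uniform(0.7, 1.4), rng.uniform(-0.1, 0.1)
        hsv[..., 1:] = np.clip(np.power(hsv[..., 1:], a) * b + c, 0.0, 1.0) * 255.0
        hsv[..., 0] = np.clip(hsv[..., 0] * d + e, 0.0, 1.0) * 179.0
        out = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)
        proxies.append(out)
    return proxies

A depth modal proxy class can be built in the same way; sharing the geometric operations (t, s, r) between the RGB picture and the depth picture at the same pose keeps the two modalities aligned, while the color transform applies only to the RGB picture.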
Step four, network training process: an RGB network for object recognition is trained using the proxy classes created from the collected RGB modal data. Because real situations are more complicated than the sampled training data, and in order that the trained RGB network can handle such complicated real situations well, the pictures input into the RGB training network are preprocessed by selectively occluding the most discriminative region of each picture, and the processed pictures are input into the network to train the RGB network.
the embodiment adopts the most intuitive and simple method for artificially manufacturing the training difficult sample, and of course, other methods can be adopted, for example, the most discriminant area of the feature map in the specific layer is shielded in the running process.
Step five, network training process, utilizing collected depth modal data
Figure BDA0001641455200000063
Created proxy class
Figure BDA0001641455200000064
Training a depth network of object recognition, adopting the same preprocessing operation as RGB (red, green and blue) modal data for depth modal data, and inputting the processed pictures into the depth training network for training the depth network;
step six, a network training process, namely fusing the RGB network and the depth network together by a classifier fusion method to form an RGB-D object recognition network;
Step seven, network reasoning process: extracting the features of RGB modal data by utilizing the RGB network in the RGB-D object recognition network;
firstly, the collected pictures are normalized to be s × s (s represents a fixed value and has no practical meaning, in the embodiment, the pictures are normalized to be 257 × 257), then the normalized pictures are sent to a 5-layer convolution network, the size of a first convolution layer kernel is 11 × 11, the number of convolution kernels is 96, the size of a second convolution kernel is 5 × 5, the number of convolution kernels is 256, the size of a third convolution kernel is 3 × 3, the size of a convolution kernel is 384, the size of a fourth convolution kernel is 3 × 3, and the size of a convolution kernel is 384The fifth convolution kernel size is 3 × 3 and the convolution kernel size is 256, and this layer of convolution is followed by a pooling layer (pooling), resulting in a feature map denoted fmrgbWill yield fmrgbInputting the data into a three-layer fully-connected network, wherein the output size of a first fully-connected layer is 4096, the output size of a second fully-connected layer is 4096, and a characteristic diagram obtained after the data passes through the two layers of fully-connected networks is frgb
Step eight, extracting features of depth modal data by using a depth network in the RGB-D object recognition network;
firstly, normalizing the collected pictures into the pictures with the same size of s multiplied by s, then sending the normalized pictures into a convolution network with 5 layers, wherein the size of a first convolution kernel is 11 multiplied by 11, the number of convolution kernels is 96, the size of a second convolution kernel is 5 multiplied by 5, the number of convolution kernels is 256, the size of a third convolution kernel is 3 multiplied by 3, the size of a convolution kernel is 384, the size of a fourth convolution kernel is 3 multiplied by 3, the size of a convolution kernel is 384, the size of a fifth convolution kernel is 3 multiplied by 3, and the size of a convolution kernel is 256, inputting the characteristic diagram obtained by the layers into the convolution network with 5 layers, the convolution kernel of each layer is 3 multiplied by 3, obtaining the heat map area of the characteristic diagram input into the network by a multilayer perceptron with the 5 layers, shielding by randomly selecting one third area in the heat map, inputting the shielded picture into the next pooling layer, get the characteristic map as fmdepthWill yield fmrgbInputting the data into a two-layer fully-connected network, wherein the output size of a first fully-connected layer is 4096, the output size of a second fully-connected layer is 4096, and a characteristic diagram obtained after the data passes through the two-layer fully-connected network is fdepth
Step nine, the extracted RGB features and depth features are fused together through fusion at the classifier layer, and the fused feature is recorded as f_rgbd; the method is as follows:
subjecting the obtained f torgbAnd fdepthPieced together according to channel dimensions, i.e. assuming the feature map f obtained by step sevenrgbThe dimension of (a) is n x h x w x c, wherein n represents the number of pictures input into the network at one time, and h represents the number of pictures input into the networkThe length of the characteristic diagram, w represents the width of the characteristic diagram input into the layer network, and c represents the number of channels of the characteristic diagram input into the layer network. For the first layer of the network, i.e. the input layer of the network, where n is set according to its hardware conditions (n ≧ 1), h and w are determined according to the size of the picture in the input network, c is 3 if the input network is an RGB image, and c is 1 if it is a grayscale or depth picture. Then n of each layer is kept unchanged, the size of c is determined by the number of convolution kernels of the previous layer, h and w are determined according to the property of the previous network layer, and if the network layer is a convolution layer, the formula is used:
feature_{l+1} = (feature_l - kernel_l) / stride_l + 1,
where l denotes the l-th layer of the network, feature_l denotes the feature map of the l-th layer, kernel_l denotes the convolution kernel size, and stride_l denotes the stride of the convolution kernel. If the layer is a pooling layer, the formula
feature_{l+1} = (feature_l - kernel) / stride_l + 1
is used, where kernel denotes the pooling kernel size of the pooling layer. h and w are respectively the second and third dimensions of the calculated feature_{l+1}. Finally, f_rgb and f_depth are connected along the fourth dimension.
Step ten, the fused feature f_rgbd is sent to the classifier classifier_rgbd to perform identification of the object.
The feature map obtained in step nine is input into a SoftMax classifier to train the classifier. At this point, an RGB-D-based object recognition classifier has been constructed. The classification result is calculated as follows: the fused feature of a test sample is extracted and input into the classifier, the trained classifier returns C numerical values through SoftMax, and finally the largest of the C values is found; the category corresponding to this largest value is the category to which the test sample belongs.
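As a final illustration, the inference step can be sketched as below; the number of categories C and the linear layer standing in for classifier_rgbd are assumptions of this sketch, since the text only specifies that SoftMax returns C values and that the largest one gives the predicted class.

import torch
import torch.nn as nn

C = 10                                     # assumed number of object categories
classifier_rgbd = nn.Linear(8192, C)       # stand-in for the trained classifier head

f_rgbd = torch.randn(1, 8192)              # fused feature of one test sample
scores = torch.softmax(classifier_rgbd(f_rgbd), dim=1)   # C values returned through SoftMax
predicted = int(scores.argmax(dim=1))      # index of the largest value = predicted category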
The execution environment of the invention is a computer with a 3.3 GHz multi-core central processing unit and 8 GB of memory; to accelerate the training and inference of the object recognition network, four NVIDIA GeForce GTX 1080 Ti GPUs are used for accelerated computation. The construction program of the RGB-D-based object recognition classifier is written in C++ and Python; other execution environments can also be adopted and are not described here again.
The invention mainly studies how to train an object recognition deep neural network on a small-scale RGB-D data set without over-fitting: a series of transformation rules is proposed and applied to the training sample image blocks so that an agent class is generated for each sample, which enables the deep neural network to be trained on a small-scale data set. For the long-tail distribution in the data set, i.e. the problem that some samples are so rare that their number in the training set is not sufficient to support deep neural network learning, an adversarial learning network is proposed to address the low recognition accuracy on hard examples in the data set. Together, these two methods improve the accuracy and robustness of the constructed object recognition network.
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention to this form. Any person skilled in the art may, without departing from the technical spirit of the invention, modify or change the above embodiment into an equivalent embodiment with equivalent changes; any simple modification or equivalent change made to the above embodiment according to the technical spirit of the invention still falls within the protection scope of the invention.

Claims (3)

1. A construction method of an object recognition classifier based on RGB-D is characterized by comprising the following steps:
step one, constructing an RGB-D object recognition database, in which the RGB modality data and the depth modality data are recorded as two separate sets;
Step two, identifying and classifying the collected RGB-D pictures, manually calibrating the category of each picture, c*E {1, 2.., C }, where C represents the total number of categories of pictures we captured;
step three, transforming the acquired pictures by using the four transformation operations T = {t, s, r, c}, and creating an agent (proxy) class for each picture to obtain an RGB modal proxy-class training set and a depth modal proxy-class training set, where t represents vertical and horizontal translation of the picture, s represents the size transformation operation of the picture, r represents the rotation operation of the picture, and c represents the color transformation operation of the picture;
step four, network training process: training an RGB network for object recognition by utilizing the proxy classes created from the collected RGB modal data; the pictures input into the RGB training network are preprocessed by selectively occluding the most discriminative region of each picture, and the processed pictures are input into the network to train the RGB network;
step five, network training process: training a depth network for object recognition by utilizing the proxy classes created from the collected depth modal data; the depth modal data are preprocessed with the same operations as the RGB modal data, and the processed pictures are input into the depth training network to train the depth network;
step six, a network training process, namely fusing the RGB network and the depth network together by a classifier fusion method to form an RGB-D object recognition network;
step seven, network reasoning process: extracting the features of RGB modal data by utilizing the RGB network in the RGB-D object recognition network;
step eight, extracting features of depth modal data by using a depth network in the RGB-D object recognition network;
step nine, fusing the extracted RGB features and depth features together through fusion at the classifier layer, and recording the fused feature as f_rgbd;
step ten, sending the fused feature f_rgbd to the classifier classifier_rgbd to identify the object;
the process of extracting the features of the RGB modal data by using the RGB network in the seventh step is as follows: firstly, normalizing the collected pictures, then sending the normalized pictures into a 5-layer convolution network, obtaining a characteristic diagram after convolution and then connecting a pooling layer, and inputting the obtained characteristic diagram into a three-layer full-connection network to obtain a characteristic diagram frgb
the process of extracting features of the depth modal data by using the depth network in step eight is as follows: the collected pictures are first normalized to the same size and then fed into a 5-layer convolutional network; a heat-map region of the feature map input into the network is obtained through a 5-layer multilayer perceptron, one third of the area in the heat map is randomly selected and shielded, the shielded result is input into a pooling layer to obtain a feature map, and the obtained feature map is input into a two-layer fully-connected network to obtain the feature map f_depth.
2. The RGB-D based object recognition classifier construction method of claim 1, wherein: in step nine, the extracted features of the two modalities are fused to construct the fused feature f_rgbd, and the fusion method is as follows: the obtained f_rgb and f_depth are spliced together along the channel dimension; if the layer is a convolutional layer, the formula
feature_{l+1} = (feature_l - kernel_l) / stride_l + 1
is used to calculate h and w of the feature map after the convolutional layer, where l represents the l-th layer of the network, feature_l represents the feature map of the l-th layer, kernel_l represents the convolution kernel size, and stride_l represents the stride of the convolution kernel; if the layer is a pooling layer, the formula
feature_{l+1} = (feature_l - kernel) / stride_l + 1
is used to calculate h and w of the feature map after the pooling layer, where kernel represents the pooling kernel size of the pooling layer; h represents the height of the feature map input into the layer, and w represents the width of the feature map input into the layer.
3. The RGB-D based object recognition classifier construction method of claim 2, wherein: in step ten, the classification by the classifier is calculated as follows: the fused feature of a test sample is extracted and input into the classifier, the trained classifier returns C numerical values for the input RGB-D object image through SoftMax, and the class corresponding to the largest value is taken as the predicted class of the object.
CN201810383002.1A 2018-04-26 2018-04-26 Object recognition classifier construction method based on RGB-D Active CN108596256B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810383002.1A CN108596256B (en) 2018-04-26 2018-04-26 Object recognition classifier construction method based on RGB-D

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810383002.1A CN108596256B (en) 2018-04-26 2018-04-26 Object recognition classifier construction method based on RGB-D

Publications (2)

Publication Number Publication Date
CN108596256A CN108596256A (en) 2018-09-28
CN108596256B true CN108596256B (en) 2022-04-01

Family

ID=63609380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810383002.1A Active CN108596256B (en) 2018-04-26 2018-04-26 Object recognition classifier construction method based on RGB-D

Country Status (1)

Country Link
CN (1) CN108596256B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084141B (en) * 2019-04-08 2021-02-09 南京邮电大学 Cross-domain scene recognition method based on private information
CN110348333A (en) * 2019-06-21 2019-10-18 深圳前海达闼云端智能科技有限公司 Object detecting method, device, storage medium and electronic equipment
CN111401426B (en) * 2020-03-11 2022-04-08 西北工业大学 Small sample hyperspectral image classification method based on pseudo label learning
CN115240106B (en) * 2022-07-12 2023-06-20 北京交通大学 Task self-adaptive small sample behavior recognition method and system
CN115496077B (en) * 2022-11-18 2023-04-18 之江实验室 Multimode emotion analysis method and device based on modal observation and grading

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224942A (en) * 2015-07-09 2016-01-06 华南农业大学 A kind of RGB-D image classification method and system
CN106228177A (en) * 2016-06-30 2016-12-14 浙江大学 Daily life subject image recognition methods based on convolutional neural networks
CN106778810A (en) * 2016-11-23 2017-05-31 北京联合大学 Original image layer fusion method and system based on RGB feature Yu depth characteristic
WO2017088125A1 (en) * 2015-11-25 2017-06-01 中国科学院自动化研究所 Dense matching relation-based rgb-d object recognition method using adaptive similarity measurement, and device
CN107194380A (en) * 2017-07-03 2017-09-22 上海荷福人工智能科技(集团)有限公司 The depth convolutional network and learning method of a kind of complex scene human face identification
CN107341440A (en) * 2017-05-08 2017-11-10 西安电子科技大学昆山创新研究院 Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning
CN107423698A (en) * 2017-07-14 2017-12-01 华中科技大学 A kind of gesture method of estimation based on convolutional neural networks in parallel
CN107424205A (en) * 2017-07-11 2017-12-01 北京航空航天大学 A kind of joint estimating method estimated based on image round the clock carrying out three-dimensional facade layout
CN107480704A (en) * 2017-07-24 2017-12-15 南开大学 It is a kind of that there is the real-time vision method for tracking target for blocking perception mechanism
WO2018045363A1 (en) * 2016-09-02 2018-03-08 Gargeya Rishab Screening method for automated detection of vision-degenerative diseases from color fundus images
CN107909065A (en) * 2017-12-29 2018-04-13 百度在线网络技术(北京)有限公司 The method and device blocked for detecting face

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778676A (en) * 2014-01-09 2015-07-15 中国科学院大学 Depth ranging-based moving target detection method and system
CN107085731B (en) * 2017-05-11 2020-03-10 湘潭大学 Image classification method based on RGB-D fusion features and sparse coding
CN107578060B (en) * 2017-08-14 2020-12-29 电子科技大学 Method for classifying dish images based on depth neural network capable of distinguishing areas

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224942A (en) * 2015-07-09 2016-01-06 华南农业大学 A kind of RGB-D image classification method and system
WO2017088125A1 (en) * 2015-11-25 2017-06-01 中国科学院自动化研究所 Dense matching relation-based rgb-d object recognition method using adaptive similarity measurement, and device
CN106228177A (en) * 2016-06-30 2016-12-14 浙江大学 Daily life subject image recognition methods based on convolutional neural networks
WO2018045363A1 (en) * 2016-09-02 2018-03-08 Gargeya Rishab Screening method for automated detection of vision-degenerative diseases from color fundus images
CN106778810A (en) * 2016-11-23 2017-05-31 北京联合大学 Original image layer fusion method and system based on RGB feature Yu depth characteristic
CN107341440A (en) * 2017-05-08 2017-11-10 西安电子科技大学昆山创新研究院 Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning
CN107194380A (en) * 2017-07-03 2017-09-22 上海荷福人工智能科技(集团)有限公司 The depth convolutional network and learning method of a kind of complex scene human face identification
CN107424205A (en) * 2017-07-11 2017-12-01 北京航空航天大学 A kind of joint estimating method estimated based on image round the clock carrying out three-dimensional facade layout
CN107423698A (en) * 2017-07-14 2017-12-01 华中科技大学 A kind of gesture method of estimation based on convolutional neural networks in parallel
CN107480704A (en) * 2017-07-24 2017-12-15 南开大学 It is a kind of that there is the real-time vision method for tracking target for blocking perception mechanism
CN107909065A (en) * 2017-12-29 2018-04-13 百度在线网络技术(北京)有限公司 The method and device blocked for detecting face

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features"; Max Schwarz et al.; 2015 IEEE International Conference on Robotics and Automation (ICRA); 20150702; 1050-4729 *
"A review of RGB-D image classification methods"; Tu Shuqin et al.; Laser & Optoelectronics Progress; 20161231; Vol. 53, No. 06; pp. 35-48 *
"Background projection modeling of an adaptive infrared stealth system"; Zhang Dongxiao et al.; Infrared and Laser Engineering; 20160325; Vol. 45, No. 03; pp. 36-42 *

Also Published As

Publication number Publication date
CN108596256A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108596256B (en) Object recognition classifier construction method based on RGB-D
Mascarenhas et al. A comparison between VGG16, VGG19 and ResNet50 architecture frameworks for Image Classification
Jain et al. Hybrid deep neural networks for face emotion recognition
Shao et al. Performance evaluation of deep feature learning for RGB-D image/video classification
Cruz et al. Detection of grapevine yellows symptoms in Vitis vinifera L. with artificial intelligence
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
CN108596102B (en) RGB-D-based indoor scene object segmentation classifier construction method
Gammulle et al. Fine-grained action segmentation using the semi-supervised action GAN
CN114821014B (en) Multi-mode and countermeasure learning-based multi-task target detection and identification method and device
CN108009560B (en) Commodity image similarity category judgment method and device
Gonçalves et al. Carcass image segmentation using CNN-based methods
Su et al. LodgeNet: Improved rice lodging recognition using semantic segmentation of UAV high-resolution remote sensing images
Naseer et al. Pixels to precision: features fusion and random forests over labelled-based segmentation
Ahmed et al. Robust Object Recognition with Genetic Algorithm and Composite Saliency Map
US20240232627A1 (en) Systems and Methods to Train A Cell Object Detector
Fujii et al. Hierarchical group-level emotion recognition in the wild
Ruan et al. Facial expression recognition in facial occlusion scenarios: A path selection multi-network
CN112800979A (en) Dynamic expression recognition method and system based on characterization flow embedded network
CN111612090A (en) Image emotion classification method based on content color cross correlation
Kumar et al. Deep Learning-Based Web Application for Real-Time Apple Leaf Disease Detection and Classification
Wang et al. Strawberry ripeness classification method in facility environment based on red color ratio of fruit rind
CN113591797A (en) Deep video behavior identification method
Huo et al. Modality-convolutions: Multi-modal gesture recognition based on convolutional neural network
Zhang et al. DCNet: Weakly Supervised Saliency Guided Dual Coding Network for Visual Sentiment Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant