CN109903323A - Training method, device, storage medium and terminal for transparent object recognition - Google Patents


Info

Publication number
CN109903323A
CN109903323A (application CN201910167767.6A)
Authority
CN
China
Prior art keywords
depth
image
rgb
images
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910167767.6A
Other languages
Chinese (zh)
Other versions
CN109903323B (en)
Inventor
张�成
龙宇
王语诗
蔡自立
郑子璇
吉守龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910167767.6A
Publication of CN109903323A
Application granted
Publication of CN109903323B
Legal status: Active
Anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract

The present invention provides a training method, device, storage medium, and terminal for transparent object recognition. The method comprises the following steps: S1, establishing a first data set with multiple RGB images and a second data set with multiple depth images, the RGB images corresponding one to one with the depth images; S2, establishing a multi-modal-fusion deep convolutional neural network structure N1, which trains separately on the RGB images and on the depth images to extract first feature information of the RGB images and second feature information of the depth images and to obtain a network weight model M1; S3, establishing a multi-modal shared deep convolutional network structure N2, inputting the first feature information and the second feature information into N2 for fusion training, so as to output object classification parameter information and position coordinate information and to obtain a network weight model M2; S4, inputting further pairs of RGB images and depth images to adjust the parameters of the network weight models M1 and M2, so as to obtain the optimized network weight models M11 and M22.

Description

Training method, device, storage medium and terminal for transparent object recognition
Technical field
The present invention relates to the technical field of image recognition, and in particular to a training method, device, storage medium, and terminal for transparent object recognition.
Background art
With the rapid development of science and technology, the widespread adoption of industrial robots has not only freed up labor but also increased production speed and quality. In particular, the introduction of machine vision has further improved the efficiency of robotic grasping. For certain special articles such as transparent objects, however, machine vision still suffers from difficulties such as low recognition accuracy and long recognition times.
Because images of transparent objects are easily affected by various factors, the stability and accuracy of single-modality object recognition systems are compromised to some extent. A common approach is to accentuate the main features of the object by altering the environment, but such methods target translucent objects: patent CN104180772A, for example, requires the transparent object to have a rough surface and can only recognize plate-shaped transparent objects. Moreover, these methods often demand elaborate equipment or heavy computation, which makes them hard to deploy industrially; patent CN102753933B, for instance, requires a tightly controlled environment with external light sources shielded, so it has little practical industrial value and cannot adapt to variable, complex environments.
Therefore, the prior art is defective and needs improvement.
Summary of the invention
Embodiments of the present invention provide a training method, device, storage medium, and terminal for transparent object recognition, which can improve the accuracy and efficiency of transparent object recognition.
An embodiment of the present invention provides a training method for transparent object recognition, comprising the following steps:
S1, establishing a first data set with multiple RGB images and a second data set with multiple depth images, the RGB images corresponding one to one with the depth images;
S2, establishing a multi-modal-fusion deep convolutional neural network structure N1, where N1 trains separately on the RGB images and on the depth images, so as to extract first feature information of the RGB images and second feature information of the depth images and to obtain a network weight model M1;
S3, establishing a multi-modal shared deep convolutional network structure N2, inputting the first feature information and the second feature information into N2 for fusion training, so as to output object classification parameter information and position coordinate information and to obtain a network weight model M2;
S4, inputting further pairs of RGB images and depth images to adjust the parameters of the network weight models M1 and M2, so as to obtain the optimized network weight models M11 and M22.
In the training method for transparent object recognition of the present invention, the step of establishing the first data set with multiple RGB images and the second data set with multiple depth images, the RGB images corresponding one to one with the depth images, comprises:
acquiring the RGB images and depth images of multiple objects to be trained, the RGB images corresponding one to one with the depth images;
demarcating the boundary of the object to be trained in each RGB image, and setting first classification parameter information of the object to be trained and first position coordinate information of the object to be trained in the RGB image;
establishing the first data set with multiple RGB images according to the first classification parameter information and the first position coordinate information;
demarcating the boundary of the object to be trained in each depth image according to the correspondence between the RGB image and the depth image, and setting second classification parameter information of the object to be trained and second position coordinate information of the object to be trained in the depth image;
establishing the second data set with multiple depth images according to the second classification parameter information and the second position coordinate information.
In the training method for transparent object recognition of the present invention, the step of establishing the multi-modal-fusion deep convolutional neural network structure N1, where N1 trains separately on the RGB images and on the depth images so as to extract the first feature information of the RGB images and the second feature information of the depth images and to obtain the network weight model M1, comprises:
establishing the multi-modal-fusion deep convolutional neural network structure N1, where N1 comprises two independent convolutional neural network branches used to train separately on the RGB images and the depth images; during training, a mutually corresponding RGB image and depth image are randomly drawn from the first and second data sets as input each time, and the convolutional neural networks extract the first feature information of the RGB image and the second feature information of the depth image, yielding the network weight model M1.
In the training method for transparent object recognition of the present invention, the mutually corresponding RGB image and depth image are images of the same object acquired by a color RGB camera and a depth camera, respectively.
In the training method for transparent object recognition of the present invention, in step S2, each layer's parameters are updated with the backpropagation algorithm using the error returned by the loss layer, so that the network weight model is progressively optimized and finally converges.
A training device for transparent object recognition comprises:
a first establishing module for establishing a first data set with multiple RGB images and a second data set with multiple depth images, the RGB images corresponding one to one with the depth images;
a second establishing module for establishing a multi-modal-fusion deep convolutional neural network structure N1, where N1 trains separately on the RGB images and on the depth images, so as to extract first feature information of the RGB images and second feature information of the depth images and to obtain a network weight model M1;
a third establishing module for establishing a multi-modal shared deep convolutional network structure N2, inputting the first feature information and the second feature information into N2 for fusion training, so as to output object classification parameter information and position coordinate information and to obtain a network weight model M2;
an optimization module for inputting further pairs of RGB images and depth images to adjust the parameters of the network weight models M1 and M2, so as to obtain the optimized network weight models M11 and M22.
In the training device for transparent object recognition of the present invention, the first establishing module comprises:
an acquisition unit for acquiring the RGB images and depth images of multiple objects to be trained, the RGB images corresponding one to one with the depth images;
a first calibration unit for demarcating the boundary of the object to be trained in each RGB image and setting the first classification parameter information of the object to be trained and the first position coordinate information of the object to be trained in the RGB image;
a first establishing unit for establishing the first data set with multiple RGB images according to the first classification parameter information and the first position coordinate information;
a second calibration unit for demarcating the boundary of the object to be trained in each depth image according to the correspondence between the RGB image and the depth image, and setting the second classification parameter information of the object to be trained and the second position coordinate information of the object to be trained in the depth image;
a second establishing unit for establishing the second data set with multiple depth images according to the second classification parameter information and the second position coordinate information.
In the training device for transparent object recognition of the present invention, the mutually corresponding RGB image and depth image are images of the same object acquired by a color RGB camera and a depth camera, respectively.
A storage medium stores a computer program which, when run on a computer, causes the computer to execute any of the methods described above.
A terminal comprises a processor and a memory, the memory storing a computer program, the processor executing any of the methods described above by calling the computer program stored in the memory.
In the present invention, data of different modalities (RGB images and depth images) are first trained separately, each through its own series of neural networks, so that the features of each modality are learned; the features are then joined by fusion and passed through a series of shared convolutional layers, where the features of the modalities learn from one another complementarily. The fusion of RGB information and depth information thus improves transparent object recognition.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the training method for transparent object recognition provided by an embodiment of the present invention.
Fig. 2 is a structural diagram of the training device for transparent object recognition provided by an embodiment of the present invention.
Fig. 3 is a structural diagram of the terminal provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", and the like (if any) in the specification, claims, and drawings are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that objects so described are interchangeable where appropriate. Moreover, the terms "comprise" and "have" and any variants thereof are intended to cover non-exclusive inclusion: a process, method, device, or system comprising a series of steps, modules, or units is not necessarily limited to those expressly listed, and may include other steps, modules, or units that are not expressly listed or that are inherent to such a process, method, device, or system.
Referring to Fig. 1, Fig. 1 is a flowchart of a training method for transparent object recognition. The training method comprises the following steps:
S1, establishing a first data set with multiple RGB images and a second data set with multiple depth images, the RGB images corresponding one to one with the depth images.
Pictures shot in real scenes are used as the training samples.
Specifically, step S1 includes:
S11, acquiring the RGB images and depth images of multiple objects to be trained, the RGB images corresponding one to one with the depth images; S12, demarcating the boundary of the object to be trained in each RGB image, and setting the first classification parameter information of the object to be trained and the first position coordinate information of the object to be trained in the RGB image; S13, establishing the first data set with multiple RGB images according to the first classification parameter information and the first position coordinate information; S14, demarcating the boundary of the object to be trained in each depth image according to the correspondence between the RGB image and the depth image, and setting the second classification parameter information of the object to be trained and the second position coordinate information of the object to be trained in the depth image; S15, establishing the second data set with multiple depth images according to the second classification parameter information and the second position coordinate information.
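For illustration only, the paired samples produced by steps S11 to S15 can be represented as sketched below. This is a minimal sketch in PyTorch; the record layout, the file names, and the PairedRGBDDataset class are assumptions of the sketch, not part of the disclosure.

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class PairedRGBDDataset(Dataset):
    """One sample = an RGB image, its one-to-one depth image, and the
    boundary (bounding box) and classification label set during demarcation."""

    def __init__(self, records):
        # records: list of dicts such as
        # {"rgb": "obj01_rgb.png", "depth": "obj01_depth.png",
        #  "bbox": [x1, y1, x2, y2], "label": 0}
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        r = self.records[idx]
        rgb = torch.from_numpy(
            np.asarray(Image.open(r["rgb"]), dtype=np.float32) / 255.0
        ).permute(2, 0, 1)                      # 3 x H x W
        depth = torch.from_numpy(
            np.asarray(Image.open(r["depth"]), dtype=np.float32)
        ).unsqueeze(0)                          # 1 x H x W
        bbox = torch.tensor(r["bbox"], dtype=torch.float32)
        label = torch.tensor(r["label"], dtype=torch.long)
        return rgb, depth, bbox, label
```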
The mutually corresponding RGB image and depth image are images of the same object acquired by a color RGB camera and a depth camera, respectively.
In step S1, the color RGB module and the depth-image module sit at different locations on the sensor, so even when the same object is captured at the same moment, the acquired image information differs. Because the bounding boxes must be unified, the color RGB image and the depth image information must be related by a matrix transformation that brings their coordinates into correspondence. Two matrices are needed here: a translation matrix and a rotation matrix.
Suppose a point in the depth image is (x, y) and the corresponding point in the color RGB image is (X, Y). Through the translation matrix we obtain X = x + dx and Y = y + dy, where dx and dy are the distances moved in the x and y directions, respectively. In homogeneous coordinates the translation is expressed as

$$\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.$$

There is also a rotation matrix. Suppose the line from a point to the origin has length R and makes an angle b with the X axis, and the point is rotated counterclockwise about the origin by an angle a, with [x, y] the depth-image coordinates and [X, Y] the color-RGB-image coordinates. Then X = R cos(a + b) = x cos a − y sin a and Y = R sin(a + b) = x sin a + y cos a, from which the rotation matrix is calculated as

$$\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} \cos a & -\sin a \\ \sin a & \cos a \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}.$$
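The mapping above can be sketched in a few lines of Python. The offsets dx and dy and the angle a would come from sensor calibration, which the text does not specify, and applying the translation before the rotation is an assumption of this sketch.

```python
import numpy as np

def depth_to_rgb_coords(x, y, dx, dy, a):
    """Map a depth-image point (x, y) to color-RGB-image coordinates (X, Y):
    translate by (dx, dy), then rotate counterclockwise by angle a (radians)
    about the origin, using homogeneous coordinates."""
    T = np.array([[1.0, 0.0, dx],
                  [0.0, 1.0, dy],
                  [0.0, 0.0, 1.0]])
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
    X, Y, _ = R @ T @ np.array([x, y, 1.0])
    return X, Y
```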
S2, establishing the multi-modal-fusion deep convolutional neural network structure N1, which trains separately on the RGB images and on the depth images, so as to extract the first feature information of the RGB images and the second feature information of the depth images and to obtain the network weight model M1.
Step S2 comprises: establishing the multi-modal-fusion deep convolutional neural network structure N1, where N1 comprises two independent convolutional neural network branches used to train separately on the RGB images and the depth images. During training, a mutually corresponding RGB image and depth image are randomly drawn from the first and second data sets as input each time, and the convolutional neural networks extract the first feature information of the RGB image and the second feature information of the depth image, yielding the network weight model M1.
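A minimal PyTorch sketch of such a two-branch N1 is given below. The layer widths and depths are illustrative assumptions only, consistent with the note later in this description that the exact combination of layers is not fixed.

```python
import torch
import torch.nn as nn

def make_branch(in_channels):
    """One independent convolutional branch (illustrative sizes)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64), nn.ReLU(inplace=True), nn.MaxPool2d(2),
    )

class N1(nn.Module):
    """Two independent branches: one for the RGB image (3 channels),
    one for the depth image (1 channel); each yields its modality's features."""

    def __init__(self):
        super().__init__()
        self.rgb_branch = make_branch(3)
        self.depth_branch = make_branch(1)

    def forward(self, rgb, depth):
        f1 = self.rgb_branch(rgb)      # first feature information
        f2 = self.depth_branch(depth)  # second feature information
        return f1, f2
```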
Each layer's parameters can be updated with the backpropagation algorithm using the error returned by the loss layer, so that the network weight model is progressively optimized and finally converges.
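As a sketch of one such update step, reusing make_branch from the sketch above: the temporary classification head, the cross-entropy loss, and the SGD settings are assumptions, since the text only states that the error returned by the loss layer drives the per-layer updates.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class BranchClassifier(nn.Module):
    """Wrap one branch with a small head so it can be trained alone."""

    def __init__(self, branch, num_classes=2):
        super().__init__()
        self.branch = branch
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, num_classes))

    def forward(self, x):
        return self.head(self.branch(x))

rgb_model = BranchClassifier(make_branch(3))
criterion = nn.CrossEntropyLoss()        # the loss layer returning the error
optimizer = optim.SGD(rgb_model.parameters(), lr=1e-3, momentum=0.9)

rgb = torch.randn(4, 3, 64, 64)          # dummy batch of RGB images
labels = torch.randint(0, 2, (4,))

optimizer.zero_grad()
loss = criterion(rgb_model(rgb), labels)
loss.backward()                          # backpropagate the error
optimizer.step()                         # update every layer's parameters
```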
S3, establishing the multi-modal shared deep convolutional network structure N2, inputting the first feature information and the second feature information into N2 for fusion training, so as to output the object classification parameter information and position coordinate information and to obtain the network weight model M2.
N2 comprises multiple convolutional neural network layers followed by multiple fully connected layers, and its output comprises two parameters: the coordinate position parameter of the object and the classification parameter of the object.
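Continuing the sketch, N2 might concatenate the two feature maps, pass them through shared convolutions and fully connected layers, and emit the two named outputs. The sizes are again assumptions; in_channels=128 simply matches the 64+64 output channels of the N1 sketch above.

```python
import torch
import torch.nn as nn

class N2(nn.Module):
    """Shared fusion network: fuses the two modality features, then outputs
    the object's classification parameters and coordinate position parameters."""

    def __init__(self, in_channels=128, num_classes=2):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Sequential(nn.Linear(128, 256), nn.ReLU(inplace=True))
        self.cls_head = nn.Linear(256, num_classes)  # classification parameters
        self.box_head = nn.Linear(256, 4)            # coordinate position parameters

    def forward(self, f1, f2):
        fused = torch.cat([f1, f2], dim=1)  # fusion of RGB and depth features
        h = self.fc(self.shared(fused))
        return self.cls_head(h), self.box_head(h)
```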
S4, inputting further pairs of RGB images and depth images to adjust the parameters of the network weight models M1 and M2, so as to obtain the optimized network weight models M11 and M22.
Using the trained network weight models M1 and M2, new data are drawn from the data sets and input into the network, and the parameters of the whole network are fine-tuned, thereby uncovering the hidden relationship between input and output. Training the two parts of the network separately reduces training time.
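A sketch of this fine-tuning stage under the same assumptions: the checkpoint file names, the small learning rate, and the combined loss are illustrative, and new_pairs_loader is assumed to be a DataLoader over fresh RGB/depth pairs such as the PairedRGBDDataset sketched earlier.

```python
import torch
import torch.nn as nn
import torch.optim as optim

n1, n2 = N1(), N2()
n1.load_state_dict(torch.load("M1.pth"))   # trained weights M1 (assumed file)
n2.load_state_dict(torch.load("M2.pth"))   # trained weights M2 (assumed file)

params = list(n1.parameters()) + list(n2.parameters())
optimizer = optim.SGD(params, lr=1e-4, momentum=0.9)  # small LR for fine-tuning
cls_loss, box_loss = nn.CrossEntropyLoss(), nn.SmoothL1Loss()

for rgb, depth, bbox, label in new_pairs_loader:  # freshly drawn pairs
    optimizer.zero_grad()
    f1, f2 = n1(rgb, depth)
    cls, box = n2(f1, f2)
    loss = cls_loss(cls, label) + box_loss(box, bbox)
    loss.backward()           # small parameter adjustments over the whole network
    optimizer.step()

torch.save(n1.state_dict(), "M11.pth")     # optimized model M11
torch.save(n2.state_dict(), "M22.pth")     # optimized model M22
```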
In this application, the network structures of N1 and N2 include, but are not limited to, convolutional layers, pooling layers, nonlinear function layers, fully connected layers, and normalization layers, in any combination; the specific structure of the networks does not limit the scope of protection.
Referring to Fig. 2, a training device for transparent object recognition comprises: a first establishing module 201, a second establishing module 202, a third establishing module 203, and an optimization module 204.
The first establishing module 201 establishes the first data set with multiple RGB images and the second data set with multiple depth images, the RGB images corresponding one to one with the depth images. The mutually corresponding RGB image and depth image are images of the same object acquired by a color RGB camera and a depth camera, respectively.
The first establishing module comprises: an acquisition unit for acquiring the RGB images and depth images of multiple objects to be trained, the RGB images corresponding one to one with the depth images; a first calibration unit for demarcating the boundary of the object to be trained in each RGB image and setting the first classification parameter information of the object to be trained and the first position coordinate information of the object to be trained in the RGB image; a first establishing unit for establishing the first data set with multiple RGB images according to the first classification parameter information and the first position coordinate information; a second calibration unit for demarcating the boundary of the object to be trained in each depth image according to the correspondence between the RGB image and the depth image, and setting the second classification parameter information of the object to be trained and the second position coordinate information of the object to be trained in the depth image; and a second establishing unit for establishing the second data set with multiple depth images according to the second classification parameter information and the second position coordinate information.
The second establishing module 202 establishes the multi-modal-fusion deep convolutional neural network structure N1, which trains separately on the RGB images and on the depth images so as to extract the first feature information of the RGB images and the second feature information of the depth images and to obtain the network weight model M1. N1 comprises two independent convolutional neural network branches used to train separately on the RGB images and the depth images; during training, a mutually corresponding RGB image and depth image are randomly drawn from the first and second data sets as input each time, and the convolutional neural networks extract the first feature information of the RGB image and the second feature information of the depth image, yielding the network weight model M1.
The third establishing module 203 establishes the multi-modal shared deep convolutional network structure N2, inputting the first feature information and the second feature information into N2 for fusion training, so as to output the object classification parameter information and position coordinate information and to obtain the network weight model M2. N2 comprises multiple convolutional neural network layers followed by multiple fully connected layers, and its output comprises two parameters: the coordinate position parameter of the object and the classification parameter of the object.
The optimization module 204 inputs further pairs of RGB images and depth images to adjust the parameters of the network weight models M1 and M2, so as to obtain the optimized network weight models M11 and M22.
Using the trained network weight models M1 and M2, new data are drawn from the data sets and input into the network, and the parameters of the whole network are fine-tuned, thereby uncovering the hidden relationship between input and output. Training the two parts of the network separately reduces training time.
Finally, once the optimized network weight models M11 and M22 are obtained, they can be used to recognize transparent objects with high accuracy and efficiency.
The present invention also provides a storage medium storing a computer program which, when run on a computer, causes the computer to execute the training method for transparent object recognition of any of the above embodiments.
Referring to Fig. 3, the present invention also provides a terminal comprising a processor 301 and a memory 302, the processor 301 and the memory 302 being electrically connected.
The processor 301 is the control center of the terminal. It connects the various parts of the terminal through various interfaces and lines, and performs the terminal's functions and processes its data by running or calling the computer program stored in the memory 302 and calling the data stored in the memory 302, thereby monitoring the terminal as a whole.
In this embodiment, the processor 301 of the terminal loads the instructions corresponding to one or more computer programs into the memory 302 and runs the stored computer programs to realize the following functions: establishing a first data set with multiple RGB images and a second data set with multiple depth images, the RGB images corresponding one to one with the depth images; establishing the multi-modal-fusion deep convolutional neural network structure N1, which trains separately on the RGB images and on the depth images so as to extract the first feature information of the RGB images and the second feature information of the depth images and to obtain the network weight model M1; establishing the multi-modal shared deep convolutional network structure N2, inputting the first feature information and the second feature information into N2 for fusion training, so as to output the object classification parameter information and position coordinate information and to obtain the network weight model M2; and inputting further pairs of RGB images and depth images to adjust the parameters of the network weight models M1 and M2, so as to obtain the optimized network weight models M11 and M22.
It should be noted that those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments can be completed by hardware under the instruction of a program, and the program may be stored in a computer-readable storage medium, which can include, but is not limited to, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The training method, device, storage medium, and terminal for transparent object recognition provided by the embodiments of the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention, and the above description of the embodiments is only intended to help understand the method of the present invention and its core ideas. Meanwhile, those skilled in the art may, following the ideas of the present invention, make changes to the specific implementation and scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A training method for transparent object recognition, characterized by comprising the following steps:
S1, establishing a first data set with multiple RGB images and a second data set with multiple depth images, the RGB images corresponding one to one with the depth images;
S2, establishing a multi-modal-fusion deep convolutional neural network structure N1, where N1 trains separately on the RGB images and on the depth images, so as to extract first feature information of the RGB images and second feature information of the depth images and to obtain a network weight model M1;
S3, establishing a multi-modal shared deep convolutional network structure N2, inputting the first feature information and the second feature information into N2 for fusion training, so as to output object classification parameter information and position coordinate information and to obtain a network weight model M2;
S4, inputting further pairs of RGB images and depth images to adjust the parameters of the network weight models M1 and M2, so as to obtain the optimized network weight models M11 and M22.
2. The training method for transparent object recognition according to claim 1, characterized in that the step of establishing the first data set with multiple RGB images and the second data set with multiple depth images, the RGB images corresponding one to one with the depth images, comprises:
acquiring the RGB images and depth images of multiple objects to be trained, the RGB images corresponding one to one with the depth images;
demarcating the boundary of the object to be trained in each RGB image, and setting first classification parameter information of the object to be trained and first position coordinate information of the object to be trained in the RGB image;
establishing the first data set with multiple RGB images according to the first classification parameter information and the first position coordinate information;
demarcating the boundary of the object to be trained in each depth image according to the correspondence between the RGB image and the depth image, and setting second classification parameter information of the object to be trained and second position coordinate information of the object to be trained in the depth image;
establishing the second data set with multiple depth images according to the second classification parameter information and the second position coordinate information.
3. The training method for transparent object recognition according to claim 1, characterized in that the step of establishing the multi-modal-fusion deep convolutional neural network structure N1, where N1 trains separately on the RGB images and on the depth images so as to extract the first feature information of the RGB images and the second feature information of the depth images and to obtain the network weight model M1, comprises:
establishing the multi-modal-fusion deep convolutional neural network structure N1, where N1 comprises two independent convolutional neural network branches used to train separately on the RGB images and the depth images; during training, a mutually corresponding RGB image and depth image are randomly drawn from the first and second data sets as input each time, and the convolutional neural networks extract the first feature information of the RGB image and the second feature information of the depth image, yielding the network weight model M1.
4. The training method for transparent object recognition according to claim 1, characterized in that the mutually corresponding RGB image and depth image are images of the same object acquired by a color RGB camera and a depth camera, respectively.
5. The training method for transparent object recognition according to claim 1, characterized in that in step S2, each layer's parameters are updated with the backpropagation algorithm using the error returned by the loss layer, so that the network weight model is progressively optimized and finally converges.
6. A training device for transparent object recognition, characterized by comprising:
a first establishing module for establishing a first data set with multiple RGB images and a second data set with multiple depth images, the RGB images corresponding one to one with the depth images;
a second establishing module for establishing a multi-modal-fusion deep convolutional neural network structure N1, where N1 trains separately on the RGB images and on the depth images, so as to extract first feature information of the RGB images and second feature information of the depth images and to obtain a network weight model M1;
a third establishing module for establishing a multi-modal shared deep convolutional network structure N2, inputting the first feature information and the second feature information into N2 for fusion training, so as to output object classification parameter information and position coordinate information and to obtain a network weight model M2;
an optimization module for inputting further pairs of RGB images and depth images to adjust the parameters of the network weight models M1 and M2, so as to obtain the optimized network weight models M11 and M22.
7. The training device for transparent object recognition according to claim 6, characterized in that the first establishing module comprises:
an acquisition unit for acquiring the RGB images and depth images of multiple objects to be trained, the RGB images corresponding one to one with the depth images;
a first calibration unit for demarcating the boundary of the object to be trained in each RGB image and setting the first classification parameter information of the object to be trained and the first position coordinate information of the object to be trained in the RGB image;
a first establishing unit for establishing the first data set with multiple RGB images according to the first classification parameter information and the first position coordinate information;
a second calibration unit for demarcating the boundary of the object to be trained in each depth image according to the correspondence between the RGB image and the depth image, and setting the second classification parameter information of the object to be trained and the second position coordinate information of the object to be trained in the depth image;
a second establishing unit for establishing the second data set with multiple depth images according to the second classification parameter information and the second position coordinate information.
8. The training device for transparent object recognition according to claim 6, characterized in that the mutually corresponding RGB image and depth image are images of the same object acquired by a color RGB camera and a depth camera, respectively.
9. A storage medium, characterized in that a computer program is stored in the storage medium, and when the computer program is run on a computer, the computer is caused to perform the method of any one of claims 1 to 5.
10. A terminal, characterized by comprising a processor and a memory, wherein a computer program is stored in the memory, and the processor performs the method of any one of claims 1 to 5 by calling the computer program stored in the memory.
CN201910167767.6A 2019-03-06 2019-03-06 Training method and device for transparent object recognition, storage medium and terminal Active CN109903323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910167767.6A CN109903323B (en) 2019-03-06 2019-03-06 Training method and device for transparent object recognition, storage medium and terminal


Publications (2)

Publication Number Publication Date
CN109903323A 2019-06-18
CN109903323B 2022-11-18

Family

ID=66946615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910167767.6A Active CN109903323B (en) 2019-03-06 2019-03-06 Training method and device for transparent object recognition, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN109903323B (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180330194A1 (en) * 2017-05-15 2018-11-15 Siemens Aktiengesellschaft Training an rgb-d classifier with only depth data and privileged information
CN108182441A (en) * 2017-12-29 2018-06-19 华中科技大学 Parallel multichannel convolutive neural network, construction method and image characteristic extracting method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Bin et al., "Object recognition algorithm based on deep convolutional neural network", Journal of Computer Applications *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458828A (en) * 2019-08-12 2019-11-15 广东工业大学 Laser welding defect identification method and device based on multi-modal fusion network
CN110458828B (en) * 2019-08-12 2023-02-10 广东工业大学 Laser welding defect identification method and device based on multi-mode fusion network
CN112082475A (en) * 2020-08-25 2020-12-15 中国科学院空天信息创新研究院 Living tree species identification method and volume measurement method
CN112082475B (en) * 2020-08-25 2022-05-24 中国科学院空天信息创新研究院 Living stumpage species identification method and volume measurement method
CN116665002A (en) * 2023-06-28 2023-08-29 北京百度网讯科技有限公司 Image processing method, training method and device for deep learning model
CN116665002B (en) * 2023-06-28 2024-02-27 北京百度网讯科技有限公司 Image processing method, training method and device for deep learning model
CN117115208A (en) * 2023-10-20 2023-11-24 城云科技(中国)有限公司 Transparent object tracking model, construction method and application thereof

Also Published As

Publication number Publication date
CN109903323B (en) 2022-11-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant