CN109903323B - Training method and device for transparent object recognition, storage medium and terminal - Google Patents

Training method and device for transparent object recognition, storage medium and terminal

Info

Publication number
CN109903323B
CN109903323B (application number CN201910167767.6A)
Authority
CN
China
Prior art keywords
images
depth
rgb
establishing
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910167767.6A
Other languages
Chinese (zh)
Other versions
CN109903323A (en)
Inventor
张�成
龙宇
王语诗
蔡自立
郑子璇
吉守龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910167767.6A
Publication of CN109903323A
Application granted
Publication of CN109903323B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a training method, a device, a storage medium and a terminal for transparent object recognition, wherein the method comprises the following steps: S1, establishing a first data set with a plurality of RGB images and a second data set with a plurality of depth images, wherein the RGB images are respectively in one-to-one correspondence with the depth images; S2, establishing a multi-mode fused deep convolutional neural network structure N1, wherein the N1 trains the RGB images and the depth images independently so as to respectively extract first characteristic information of the RGB images and second characteristic information of the depth images and obtain a network weight model M1; S3, establishing a multi-mode shared deep convolutional network structure N2, and inputting the first characteristic information and the second characteristic information into the N2 for fusion training so as to output classification parameter information and position coordinate information of the object and obtain a network weight model M2; and S4, inputting further pairs of RGB images and depth images to adjust the parameters of the network weight model M1 and the network weight model M2 so as to obtain optimized network weight models M11 and M22.

Description

Training method and device for transparent object recognition, storage medium and terminal
Technical Field
The invention relates to the technical field of image recognition, in particular to a training method, a training device, a storage medium and a terminal for transparent object recognition.
Background
Science and technology are developing rapidly, and the popularization of industrial robots not only frees up labor but also increases production speed and quality. In particular, the introduction of machine vision has further improved the efficiency of robotic grasping. However, for special articles such as transparent objects, machine vision still suffers from difficulties such as low recognition accuracy and long processing time.
Because images of transparent objects are easily affected by environmental factors, the stability and accuracy of single-modality object recognition systems are impaired to a certain extent. A common approach is to enhance the salient features of the object by altering the environment, but such methods are largely limited to semi-transparent or otherwise constrained objects; for example, patent CN104180772A requires the transparent object to have a rough surface and can only recognize flat transparent objects. Moreover, these methods often demand high equipment configurations or involve complex computation and cannot meet industrial requirements; for example, patent CN102753933B imposes a harsh setting that requires shielding from external light sources, and therefore has little practical industrial value and cannot adapt to variable and complex environments.
Therefore, the prior art has defects and needs to be improved urgently.
Disclosure of Invention
The embodiment of the invention provides a training method, a training device, a storage medium and a terminal for transparent object identification, which can improve the accuracy and efficiency of transparent object identification.
The embodiment of the invention provides a training method for transparent object identification, which comprises the following steps:
s1, establishing a first data set with a plurality of RGB images and a second data set with a plurality of depth images, wherein the RGB images are in one-to-one correspondence with the depth images respectively;
s2, establishing a multi-mode fusion depth convolution neural network structure N1, wherein the N1 is used for extracting a plurality of RGB images for independent training and extracting a plurality of depth images for independent training so as to respectively extract first characteristic information of the RGB images and second characteristic information of the depth images to obtain a network weight model M1;
s3, establishing a multi-mode shared deep convolutional network structure N2, inputting the first characteristic information and the second characteristic information into the N2 for fusion training, so as to output classification parameter information and position coordinate information of the object and obtain a network weight model M2;
and S4, inputting other multiple pairs of RGB images and depth images again to carry out parameter adjustment on the network weight model M1 and the network weight model M2 so as to obtain optimized network weight models M11 and M22.
In the training method for transparent object recognition according to the present invention, the step of establishing a first data set having a plurality of RGB images and a second data set having a plurality of depth images, where the plurality of RGB images respectively correspond to the plurality of depth images one to one includes:
collecting RGB images and depth images of a plurality of objects to be trained, wherein the RGB images correspond to the depth images one by one respectively;
carrying out boundary calibration on an object to be trained in the RGB image, and setting first classification parameter information of the object to be trained and first position coordinate information of the object to be trained in the RGB image;
establishing a first data set with a plurality of RGB images according to the first classification parameter information and the first position coordinate information;
performing boundary calibration on an object to be trained in the depth image according to the corresponding relation between the RGB image and the depth image, and setting second classification parameter information of the object to be trained and second position coordinate information of the object to be trained in the depth image;
and establishing a second data set with a plurality of depth images according to the second classification parameter information and the second position coordinate information.
In the training method for transparent object recognition according to the present invention, the step of establishing a multi-modal fused deep convolutional neural network structure N1, where N1 is used to extract a plurality of RGB images for individual training and a plurality of depth images for individual training, so as to extract first feature information of the RGB images and second feature information of the depth images respectively to obtain a network weight model M1, includes:
establishing a multimode fusion deep convolutional neural network structure N1, wherein the N1 comprises two independent convolutional neural network branches, and the two independent convolutional neural network branches are used for respectively and independently training the RGB image and the depth image; during training, the RGB images and the depth images which correspond to each other are randomly extracted from the first data set and the second data set as input each time, and the first characteristic information of the RGB images and the second characteristic information of the depth images are respectively extracted by using the convolutional neural network to obtain a network weight model M1.
In the training method for transparent object recognition according to the present invention, the RGB image and the depth image corresponding to each other are images of the same object acquired by using a color RGB camera and a depth camera, respectively.
In the training method for transparent object recognition according to the present invention, in step S2, the parameters of each layer are updated by back-propagating the error from the loss layer using the back-propagation algorithm, so that the network weight model is updated and optimized and finally converges.
A training apparatus for transparent object recognition, comprising:
a first establishing module, which is used for establishing a first data set with a plurality of RGB images and a second data set with a plurality of depth images, wherein the RGB images are respectively in one-to-one correspondence with the depth images;
the second establishing module is used for establishing a multi-mode fused depth convolution neural network structure N1, wherein the N1 is used for extracting a plurality of RGB images for independent training and extracting a plurality of depth images for independent training so as to respectively extract first characteristic information of the RGB images and second characteristic information of the depth images to obtain a network weight model M1;
the third establishing module is used for establishing a multi-modal shared deep convolutional network structure N2, inputting the first characteristic information and the second characteristic information into the N2 for fusion training, so as to output classification parameter information and position coordinate information of an object and obtain a network weight model M2;
and the optimization module is used for re-inputting other pairs of RGB images and depth images to perform parameter adjustment on the network weight model M1 and the network weight model M2 so as to obtain optimized network weight models M11 and M22.
In the training apparatus for transparent object recognition according to the present invention, the first establishing module includes:
an acquisition unit, which is used for acquiring RGB images and depth images of a plurality of objects to be trained, wherein the RGB images correspond to the depth images one to one;
the first calibration unit is used for carrying out boundary calibration on the object to be trained in the RGB image, and setting first classification parameter information of the object to be trained and first position coordinate information of the object to be trained in the RGB image;
the first establishing unit is used for establishing a first data set with a plurality of RGB images according to the first classification parameter information and the first position coordinate information;
the second calibration unit is used for performing boundary calibration on the object to be trained in the depth image according to the corresponding relation between the RGB image and the depth image, and setting second classification parameter information of the object to be trained and second position coordinate information of the object to be trained in the depth image;
and the second establishing unit is used for establishing a second data set with a plurality of depth images according to the second classification parameter information and the second position coordinate information.
In the training device for transparent object recognition according to the present invention, the RGB image and the depth image corresponding to each other are images of the same object acquired by using a color RGB camera and a depth camera, respectively.
A storage medium having stored therein a computer program which, when run on a computer, causes the computer to perform any of the methods described above.
A terminal comprising a processor and a memory, the memory having stored therein a computer program, the processor being adapted to perform the method of any preceding claim by invoking the computer program stored in the memory.
In this method, data of different modalities (RGB images and depth images) are first trained independently, so that the features of each modality are learned by its own series of neural network layers; the features of the two modalities are then learned complementarily through a fusion connection and a series of shared convolutional layers. Fusing the RGB information with the depth information in this way improves the recognition of transparent objects.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can also be derived from them without inventive effort.
Fig. 1 is a schematic flowchart of a training method for transparent object recognition according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a training device for transparent object recognition according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, belong to the scope of protection of the present invention.
The terms "first," "second," "third," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the objects so described are interchangeable under appropriate circumstances. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, or apparatus that includes a series of steps, an advanced driver assistance system, or a system that includes a series of modules or elements is not necessarily limited to those steps or modules or elements expressly listed, may include steps or modules or elements not expressly listed, and may include other steps or modules or elements inherent to such process, method, apparatus, advanced driver assistance system, or system.
Referring to fig. 1, fig. 1 is a flow chart of a training method for transparent object recognition. The training method for transparent object recognition comprises the following steps:
s1, establishing a first data set with a plurality of RGB images and a second data set with a plurality of depth images, wherein the RGB images are in one-to-one correspondence with the depth images.
Pictures of the objects to be trained, taken in real scenes, are used as training samples.
Specifically, the step S1 includes:
s11, collecting RGB images and depth images of a plurality of objects to be trained, wherein the RGB images correspond to the depth images one by one; s12, performing boundary calibration on the object to be trained in the RGB image, and setting first classification parameter information of the object to be trained and first position coordinate information of the object to be trained in the RGB image; s13, establishing a first data set with a plurality of RGB images according to the first classification parameter information and the first position coordinate information; s14, performing boundary calibration on the object to be trained in the depth image according to the corresponding relation between the RGB image and the depth image, and setting second classification parameter information of the object to be trained and second position coordinate information of the object to be trained in the depth image; and S15, establishing a second data set with a plurality of depth images according to the second classification parameter information and the second position coordinate information.
The RGB image and the depth image which correspond to each other are images of the same object which are acquired by a color RGB camera and a depth camera respectively.
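By way of illustration of steps S11 to S15, the following sketch shows one possible way to organize the two paired data sets in Python; the data structures and field names here are assumptions chosen for clarity and are not prescribed by the present application.

```python
# Illustrative sketch only: one possible way to organize the first (RGB) and
# second (depth) data sets so that samples stay in one-to-one correspondence.
# Field names and structure are assumptions, not taken from the patent.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Annotation:
    class_id: int                       # classification parameter information
    bbox: Tuple[int, int, int, int]     # position coordinates (x_min, y_min, x_max, y_max)

@dataclass
class Sample:
    image_path: str
    objects: List[Annotation]

def build_datasets(pairs):
    """pairs: iterable of (rgb_path, depth_path, rgb_annotations, depth_annotations),
    where the depth annotations are derived from the RGB ones via the RGB-depth
    coordinate correspondence described below."""
    first_dataset, second_dataset = [], []
    for rgb_path, depth_path, rgb_ann, depth_ann in pairs:
        first_dataset.append(Sample(rgb_path, rgb_ann))
        second_dataset.append(Sample(depth_path, depth_ann))
    return first_dataset, second_dataset
```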
In step S1, the color RGB module and the depth module are located at different positions on the sensor, so the acquired image information differs even for the same object at the same moment. Because the bounding boxes need to be unified, a matrix transformation is applied between the color RGB image and the depth image so that their coordinates correspond one to one. Two matrices are involved: a translation and a rotation.
Assume that a point in the depth image is (X, Y) and that its corresponding point in the color RGB image is (x, y). The translation gives x = X + dx and y = Y + dy, where dx and dy are the distances moved along the x and y directions, respectively. In homogeneous coordinates this is expressed as follows:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & dx \\ 0 & 1 & dy \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$

The translation matrix is therefore:

$$T = \begin{bmatrix} 1 & 0 & dx \\ 0 & 1 & dy \\ 0 & 0 & 1 \end{bmatrix}$$

A rotation matrix is also needed. Let the line connecting a point to the origin make an angle of b degrees with the X axis, let the rotation be counterclockwise by a degrees about the origin, and let the length of the line from the origin to the point be R, where [X, Y] is the depth image coordinate and [x, y] is the color RGB image coordinate. Then:

$$X = R\cos b, \qquad Y = R\sin b$$

$$x = R\cos(a + b) = R\cos a\cos b - R\sin a\sin b = X\cos a - Y\sin a$$

$$y = R\sin(a + b) = R\sin a\cos b + R\cos a\sin b = X\sin a + Y\cos a$$

Thus the rotation matrix can be written as:

$$R_a = \begin{bmatrix} \cos a & -\sin a \\ \sin a & \cos a \end{bmatrix}$$
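The translation-plus-rotation mapping above can be sketched numerically as follows; this is an illustrative example in homogeneous coordinates, and the offsets dx, dy and the angle a are hypothetical values that would in practice be obtained by calibrating the color RGB module against the depth module.

```python
# Illustrative sketch: map a depth-image point (X, Y) to the corresponding
# color-image point (x, y) with a translation followed by a counterclockwise
# rotation about the origin. Calibration values below are assumed, not measured.
import numpy as np

def depth_to_rgb_point(X, Y, dx, dy, a_deg):
    a = np.deg2rad(a_deg)
    T = np.array([[1.0, 0.0, dx],                 # translation matrix
                  [0.0, 1.0, dy],
                  [0.0, 0.0, 1.0]])
    R = np.array([[np.cos(a), -np.sin(a), 0.0],   # rotation matrix
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
    x, y, _ = R @ T @ np.array([X, Y, 1.0])
    return x, y

# Example with assumed calibration: shift by (12, -3) pixels, rotate by 1.5 degrees.
print(depth_to_rgb_point(100.0, 50.0, dx=12.0, dy=-3.0, a_deg=1.5))
```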
s2, establishing a multi-mode fusion deep convolution neural network structure N1, wherein the N1 is used for extracting a plurality of RGB images for independent training and extracting a plurality of depth images for independent training so as to respectively extract first feature information of the RGB images and second feature information of the depth images to obtain a network weight model M1.
Establishing a multimode fusion deep convolutional neural network structure N1, wherein the N1 comprises two independent convolutional neural network branches, and the two independent convolutional neural network branches are used for respectively and independently training the RGB image and the depth image; during training, the RGB images and the depth images which correspond to each other are randomly extracted from the first data set and the second data set as input each time, and the first characteristic information of the RGB images and the second characteristic information of the depth images are respectively extracted by using the convolutional neural network to obtain a network weight model M1.
The parameters of each layer can be updated by back-propagating the error from the loss layer using the back-propagation algorithm, so that the network weight model is updated and optimized and finally converges.
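As an illustration of the two-branch structure of N1, a minimal PyTorch-style sketch is given below; the number of layers and the channel sizes are assumptions, since the present application deliberately leaves the exact layer composition open.

```python
# Illustrative sketch (layer sizes assumed): N1 with two independent convolutional
# branches, one for RGB images (3 channels) and one for depth images (1 channel).
import torch.nn as nn

def make_branch(in_channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    )

class N1(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_branch = make_branch(3)     # extracts the first characteristic information
        self.depth_branch = make_branch(1)   # extracts the second characteristic information

    def forward(self, rgb, depth):
        # A mutually corresponding RGB/depth pair is fed to the two branches in parallel.
        return self.rgb_branch(rgb), self.depth_branch(depth)
```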
And S3, establishing a multi-mode shared deep convolutional network structure N2, inputting the first characteristic information and the second characteristic information into the N2 for fusion training, so as to output classification parameter information and position coordinate information of the object and obtain a network weight model M2.
The N2 comprises a plurality of convolutional layers followed by a plurality of fully-connected layers, and its output comprises two kinds of parameters: the coordinate position parameters of the object and the classification parameters of the object.
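A matching sketch of N2 is given below; fusing the two feature maps by channel concatenation is an assumed design choice, and the head sizes (four bounding-box coordinates and a configurable number of classes) are illustrative.

```python
# Illustrative sketch (fusion by channel concatenation and layer sizes assumed):
# N2 applies shared convolutional layers to the fused features, then fully-connected
# layers with two output heads: position coordinates and classification parameters.
import torch
import torch.nn as nn

class N2(nn.Module):
    def __init__(self, fused_channels: int = 128, num_classes: int = 2):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(fused_channels, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),   # fixes the flattened size regardless of input resolution
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(128 * 7 * 7, 256), nn.ReLU())
        self.bbox_head = nn.Linear(256, 4)            # position coordinate information
        self.cls_head = nn.Linear(256, num_classes)   # classification parameter information

    def forward(self, rgb_feat, depth_feat):
        fused = torch.cat([rgb_feat, depth_feat], dim=1)   # multi-modal fusion
        h = self.fc(self.shared(fused))
        return self.bbox_head(h), self.cls_head(h)
```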
And S4, inputting other multiple pairs of RGB images and depth images again to carry out parameter adjustment on the network weight model M1 and the network weight model M2 so as to obtain optimized network weight models M11 and M22.
Using the trained network weight models M1 and M2, new data are again drawn from the data sets and fed into the network, and the parameters of the whole network are fine-tuned so that the network captures the latent relation between input and output. Training the two parts of the network separately beforehand reduces the overall training time.
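Step S4 can be sketched as end-to-end fine-tuning of the two pre-trained models, as below; the optimizer, learning rate and loss weighting are assumptions, and the data loader is assumed to yield mutually corresponding RGB/depth batches together with their bounding-box and class targets.

```python
# Illustrative sketch (optimizer settings and loss weighting assumed): joint
# fine-tuning of the pre-trained N1 and N2 on newly drawn RGB/depth pairs.
# The fine-tuned weights correspond to the optimized models M11 and M22.
import torch
import torch.nn as nn

def fine_tune(n1, n2, loader, epochs=5, lr=1e-4, bbox_weight=1.0):
    optimizer = torch.optim.SGD(list(n1.parameters()) + list(n2.parameters()),
                                lr=lr, momentum=0.9)
    cls_loss_fn = nn.CrossEntropyLoss()
    bbox_loss_fn = nn.SmoothL1Loss()
    for _ in range(epochs):
        for rgb, depth, bbox_target, cls_target in loader:
            rgb_feat, depth_feat = n1(rgb, depth)
            bbox_pred, cls_pred = n2(rgb_feat, depth_feat)
            loss = cls_loss_fn(cls_pred, cls_target) + bbox_weight * bbox_loss_fn(bbox_pred, bbox_target)
            optimizer.zero_grad()
            loss.backward()    # back-propagate the loss-layer error through the whole network
            optimizer.step()
    return n1, n2
```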
In the present application, the network structures of N1 and N2 may include, but are not limited to, convolutional layers, pooling layers, nonlinear activation layers, fully-connected layers and normalization layers, in any combination; the particular combination of layers does not limit the scope of protection.
Referring to fig. 2, a training apparatus for transparent object recognition includes: a first establishing module 201, a second establishing module 202, a third establishing module 203 and an optimizing module 204.
The first establishing module 201 is configured to establish a first data set having a plurality of RGB images and a second data set having a plurality of depth images, where the plurality of RGB images are respectively in one-to-one correspondence with the plurality of depth images. The RGB images and the depth images which correspond to each other are images of the same object which are acquired by the color RGB camera and the depth camera respectively.
Wherein, the first establishing module comprises: an acquisition unit, used for acquiring RGB images and depth images of a plurality of objects to be trained, wherein the RGB images correspond to the depth images one to one; a first calibration unit, used for performing boundary calibration on the object to be trained in the RGB image and setting first classification parameter information of the object to be trained and first position coordinate information of the object to be trained in the RGB image; a first establishing unit, used for establishing a first data set with a plurality of RGB images according to the first classification parameter information and the first position coordinate information; a second calibration unit, used for performing boundary calibration on the object to be trained in the depth image according to the correspondence between the RGB image and the depth image and setting second classification parameter information of the object to be trained and second position coordinate information of the object to be trained in the depth image; and a second establishing unit, used for establishing a second data set with a plurality of depth images according to the second classification parameter information and the second position coordinate information.
The second establishing module 202 is configured to establish a multi-modal fused deep convolutional neural network structure N1, where the N1 is configured to extract a plurality of RGB images for individual training and a plurality of depth images for individual training, so as to extract first feature information of the RGB images and second feature information of the depth images respectively to obtain a network weight model M1. The second establishing module 202 establishes a multi-modal fused deep convolutional neural network structure N1, where N1 includes two independent convolutional neural network branches, and the two independent convolutional neural network branches are used to train the RGB image and the depth image separately; during training, the RGB images and the depth images which correspond to each other are randomly extracted from the first data set and the second data set as input each time, and the first characteristic information of the RGB images and the second characteristic information of the depth images are respectively extracted by using the convolutional neural network to obtain a network weight model M1.
The third establishing module 203 is configured to establish a multi-modal shared deep convolutional network structure N2, input the first feature information and the second feature information into the N2, perform fusion training, output classification parameter information and position coordinate information of an object, and obtain a network weight model M2. Wherein, the N2 comprises a plurality of convolutional neural networks, then a plurality of fully-connected networks are connected, and the output comprises two parameters, one is the coordinate position parameter of the object, and the other is the classification parameter of the object.
The optimization module 204 is configured to re-input other pairs of RGB images and depth images to perform parameter adjustment on the network weight model M1 and the network weight model M2 to obtain optimized network weight models M11 and M22.
Using the trained network weight models M1 and M2, new data are again drawn from the data sets and fed into the network, and the parameters of the whole network are fine-tuned so that the network captures the latent relation between input and output. Training the two parts of the network separately beforehand reduces the overall training time.
Finally, after obtaining the optimized network weight models M11 and M22, the transparent objects can be identified by using the network weight models M11 and M22, and the method has high accuracy and efficiency.
The present invention also provides a storage medium having stored therein a computer program which, when run on a computer, causes the computer to perform the training method for transparent object recognition described in any of the above embodiments.
Referring to fig. 3, the present invention further provides a terminal, which includes a processor 301 and a memory 302. The processor 301 is electrically connected to the memory 302.
The processor 301 is a control center of the terminal, connects various parts of the entire terminal using various interfaces and lines, and performs various functions of the terminal and processes data by running or calling a computer program stored in the memory 302 and calling data stored in the memory 302, thereby performing overall monitoring of the terminal.
In this embodiment, the processor 301 in the terminal loads instructions corresponding to one or more processes of the computer program into the memory 302 according to the following steps, and the processor 301 runs the computer program stored in the memory 302, thereby implementing various functions: establishing a first data set with a plurality of RGB images and a second data set with a plurality of depth images, wherein the RGB images are respectively in one-to-one correspondence with the depth images; establishing a multi-mode fused deep convolutional neural network structure N1, wherein the N1 is used for extracting a plurality of RGB images for individual training and extracting a plurality of depth images for individual training so as to respectively extract first characteristic information of the RGB images and second characteristic information of the depth images to obtain a network weight model M1; establishing a multi-mode shared deep convolutional network structure N2, inputting the first characteristic information and the second characteristic information into the N2 for fusion training, so as to output classification parameter information and position coordinate information of the object and obtain a network weight model M2; and re-inputting other pairs of RGB images and depth images to perform parameter adjustment on the network weight model M1 and the network weight model M2 to obtain optimized network weight models M11 and M22.
It should be noted that, a person skilled in the art can understand that all or part of the steps in the various methods of the above embodiments can be accomplished by related hardware through instructions of a program, and the program can be stored in a computer-readable storage medium, which can include but is not limited to: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, and the like.
The training method, apparatus, storage medium and terminal for transparent object recognition provided by the embodiments of the present invention are described in detail above. A specific example is used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help understand the method and core idea of the invention. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A training method for transparent object recognition, comprising the steps of:
s1, establishing a first data set with a plurality of RGB images and a second data set with a plurality of depth images, wherein the RGB images are respectively in one-to-one correspondence with the depth images;
s2, establishing a multi-mode fusion depth convolution neural network structure N1, wherein the N1 is used for extracting a plurality of RGB images for independent training and extracting a plurality of depth images for independent training so as to respectively extract first characteristic information of the RGB images and second characteristic information of the depth images to obtain a network weight model M1;
s3, establishing a multi-mode shared deep convolutional network structure N2, inputting the first characteristic information and the second characteristic information into the N2 for fusion training, so as to output classification parameter information and position coordinate information of the object and obtain a network weight model M2;
and S4, inputting other multiple pairs of RGB images and depth images again to carry out parameter adjustment on the network weight model M1 and the network weight model M2 so as to obtain optimized network weight models M11 and M22.
2. The method as claimed in claim 1, wherein the step of creating a first data set with a plurality of RGB images and a second data set with a plurality of depth images, the RGB images corresponding to the depth images one-to-one respectively comprises:
collecting RGB images and depth images of a plurality of objects to be trained, wherein the RGB images correspond to the depth images one by one respectively;
carrying out boundary calibration on an object to be trained in the RGB image, and setting first classification parameter information of the object to be trained and first position coordinate information of the object to be trained in the RGB image;
establishing a first data set with a plurality of RGB images according to the first classification parameter information and the first position coordinate information;
performing boundary calibration on an object to be trained in the depth image according to the corresponding relation between the RGB image and the depth image, and setting second classification parameter information of the object to be trained and second position coordinate information of the object to be trained in the depth image;
and establishing a second data set with a plurality of depth images according to the second classification parameter information and the second position coordinate information.
3. The training method for transparent object recognition according to claim 1, wherein the step of establishing a multi-modal fused deep convolutional neural network structure N1, where N1 is used for extracting a plurality of RGB images for individual training and a plurality of depth images for individual training, so as to respectively extract first feature information of the RGB images and second feature information of the depth images to obtain a network weight model M1, includes:
establishing a multi-mode fused deep convolutional neural network structure N1, wherein the N1 comprises two independent convolutional neural network branches, and the two independent convolutional neural network branches are used for respectively and independently training the RGB image and the depth image; during training, the RGB images and the depth images which correspond to each other are randomly extracted from the first data set and the second data set as input each time, and the first characteristic information of the RGB images and the second characteristic information of the depth images are respectively extracted by using the convolutional neural network to obtain a network weight model M1.
4. A training method for transparent object recognition according to claim 1, wherein the mutually corresponding RGB image and depth image are images of the same object captured by a color RGB camera and a depth camera, respectively.
5. The training method for transparent object recognition according to claim 1, wherein in step S2, the parameters of each layer are updated by back-propagating the error from the loss layer using the back-propagation algorithm, so that the network weight model is updated and optimized and finally converges.
6. A training apparatus for transparent object recognition, comprising:
a first establishing module, which is used for establishing a first data set with a plurality of RGB images and a second data set with a plurality of depth images, wherein the RGB images are respectively in one-to-one correspondence with the depth images;
the second establishing module is used for establishing a multi-mode fused depth convolution neural network structure N1, wherein the N1 is used for extracting a plurality of RGB images for independent training and extracting a plurality of depth images for independent training so as to respectively extract first characteristic information of the RGB images and second characteristic information of the depth images to obtain a network weight model M1;
the third establishing module is used for establishing a multi-modal shared deep convolutional network structure N2, inputting the first characteristic information and the second characteristic information into the N2 for fusion training, so as to output classification parameter information and position coordinate information of an object and obtain a network weight model M2;
and the optimization module is used for re-inputting other pairs of RGB images and depth images to perform parameter adjustment on the network weight model M1 and the network weight model M2 so as to obtain optimized network weight models M11 and M22.
7. The training device for transparent object recognition according to claim 6, wherein the first establishing module comprises:
an acquisition unit, which is used for acquiring RGB images and depth images of a plurality of objects to be trained, wherein the RGB images correspond to the depth images one to one;
the first calibration unit is used for calibrating the boundary of an object to be trained in the RGB image, and setting first classification parameter information of the object to be trained and first position coordinate information of the object to be trained in the RGB image;
the first establishing unit is used for establishing a first data set with a plurality of RGB images according to the first classification parameter information and the first position coordinate information;
the second calibration unit is used for performing boundary calibration on the object to be trained in the depth image according to the corresponding relation between the RGB image and the depth image, and setting second classification parameter information of the object to be trained and second position coordinate information of the object to be trained in the depth image;
and the second establishing unit is used for establishing a second data set with a plurality of depth images according to the second classification parameter information and the second position coordinate information.
8. Training device for transparent object recognition according to claim 6, wherein the mutually corresponding RGB images and depth images are images of the same object captured with a color RGB camera and a depth camera, respectively.
9. A storage medium, having stored thereon a computer program which, when run on a computer, causes the computer to carry out the method of any one of claims 1 to 5.
10. A terminal, characterized in that it comprises a processor and a memory, in which a computer program is stored, the processor being adapted to carry out the method of any one of claims 1 to 5 by calling the computer program stored in the memory.
CN201910167767.6A 2019-03-06 2019-03-06 Training method and device for transparent object recognition, storage medium and terminal Active CN109903323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910167767.6A CN109903323B (en) 2019-03-06 2019-03-06 Training method and device for transparent object recognition, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910167767.6A CN109903323B (en) 2019-03-06 2019-03-06 Training method and device for transparent object recognition, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN109903323A CN109903323A (en) 2019-06-18
CN109903323B true CN109903323B (en) 2022-11-18

Family

ID=66946615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910167767.6A Active CN109903323B (en) 2019-03-06 2019-03-06 Training method and device for transparent object recognition, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN109903323B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458828B (en) * 2019-08-12 2023-02-10 广东工业大学 Laser welding defect identification method and device based on multi-mode fusion network
CN112082475B (en) * 2020-08-25 2022-05-24 中国科学院空天信息创新研究院 Living stumpage species identification method and volume measurement method
CN116665002B (en) * 2023-06-28 2024-02-27 北京百度网讯科技有限公司 Image processing method, training method and device for deep learning model
CN117115208A (en) * 2023-10-20 2023-11-24 城云科技(中国)有限公司 Transparent object tracking model, construction method and application thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180330194A1 (en) * 2017-05-15 2018-11-15 Siemens Aktiengesellschaft Training an rgb-d classifier with only depth data and privileged information
CN108182441B (en) * 2017-12-29 2020-09-18 华中科技大学 Parallel multichannel convolutional neural network, construction method and image feature extraction method

Also Published As

Publication number Publication date
CN109903323A (en) 2019-06-18

Similar Documents

Publication Publication Date Title
CN109903323B (en) Training method and device for transparent object recognition, storage medium and terminal
CN109902548B (en) Object attribute identification method and device, computing equipment and system
CN106780631B (en) Robot closed-loop detection method based on deep learning
EP3427186A1 (en) Systems and methods for normalizing an image
CN107481292A (en) The attitude error method of estimation and device of vehicle-mounted camera
CN113920307A (en) Model training method, device, equipment, storage medium and image detection method
CN109683699A (en) The method, device and mobile terminal of augmented reality are realized based on deep learning
CN115699082A (en) Defect detection method and device, storage medium and electronic equipment
KR102158799B1 (en) Method, computer program and apparatus for recognition of building by using deep neural network model
CN111127548B (en) Grabbing position detection model training method, grabbing position detection method and grabbing position detection device
CN111259710B (en) Parking space structure detection model training method adopting parking space frame lines and end points
CN110756462B (en) Power adapter test method, device, system, control device and storage medium
CN112037142B (en) Image denoising method, device, computer and readable storage medium
Leiva et al. Collision avoidance for indoor service robots through multimodal deep reinforcement learning
CN111738403A (en) Neural network optimization method and related equipment
CN111753739A (en) Object detection method, device, equipment and storage medium
CN111950570A (en) Target image extraction method, neural network training method and device
CN113592015B (en) Method and device for positioning and training feature matching network
CN113222961B (en) Intelligent ship body detection system and method
CN112669452B (en) Object positioning method based on convolutional neural network multi-branch structure
CN112668596B (en) Three-dimensional object recognition method and device, recognition model training method and device
CN116071625B (en) Training method of deep learning model, target detection method and device
CN115219492B (en) Appearance image acquisition method and device for three-dimensional object
CN116460851A (en) Mechanical arm assembly control method for visual migration
CN113065521B (en) Object identification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant