CN116310622A - Method and system for accurately identifying tray based on deep learning - Google Patents

Method and system for accurately identifying tray based on deep learning Download PDF

Info

Publication number
CN116310622A
CN116310622A (application number CN202211616543.7A)
Authority
CN
China
Prior art keywords
tray
layer
point cloud
deep learning
trays
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211616543.7A
Other languages
Chinese (zh)
Inventor
邹家帅
昝学彦
李发频
李飞军
张四龙
李家钧
蒋干胜
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Makerwit Technology Co ltd
Original Assignee
Zhuhai Makerwit Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Makerwit Technology Co ltd filed Critical Zhuhai Makerwit Technology Co ltd
Priority to CN202211616543.7A priority Critical patent/CN116310622A/en
Publication of CN116310622A publication Critical patent/CN116310622A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a system for accurately identifying a tray based on deep learning. The method comprises: collecting depth images and color images of a plurality of trays with an image acquisition device and aligning the depth images with the color images; marking the position of the tray in each color image and feeding the marked images to a neural network as a deep learning training data set; having the neural network recognize the coordinates of the tray in the color image by deep learning, locating the tray in the depth image from those coordinates, and constructing a standard tray point cloud set at that position from the overall dimensions of the tray; and performing ICP point cloud matching between the actual tray point cloud set currently acquired by the image acquisition device and the standard tray point cloud set to obtain the position and angle of the target tray relative to the virtual tray, and hence the pose of the target tray relative to the image acquisition device. The method addresses the technical problem that misjudgment easily occurs when a tray is identified with a detection method based on point cloud plane contour matching.

Description

Method and system for accurately identifying tray based on deep learning
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a system for accurately recognizing a tray based on deep learning.
Background
Tray detection is a key step when a warehouse robot carries goods. To address problems of current detection methods, such as poor robustness to illumination and constraints imposed by the relative pose between the tray and the sensor, a detection method based on point cloud plane contour matching has been proposed. In this method, a TOF (Time-of-Flight) camera collects a point cloud, the point cloud is preprocessed, plane segmentation is performed with a region-growing algorithm constrained by normals, and a grid map is generated by projection along the direction of the principal normal of the point cloud, which removes the constraint of relative pose. Finally, after contour extraction on the grid map, the target is matched against a template using contour features that fuse Hu invariant moments and scale-ratio features, achieving tray detection.
However, because a TOF camera outputs depth point cloud data, the constructed image is only a gray/black image, and relying on the depth camera alone easily leads to misjudgment. For example, if a person stands at the edge of a tray, the person's leg may be identified together with the tray so that an incorrect tray pose is calculated, and objects other than trays may be identified as trays.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method and a system for accurately identifying a tray based on deep learning, which solve the technical problem that misjudgment easily occurs when a tray is identified with a detection method based on point cloud plane contour matching, thereby improving the accuracy of tray identification.
In order to solve the problems, the technical scheme adopted by the invention is as follows:
a method for accurately identifying a tray based on deep learning comprises the following steps:
acquiring the trays by using an image acquisition device to obtain depth images and color images of a plurality of trays, and aligning the depth images and the color images;
after marking the positions of the trays in the color images, taking the marked color images as a deep learning training data set, and inputting the deep learning training data set into a neural network;
the neural network identifies the coordinates of the tray in the color image in a deep learning mode;
obtaining the position of the tray in the depth image according to the coordinates of the tray, and inputting the overall dimensions of the tray at that position to construct a standard tray point cloud set;
performing ICP point cloud matching on the actual tray point cloud set and the standard tray point cloud set which are currently acquired by the image acquisition device, and acquiring the position and the angle of a target tray relative to a virtual tray, so as to obtain the pose of the target tray relative to the image acquisition device;
the actual tray point cloud set comprises target trays to be identified, and the standard tray point cloud set comprises virtual trays constructed according to the positions of the trays and the outline dimensions of the trays.
As a preferred embodiment of the present invention, when the neural network recognizes the coordinates of the tray in the color image by means of deep learning, the method includes:
aligning the input color image into a 640 × 640 RGB image through an input layer and feeding it to a backbone layer;
the backbone layer performs feature extraction on the RGB image and outputs three feature maps of different sizes to a head layer;
the head layer performs feature extraction and detection again on the three feature maps of different sizes to obtain the coordinates of the target tray;
the neural network comprises the input layer, the backbone layer and the head layer.
As a preferred embodiment of the present invention, when the input layer aligns an input color image, it includes:
performing adaptive size processing on the input deep learning training data set, adjusting each image to a 1280 × 1280 RGB image, reducing it to 640 × 640 with a 16-layer convolution module, performing normalization and alignment, activating it with an activation function, and then sending it to the backbone layer.
As a preferred embodiment of the present invention, when the backbone layer performs feature extraction on the RGB image, the method includes:
a BConv layer receives the RGB image, performs feature extraction through a convolution layer, accelerates convergence with a BN layer, activates the result with an activation function, and feeds it into alternating E-ELAN and MPConv layers, which output three feature maps of different sizes;
the backbone layer comprises BConv layers, E-ELAN layers and MPConv layers, wherein a BConv layer consists of a convolution layer, a BN layer and an activation function.
As a preferred embodiment of the present invention, when the head layer performs feature extraction and detection, the method includes:
the head layer performs feature extraction again on the three feature maps of different sizes output by the backbone layer through an SPPCSPC layer, several BConv layers, several MPConv layers and several Catconv layers, outputs three feature maps of different sizes again, and obtains the coordinates of the target tray after detection through three RepVGG block layers and three conv layers.
As a preferred embodiment of the present invention, when acquiring a position of a target tray with respect to a virtual tray, the method includes:
the standard tray point cloud set and the actual tray point cloud set are constrained according to a certain constraint condition, and the constraint method is specifically shown as a formula 1 and a formula 2:
Figure BDA0004000425510000031
Figure BDA0004000425510000032
in the method, in the process of the invention,
Figure BDA0004000425510000041
for a single point of the standard tray point cloud, +.>
Figure BDA0004000425510000042
For standard tray point clouds, +.>
Figure BDA0004000425510000043
Is->
Figure BDA0004000425510000044
Centroid of->
Figure BDA0004000425510000045
For a single point of the actual tray point cloud, +.>
Figure BDA0004000425510000046
For the actual tray point cloud, +.>
Figure BDA0004000425510000047
Is->
Figure BDA0004000425510000048
Is a centroid of (c).
In a preferred embodiment of the present invention, when acquiring the position of the target tray relative to the virtual tray, the method further includes:
according to the constraint condition, a first loss function equation is established, as shown in equation 3:

E(R, t) = \frac{1}{N} \sum_{i=1}^{N} \left\| p_t^i - \left( R p_s^i + t \right) \right\|^2 \quad (3)

where R is a rotation matrix and t is a translation vector;

letting N = |P_s| be the total number of points, the first loss function equation is differentiated with respect to t and the derivative is set to 0, giving the coordinate equation shown in equation 4:

t = \bar{p}_t - R \bar{p}_s \quad (4)

the optimal t, i.e. the coordinates (X, Y, Z) of the target tray relative to the virtual tray, is obtained from the coordinate equation.
As a preferred embodiment of the present invention, when acquiring an angle of a target tray with respect to a virtual tray, the method includes:
without considering translation, a second loss function equation is established, as shown in equation 5:

E(R) = \frac{1}{N} \sum_{i=1}^{N} \left\| \left( p_t^i - \bar{p}_t \right) - R \left( p_s^i - \bar{p}_s \right) \right\|^2 \quad (5)

where R is a rotation matrix, \bar{p}_s is the centroid of the standard tray point cloud, and \bar{p}_t is the centroid of the actual tray point cloud;

the second loss function equation is simplified using relation 6 and relation 7, giving the simplified expression shown in equation 8:

R^{T} R = I \quad (6)

\left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) = \left( p_s^i - \bar{p}_s \right)^{T} R^{T} \left( p_t^i - \bar{p}_t \right) \quad (7)

E(R) = \frac{1}{N} \sum_{i=1}^{N} \left( \left\| p_t^i - \bar{p}_t \right\|^2 + \left\| p_s^i - \bar{p}_s \right\|^2 - 2 \left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) \right) \quad (8)

where the superscript T denotes the matrix transpose and I is the identity matrix;

since the coordinates (X, Y, Z) of the tray are determined independently of R and the first two terms of equation 8 do not depend on R, minimizing the second loss function equation is equivalent to maximizing the remaining term, as shown in equation 9:

R^{*} = \arg\max_{R} \sum_{i=1}^{N} \left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) \quad (9)
as a preferred embodiment of the present invention, when acquiring the angle of the target tray with respect to the virtual tray, further comprising:
equation 9 is transformed according to relation 10, giving equation 11:

\sum_{i=1}^{N} \left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) = \operatorname{trace}\left( P_t^{T} R P_s \right) \quad (10)

R^{*} = \arg\max_{R} \operatorname{trace}\left( P_t^{T} R P_s \right) \quad (11)

where P_s and P_t here denote the matrices whose columns are the centered points p_s^i - \bar{p}_s and p_t^i - \bar{p}_t;

using the properties of the trace, \operatorname{trace}\left( P_t^{T} R P_s \right) is converted as shown in equation 12:

\operatorname{trace}\left( P_t^{T} R P_s \right) = \operatorname{trace}\left( R P_s P_t^{T} \right) = \operatorname{trace}\left( R U \Sigma V^{T} \right) = \operatorname{trace}\left( \Sigma V^{T} R U \right) \quad (12)

where U \Sigma V^{T} is the singular value decomposition of P_s P_t^{T}, U and V are orthogonal matrices, \Sigma is the diagonal matrix of singular values, and V^{T} R U is itself an orthogonal matrix;

equation 12 is then rewritten using the matrix relation shown in equation 13, and the conversion process is shown in equation 14:

M = V^{T} R U, \quad \Sigma = \operatorname{diag}\left( \sigma_1, \sigma_2, \sigma_3 \right) \quad (13)

\operatorname{trace}\left( \Sigma V^{T} R U \right) = \operatorname{trace}\left( \Sigma M \right) = \sigma_1 m_{11} + \sigma_2 m_{22} + \sigma_3 m_{33} \quad (14)

where M is an orthogonal matrix with entries m_{ij};

letting M be the identity matrix maximizes \operatorname{trace}\left( \Sigma M \right) and yields the angle of the target tray relative to the virtual tray, as shown in equations 15, 16 and 17:

V^{T} R U = I \quad (15)

R = V U^{T} \quad (16)

R^{*} = V U^{T} \quad (17)

where R^{*} is the rotation matrix that gives the angle of the target tray relative to the virtual tray.
A system for accurately identifying a tray based on deep learning, comprising:
training data set construction unit: used for collecting depth images and color images of a plurality of trays with the image acquisition device and aligning the depth images with the color images; and for marking the positions of the trays in the color images, taking the marked color images as a deep learning training data set, and inputting the deep learning training data set into a neural network;
standard tray point cloud set construction unit: used for identifying the coordinates of the tray in the color image through the neural network by deep learning; and for obtaining the position of the tray in the depth image according to the coordinates of the tray and inputting the overall dimensions of the tray at that position to construct a standard tray point cloud set;
tray identification unit: used for performing ICP point cloud matching between the actual tray point cloud set currently acquired by the image acquisition device and the standard tray point cloud set, and acquiring the position and angle of the target tray relative to the virtual tray, so as to obtain the pose of the target tray relative to the image acquisition device;
the actual tray point cloud set comprises target trays to be identified, and the standard tray point cloud set comprises virtual trays constructed according to the positions of the trays and the outline dimensions of the trays.
Compared with the prior art, the invention has the beneficial effects that:
(1) By using a neural network for deep learning, the invention eliminates the false tray identifications that arise from relying on the depth map alone;
(2) In the process of identifying and positioning the tray, the coordinates of the tray in the color image are identified from the color image by deep learning, which eliminates misidentification caused by people or other objects; once the position in the color image is known, the approximate position of the tray in the depth image is also known, which improves the accuracy of tray identification.
The invention is described in further detail below with reference to the drawings and the detailed description.
Drawings
FIG. 1 is a diagram of the steps of a method for accurately identifying a tray based on deep learning in accordance with an embodiment of the present invention;
FIG. 2 is a network architecture diagram of a YOLOv7 neural network according to an embodiment of the present invention;
FIG. 3 is a network architecture diagram of the backbone layer of the YOLOv7 neural network according to an embodiment of the present invention;
FIG. 4 is a network structure diagram of the BConv layer of the backbone layer according to an embodiment of the present invention;
FIG. 5 is a network structure diagram of the E-ELAN layer of the backbone layer according to an embodiment of the present invention;
FIG. 6 is a network structure diagram of the MPConv layer of the backbone layer according to an embodiment of the present invention;
FIG. 7 is a network architecture diagram of the head layer of the YOLOv7 neural network according to an embodiment of the invention;
FIG. 8 is a network structure diagram of the SPPCSPC layer of the head layer according to an embodiment of the present invention;
FIG. 9 is a network structure diagram of the Catconv layer of the head layer according to an embodiment of the present invention;
FIG. 10 is a network structure diagram of the RepVGG block layer of the head layer according to an embodiment of the invention.
Detailed Description
The method for accurately identifying the tray based on the deep learning provided by the invention, as shown in fig. 1, comprises the following steps:
step S1: acquiring the trays by using an image acquisition device to obtain depth images and color images of a plurality of trays, and aligning the depth images and the color images;
step S2: after marking the positions of the trays in the color images, taking the marked color images as a deep learning training data set, and inputting the deep learning training data set into a neural network;
step S3: the neural network identifies the coordinates of the tray in the color image in a deep learning mode;
step S4: obtaining the position of the tray in the depth image according to the coordinates of the tray, and inputting the overall dimensions of the tray at that position in the depth image to construct a standard tray point cloud set;
step S5: performing ICP point cloud matching on the actual tray point cloud set and the standard tray point cloud set which are currently acquired by the image acquisition device, and acquiring the position and the angle of the target tray relative to the virtual tray, so as to obtain the pose of the target tray relative to the image acquisition device;
the actual tray point cloud set comprises target trays to be identified, and the standard tray point cloud set comprises virtual trays constructed according to the positions of the trays and the outline dimensions of the trays.
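As an illustration of step S5, a minimal sketch of the ICP matching step using the Open3D library is shown below. The patent does not name a particular library; the 5 cm correspondence distance and the identity initialization are illustrative assumptions, and the two point clouds are assumed to be given as (N, 3) NumPy arrays.

```python
import numpy as np
import open3d as o3d

def match_tray(standard_points: np.ndarray, actual_points: np.ndarray):
    """Register the standard (virtual) tray point cloud onto the actual tray
    point cloud with point-to-point ICP and return rotation R and translation t."""
    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(standard_points)   # virtual tray
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(actual_points)     # currently acquired tray

    result = o3d.pipelines.registration.registration_icp(
        source, target,
        0.05,                 # max correspondence distance in meters (illustrative)
        np.eye(4),            # initial transform: identity
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    T = result.transformation          # 4x4 homogeneous transform of the target tray
    R, t = T[:3, :3], T[:3, 3]         # rotation (angle) and translation (position)
    return R, t
```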
In the above steps S1 and S5, the image acquisition device is a depth camera; identifying the tray with a depth camera makes it possible to output both depth image data and ordinary color image data.
Further, the depth camera is an Intel RealSense D455 depth camera.
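A minimal acquisition sketch for obtaining an aligned depth and color frame with the pyrealsense2 SDK might look as follows; the stream resolutions and frame rate are illustrative choices, not values fixed by the patent.

```python
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

align = rs.align(rs.stream.color)          # map depth pixels onto the color frame

try:
    frames = pipeline.wait_for_frames()
    aligned = align.process(frames)
    depth_image = np.asanyarray(aligned.get_depth_frame().get_data())  # uint16, scaled by the device depth_scale
    color_image = np.asanyarray(aligned.get_color_frame().get_data())  # BGR color image
finally:
    pipeline.stop()
```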
In the above step S2, marking the position of the tray in the color image includes: manually driving the forklift to insert into and pick up the tray while the depth camera records the whole insertion process as samples, and then using labeling software to mark each frame of the samples; the marked position of the tray in each frame forms the deep learning training data set.
Further, the labeling software is labelImg.
In the above step S3, as shown in fig. 2, when the neural network recognizes the coordinates of the tray in the color image by the deep learning method, the method includes:
aligning the input color image into a 640 × 640 RGB image through the input layer and feeding it to the backbone layer;
the backbone layer performs feature extraction on the RGB image and outputs three feature maps of different sizes to the head layer;
the head layer performs feature extraction and detection again on the three feature maps of different sizes to obtain the coordinates of the target tray;
the neural network comprises the input layer, the backbone layer and the head layer.
Further, the neural network is a YOLOv7 neural network.
Further, when the input layer aligns the input color image, the method includes:
performing adaptive size processing on the input deep learning training data set, adjusting each image to a 1280 × 1280 RGB image, reducing it to 640 × 640 with a 16-layer convolution module, performing normalization and alignment, activating it with an activation function, and then sending it to the backbone layer.
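A minimal preprocessing sketch consistent with this step is shown below; the gray padding value, the BGR input format and the [0, 1] normalization are assumptions for illustration, and the convolutional reduction to 640 × 640 described above happens inside the network rather than in this helper.

```python
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray, size: int = 640) -> np.ndarray:
    """Resize a color frame into a size x size letterboxed RGB tensor
    (CHW, float32 in [0, 1]) so it can be fed to the detector."""
    h, w = image_bgr.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(image_bgr, (int(round(w * scale)), int(round(h * scale))))

    canvas = np.full((size, size, 3), 114, dtype=np.uint8)   # gray letterbox padding
    canvas[:resized.shape[0], :resized.shape[1]] = resized

    rgb = cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    return rgb.transpose(2, 0, 1)                            # HWC -> CHW
```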
Further, when the backbone layer performs feature extraction on the RGB image, the method includes:
after a BConv layer receives the RGB image, feature extraction is performed through its convolution layer, convergence is accelerated with its BN layer, and the result is activated with an activation function and fed into alternating E-ELAN and MPConv layers, which output three feature maps of different sizes;
the backbone layer comprises BConv layers, E-ELAN layers and MPConv layers, wherein a BConv layer consists of a convolution layer, a BN layer and an activation function.
Still further, the activation function is LeakyReLU.
Specifically, the backbone layer of YOLOv7 is shown in fig. 3 and consists of several BConv layers, E-ELAN layers and MPConv layers, where a BConv layer is composed of a convolution layer, a BN layer and an activation function, as shown in fig. 4.
In fig. 4, BConv blocks of different colors indicate different convolutions (k is the kernel size, s is the stride, o is out_channel, i is in_channel; o = i means out_channel equals in_channel, and o ≠ i means out_channel is unrelated to in_channel and not necessarily equal to it): the first is a convolution with k = 1, s = 1, which leaves the input length and width unchanged; the second is a convolution with k = 3, s = 1, which also leaves the length and width unchanged; and the third has s = 2, so the output length and width are half of the input. The different colors of BConv mainly distinguish k and s, not the input and output channels.
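A minimal PyTorch sketch of such a BConv block (convolution + BN + LeakyReLU, using the activation named above) is given below; the padding choice k // 2 is an assumption that keeps the spatial size unchanged when s = 1.

```python
import torch
import torch.nn as nn

class BConv(nn.Module):
    """Conv + BatchNorm + LeakyReLU; k and s follow the patent's notation."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 1, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))
```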
Specifically, the E-ELAN layer is likewise built by splicing different convolutions, as shown in fig. 5. The input and output length and width of the whole E-ELAN layer are unchanged, and on the channel dimension o = 2i, where the 2i channels are obtained by concatenating the outputs of 4 conv branches whose output channels are each i/2.
Specifically, as shown in fig. 6, the MPConv layer has the same number of input and output channels while the output length and width are half of the input: the upper branch halves the length and width by max pooling and halves the channels with a BConv layer; the lower branch halves the channels with a first BConv layer and halves the length and width with a second BConv layer with k = 3, s = 2; the upper and lower branches are then concatenated to obtain an output with halved length and width and o = i.
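A simplified PyTorch sketch of the MPConv structure just described follows; the bconv helper mirrors the BConv sketch above, and the exact YOLOv7 implementation differs in details.

```python
import torch
import torch.nn as nn

def bconv(in_ch: int, out_ch: int, k: int, s: int) -> nn.Sequential:
    # Conv + BN + LeakyReLU helper, as in the BConv sketch above.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False),
                         nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1, inplace=True))

class MPConvSketch(nn.Module):
    """Upper max-pool branch and lower strided-conv branch, each halving H, W and
    the channel count, concatenated so that out_channels == in_channels."""
    def __init__(self, ch: int):
        super().__init__()
        half = ch // 2
        self.upper = nn.Sequential(nn.MaxPool2d(kernel_size=2, stride=2),
                                   bconv(ch, half, k=1, s=1))
        self.lower = nn.Sequential(bconv(ch, half, k=1, s=1),
                                   bconv(half, half, k=3, s=2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.upper(x), self.lower(x)], dim=1)
```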
In overview, the entire backbone layer alternately halves the length and width and doubles the channels while extracting features through the BConv, E-ELAN and MPConv layers.
Further, as shown in fig. 7, when the head layer performs feature extraction and detection, the method includes:
the head layer performs feature extraction again on the three feature maps of different sizes output by the backbone layer through an SPPCSPC layer, several BConv layers, several MPConv layers and several Catconv layers, outputs three feature maps of different sizes again, and obtains the coordinates of the target tray after detection through three RepVGG block layers and three conv layers.
Specifically, as shown in fig. 8, the output channel count of the whole SPPCSPC layer is out_channel; during the computation a hidden channel count hidden_channel = int(2 × e × out_channel) (hereinafter hc) is used for channel expansion, and in general e = 0.5 is taken, in which case hc = out_channel.
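As a small illustration of the channel computation just described:

```python
def sppcspc_hidden_channels(out_channel: int, e: float = 0.5) -> int:
    """Hidden channel width hc used inside the SPPCSPC layer; with e = 0.5,
    hc equals out_channel, as noted above."""
    return int(2 * e * out_channel)
```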
Specifically, the Catconv layer operates in substantially the same way as the E-ELAN layer, as shown in fig. 9. The input and output length and width of the whole Catconv layer are unchanged, and on the channel dimension o = 2i, where the 2i channels are obtained by concatenating the outputs of 6 conv branches whose output channels are each i/2.
Specifically, the RepVGG block (REP) layer is shown in fig. 10. REP has different structures during training and deployment: during training a 1 × 1 convolution branch is added alongside the 3 × 3 convolution, and if the input and output channels and the height and width are consistent, a BN identity branch is also added, the three branches being summed at the output; during deployment, the parameters of the branches are re-parameterized into the main branch for convenience, and only the 3 × 3 main-branch convolution output is taken.
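The re-parameterization idea can be illustrated with a short PyTorch sketch; BN folding and the identity branch are omitted, so this shows only how the 1 × 1 branch is merged into the 3 × 3 branch.

```python
import torch
import torch.nn.functional as F

def merge_1x1_into_3x3(k3: torch.Tensor, k1: torch.Tensor) -> torch.Tensor:
    """Zero-pad the 1x1 branch kernel to 3x3 and add it to the 3x3 branch kernel,
    so a single 3x3 convolution reproduces the sum of both branches at deployment."""
    return k3 + F.pad(k1, [1, 1, 1, 1])

# The merged kernel applied once equals the two branches applied separately.
x = torch.randn(1, 8, 16, 16)
k3 = torch.randn(16, 8, 3, 3)
k1 = torch.randn(16, 8, 1, 1)
two_branches = F.conv2d(x, k3, padding=1) + F.conv2d(x, k1)
merged = F.conv2d(x, merge_1x1_into_3x3(k3, k1), padding=1)
assert torch.allclose(two_branches, merged, atol=1e-4)
```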
In the step S5, when the position of the target tray with respect to the virtual tray is acquired, the method includes:
the standard tray point cloud set and the actual tray point cloud set are constrained by computing their centroids, as shown in equation 1 and equation 2:

\bar{p}_s = \frac{1}{|P_s|} \sum_{i=1}^{|P_s|} p_s^i \quad (1)

\bar{p}_t = \frac{1}{|P_t|} \sum_{i=1}^{|P_t|} p_t^i \quad (2)

where p_s^i is a single point of the standard tray point cloud, P_s is the standard tray point cloud, \bar{p}_s is the centroid of P_s, p_t^i is a single point of the actual tray point cloud, P_t is the actual tray point cloud, and \bar{p}_t is the centroid of P_t.
Further, when the position of the target tray relative to the virtual tray is acquired, the method further comprises:
according to the constraint condition, a first loss function equation is established, as shown in equation 3:

E(R, t) = \frac{1}{N} \sum_{i=1}^{N} \left\| p_t^i - \left( R p_s^i + t \right) \right\|^2 \quad (3)

where R is a rotation matrix and t is a translation vector;

letting N = |P_s| be the total number of points, the first loss function equation is differentiated with respect to t and the derivative is set to 0, giving the coordinate equation shown in equation 4:

t = \bar{p}_t - R \bar{p}_s \quad (4)

the optimal t, i.e. the coordinates (X, Y, Z) of the target tray relative to the virtual tray, is obtained from the coordinate equation, at which point the first loss function equation is also minimal.
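A small NumPy sketch of this closed-form translation (equation 4), assuming the two point clouds are given as (N, 3) arrays of corresponding points:

```python
import numpy as np

def optimal_translation(R: np.ndarray, P_s: np.ndarray, P_t: np.ndarray) -> np.ndarray:
    """With the rotation R fixed, the loss in equation 3 is minimized by
    t = centroid(P_t) - R @ centroid(P_s), giving the tray coordinates (X, Y, Z)."""
    return P_t.mean(axis=0) - R @ P_s.mean(axis=0)
```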
Specifically, the z-axis height of the tray is equal to the ground height plus the tray height, the tray X, Y axis is near the center of the depth camera, the tray is placed on the front face, and the tray is 0.6-2.2 meters away from the depth camera.
In the step S5, when acquiring the angle of the target tray with respect to the virtual tray, the method includes:
without considering translation, a second loss function equation is established, as shown in equation 5:

E(R) = \frac{1}{N} \sum_{i=1}^{N} \left\| \left( p_t^i - \bar{p}_t \right) - R \left( p_s^i - \bar{p}_s \right) \right\|^2 \quad (5)

where R is a rotation matrix, \bar{p}_s is the centroid of the standard tray point cloud, and \bar{p}_t is the centroid of the actual tray point cloud;

the second loss function equation is simplified using relation 6 and relation 7, giving the simplified expression shown in equation 8:

R^{T} R = I \quad (6)

\left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) = \left( p_s^i - \bar{p}_s \right)^{T} R^{T} \left( p_t^i - \bar{p}_t \right) \quad (7)

E(R) = \frac{1}{N} \sum_{i=1}^{N} \left( \left\| p_t^i - \bar{p}_t \right\|^2 + \left\| p_s^i - \bar{p}_s \right\|^2 - 2 \left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) \right) \quad (8)

where the superscript T denotes the matrix transpose and I is the identity matrix;

since the coordinates (X, Y, Z) of the tray are determined independently of R and the first two terms of equation 8 do not depend on R, minimizing the second loss function equation is equivalent to maximizing the remaining term, as shown in equation 9:

R^{*} = \arg\max_{R} \sum_{i=1}^{N} \left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) \quad (9)
specifically, relation 6 above follows from the orthogonality of the rotation matrix, and relation 7 follows from the property that the transpose of a scalar equals the scalar itself.
Further, when acquiring the angle of the target tray relative to the virtual tray, the method further comprises:
equation 9 is transformed according to relation 10, giving equation 11:

\sum_{i=1}^{N} \left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) = \operatorname{trace}\left( P_t^{T} R P_s \right) \quad (10)

R^{*} = \arg\max_{R} \operatorname{trace}\left( P_t^{T} R P_s \right) \quad (11)

where P_s and P_t here denote the matrices whose columns are the centered points p_s^i - \bar{p}_s and p_t^i - \bar{p}_t;

using the properties of the trace, \operatorname{trace}\left( P_t^{T} R P_s \right) is converted as shown in equation 12:

\operatorname{trace}\left( P_t^{T} R P_s \right) = \operatorname{trace}\left( R P_s P_t^{T} \right) = \operatorname{trace}\left( R U \Sigma V^{T} \right) = \operatorname{trace}\left( \Sigma V^{T} R U \right) \quad (12)

where U \Sigma V^{T} is the singular value decomposition of P_s P_t^{T}, U and V are orthogonal matrices, \Sigma is the diagonal matrix of singular values, and V^{T} R U is itself an orthogonal matrix;

equation 12 is then rewritten using the matrix relation shown in equation 13, and the conversion process is shown in equation 14:

M = V^{T} R U, \quad \Sigma = \operatorname{diag}\left( \sigma_1, \sigma_2, \sigma_3 \right) \quad (13)

\operatorname{trace}\left( \Sigma V^{T} R U \right) = \operatorname{trace}\left( \Sigma M \right) = \sigma_1 m_{11} + \sigma_2 m_{22} + \sigma_3 m_{33} \quad (14)

where M is an orthogonal matrix with entries m_{ij};

letting M be the identity matrix maximizes \operatorname{trace}\left( \Sigma M \right) and yields the angle of the target tray relative to the virtual tray, as shown in equations 15, 16 and 17:

V^{T} R U = I \quad (15)

R = V U^{T} \quad (16)

R^{*} = V U^{T} \quad (17)

where R^{*} is the rotation matrix that gives the angle of the target tray relative to the virtual tray.
Specifically, the above relation 10 is obtained by matrix multiplication and the definition of trace, and the property of trace is specifically shown in the formula 18:
trace(AB)=trace(BA) (18)。
specifically, from the non-negative nature of the singular value and the nature of the orthogonal matrix (the absolute value of the element in the orthogonal matrix is not more than 1), it is easy to prove that trace (Σm) is maximum only when M is the unit matrix.
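Putting equations 1 to 17 together, one SVD-based alignment step can be sketched in NumPy as follows. Correspondences between the two (N, 3) point sets are assumed to be given (a full ICP re-estimates them each iteration), and the reflection guard D is a standard safeguard that the patent does not spell out.

```python
import numpy as np

def tray_pose_from_point_clouds(P_s: np.ndarray, P_t: np.ndarray):
    """Estimate the rotation R* and translation t mapping the standard (virtual)
    tray points P_s onto the actual tray points P_t, following equations 1-17."""
    mu_s, mu_t = P_s.mean(axis=0), P_t.mean(axis=0)        # centroids (equations 1 and 2)
    Q_s, Q_t = P_s - mu_s, P_t - mu_t                      # centered point sets

    H = Q_s.T @ Q_t                                        # 3x3 matrix P_s P_t^T
    U, S, Vt = np.linalg.svd(H)                            # H = U diag(S) Vt
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T                                     # R* = V U^T (equation 17)
    t = mu_t - R @ mu_s                                    # translation (equation 4)
    return R, t
```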
The invention provides a system for accurately identifying a tray based on deep learning, which comprises the following components:
training data set construction unit: used for collecting depth images and color images of a plurality of trays with the image acquisition device and aligning the depth images with the color images; and for marking the positions of the trays in the color images, taking the marked color images as a deep learning training data set, and inputting the deep learning training data set into a neural network;
standard tray point cloud set construction unit: used for identifying the coordinates of the tray in the color image through the neural network by deep learning; and for obtaining the position of the tray in the depth image according to the coordinates of the tray and inputting the overall dimensions of the tray at that position to construct a standard tray point cloud set;
tray identification unit: used for performing ICP point cloud matching between the actual tray point cloud set currently acquired by the image acquisition device and the standard tray point cloud set, and acquiring the position and angle of the target tray relative to the virtual tray, so as to obtain the pose of the target tray relative to the image acquisition device;
the actual tray point cloud set comprises target trays to be identified, and the standard tray point cloud set comprises virtual trays constructed according to the positions of the trays and the outline dimensions of the trays.
Compared with the prior art, the invention has the beneficial effects that:
(1) By using a neural network for deep learning, the invention eliminates the false tray identifications that arise from relying on the depth map alone;
(2) In the process of identifying and positioning the tray, the coordinates of the tray in the color image are identified from the color image by deep learning, which eliminates misidentification caused by people or other objects; once the position in the color image is known, the approximate position of the tray in the depth image is also known, which improves the accuracy of tray identification.
The above embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, but any insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention are intended to be within the scope of the present invention as claimed.

Claims (10)

1. The method for accurately identifying the tray based on deep learning is characterized by comprising the following steps of:
acquiring the trays by using an image acquisition device to obtain depth images and color images of a plurality of trays, and aligning the depth images and the color images;
after marking the positions of the trays in the color images, taking the marked color images as a deep learning training data set, and inputting the deep learning training data set into a neural network;
the neural network identifies the coordinates of the tray in the color image in a deep learning mode;
obtaining the position of the tray in the depth image according to the coordinates of the tray, and inputting the overall dimensions of the tray at that position to construct a standard tray point cloud set;
performing ICP point cloud matching on the actual tray point cloud set and the standard tray point cloud set which are currently acquired by the image acquisition device, and acquiring the position and the angle of a target tray relative to a virtual tray, so as to obtain the pose of the target tray relative to the image acquisition device;
the actual tray point cloud set comprises target trays to be identified, and the standard tray point cloud set comprises virtual trays constructed according to the positions of the trays and the outline dimensions of the trays.
2. The method for accurately identifying the tray based on the deep learning according to claim 1, wherein when the neural network identifies the coordinates of the tray in the color image by means of the deep learning, the method comprises:
aligning the input color image into a 640 × 640 RGB image through an input layer and feeding it to a backbone layer;
the backbone layer performs feature extraction on the RGB image and outputs three feature maps of different sizes to a head layer;
the head layer performs feature extraction and detection again on the three feature maps of different sizes to obtain the coordinates of the target tray;
the neural network comprises the input layer, the backbone layer and the head layer.
3. The method for accurately recognizing a tray based on deep learning according to claim 2, wherein when the input layer aligns the input color image, comprising:
performing adaptive size processing on the input deep learning training data set, adjusting each image to a 1280 × 1280 RGB image, reducing it to 640 × 640 with a 16-layer convolution module, performing normalization and alignment, activating it with an activation function, and then sending it to the backbone layer.
4. The method for accurately identifying a tray based on deep learning according to claim 2, wherein when the back plane layer performs feature extraction on the RGB picture, the method comprises:
a BConv layer receives the RGB image, performs feature extraction through a convolution layer, accelerates convergence with a BN layer, activates the result with an activation function, and feeds it into alternating E-ELAN and MPConv layers, which output three feature maps of different sizes;
the backbone layer comprises BConv layers, E-ELAN layers and MPConv layers, wherein a BConv layer consists of a convolution layer, a BN layer and an activation function.
5. The method for accurately identifying a tray based on deep learning according to claim 2, wherein when the head layer performs feature extraction and detection, the method comprises:
the head layer performs feature extraction again on the three feature maps of different sizes output by the backbone layer through an SPPCSPC layer, several BConv layers, several MPConv layers and several Catconv layers, outputs three feature maps of different sizes again, and obtains the coordinates of the target tray after detection through three RepVGG block layers and three conv layers.
6. The method for accurately identifying a tray based on deep learning according to claim 1, wherein when acquiring the position of a target tray with respect to a virtual tray, comprising:
the standard tray point cloud set and the actual tray point cloud set are constrained by computing their centroids, as shown in equation 1 and equation 2:

\bar{p}_s = \frac{1}{|P_s|} \sum_{i=1}^{|P_s|} p_s^i \quad (1)

\bar{p}_t = \frac{1}{|P_t|} \sum_{i=1}^{|P_t|} p_t^i \quad (2)

where p_s^i is a single point of the standard tray point cloud, P_s is the standard tray point cloud, \bar{p}_s is the centroid of P_s, p_t^i is a single point of the actual tray point cloud, P_t is the actual tray point cloud, and \bar{p}_t is the centroid of P_t.
7. The method for accurately identifying a tray based on deep learning of claim 6, further comprising, when acquiring the position of the target tray relative to the virtual tray:
according to the constraint condition, a first loss function equation is established, as shown in equation 3:

E(R, t) = \frac{1}{N} \sum_{i=1}^{N} \left\| p_t^i - \left( R p_s^i + t \right) \right\|^2 \quad (3)

where R is a rotation matrix and t is a translation vector;

letting N = |P_s| be the total number of points, the first loss function equation is differentiated with respect to t and the derivative is set to 0, giving the coordinate equation shown in equation 4:

t = \bar{p}_t - R \bar{p}_s \quad (4)

the optimal t, i.e. the coordinates (X, Y, Z) of the target tray relative to the virtual tray, is obtained from the coordinate equation.
8. The method for accurately identifying a tray based on deep learning according to claim 1, wherein when acquiring an angle of a target tray with respect to a virtual tray, comprising:
without considering translation, a second loss function equation is established, as shown in equation 5:

E(R) = \frac{1}{N} \sum_{i=1}^{N} \left\| \left( p_t^i - \bar{p}_t \right) - R \left( p_s^i - \bar{p}_s \right) \right\|^2 \quad (5)

where R is a rotation matrix, \bar{p}_s is the centroid of the standard tray point cloud, and \bar{p}_t is the centroid of the actual tray point cloud;

the second loss function equation is simplified using relation 6 and relation 7, giving the simplified expression shown in equation 8:

R^{T} R = I \quad (6)

\left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) = \left( p_s^i - \bar{p}_s \right)^{T} R^{T} \left( p_t^i - \bar{p}_t \right) \quad (7)

E(R) = \frac{1}{N} \sum_{i=1}^{N} \left( \left\| p_t^i - \bar{p}_t \right\|^2 + \left\| p_s^i - \bar{p}_s \right\|^2 - 2 \left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) \right) \quad (8)

where the superscript T denotes the matrix transpose and I is the identity matrix;

since the coordinates (X, Y, Z) of the tray are determined independently of R and the first two terms of equation 8 do not depend on R, minimizing the second loss function equation is equivalent to maximizing the remaining term, as shown in equation 9:

R^{*} = \arg\max_{R} \sum_{i=1}^{N} \left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) \quad (9)
9. the method for accurately identifying a tray based on deep learning of claim 8, further comprising, when acquiring an angle of a target tray with respect to a virtual tray:
equation 9 is transformed according to relation 10, giving equation 11:

\sum_{i=1}^{N} \left( p_t^i - \bar{p}_t \right)^{T} R \left( p_s^i - \bar{p}_s \right) = \operatorname{trace}\left( P_t^{T} R P_s \right) \quad (10)

R^{*} = \arg\max_{R} \operatorname{trace}\left( P_t^{T} R P_s \right) \quad (11)

where P_s and P_t here denote the matrices whose columns are the centered points p_s^i - \bar{p}_s and p_t^i - \bar{p}_t;

using the properties of the trace, \operatorname{trace}\left( P_t^{T} R P_s \right) is converted as shown in equation 12:

\operatorname{trace}\left( P_t^{T} R P_s \right) = \operatorname{trace}\left( R P_s P_t^{T} \right) = \operatorname{trace}\left( R U \Sigma V^{T} \right) = \operatorname{trace}\left( \Sigma V^{T} R U \right) \quad (12)

where U \Sigma V^{T} is the singular value decomposition of P_s P_t^{T}, U and V are orthogonal matrices, \Sigma is the diagonal matrix of singular values, and V^{T} R U is itself an orthogonal matrix;

equation 12 is then rewritten using the matrix relation shown in equation 13, and the conversion process is shown in equation 14:

M = V^{T} R U, \quad \Sigma = \operatorname{diag}\left( \sigma_1, \sigma_2, \sigma_3 \right) \quad (13)

\operatorname{trace}\left( \Sigma V^{T} R U \right) = \operatorname{trace}\left( \Sigma M \right) = \sigma_1 m_{11} + \sigma_2 m_{22} + \sigma_3 m_{33} \quad (14)

where M is an orthogonal matrix with entries m_{ij};

letting M be the identity matrix maximizes \operatorname{trace}\left( \Sigma M \right) and yields the angle of the target tray relative to the virtual tray, as shown in equations 15, 16 and 17:

V^{T} R U = I \quad (15)

R = V U^{T} \quad (16)

R^{*} = V U^{T} \quad (17)

where R^{*} is the rotation matrix that gives the angle of the target tray relative to the virtual tray.
10. A system for accurately identifying a tray based on deep learning, comprising:
training data set construction unit: used for collecting depth images and color images of a plurality of trays with the image acquisition device and aligning the depth images with the color images; and for marking the positions of the trays in the color images, taking the marked color images as a deep learning training data set, and inputting the deep learning training data set into a neural network;
standard tray point cloud set construction unit: used for identifying the coordinates of the tray in the color image through the neural network by deep learning; and for obtaining the position of the tray in the depth image according to the coordinates of the tray and inputting the overall dimensions of the tray at that position to construct a standard tray point cloud set;
tray identification unit: used for performing ICP point cloud matching between the actual tray point cloud set currently acquired by the image acquisition device and the standard tray point cloud set, and acquiring the position and angle of the target tray relative to the virtual tray, so as to obtain the pose of the target tray relative to the image acquisition device;
the actual tray point cloud set comprises target trays to be identified, and the standard tray point cloud set comprises virtual trays constructed according to the positions of the trays and the outline dimensions of the trays.
CN202211616543.7A 2022-12-15 2022-12-15 Method and system for accurately identifying tray based on deep learning Pending CN116310622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211616543.7A CN116310622A (en) 2022-12-15 2022-12-15 Method and system for accurately identifying tray based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211616543.7A CN116310622A (en) 2022-12-15 2022-12-15 Method and system for accurately identifying tray based on deep learning

Publications (1)

Publication Number Publication Date
CN116310622A true CN116310622A (en) 2023-06-23

Family

ID=86815565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211616543.7A Pending CN116310622A (en) 2022-12-15 2022-12-15 Method and system for accurately identifying tray based on deep learning

Country Status (1)

Country Link
CN (1) CN116310622A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612357A (en) * 2023-07-11 2023-08-18 睿尔曼智能科技(北京)有限公司 Method, system and storage medium for constructing unsupervised RGBD multi-mode data set

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989366A (en) * 2021-12-27 2022-01-28 机科发展科技股份有限公司 Tray positioning method and device
CN114170521A (en) * 2022-02-11 2022-03-11 杭州蓝芯科技有限公司 Forklift pallet butt joint identification positioning method
CN114694134A (en) * 2022-03-23 2022-07-01 成都睿芯行科技有限公司 Tray identification and positioning method based on depth camera point cloud data
CN114972968A (en) * 2022-05-19 2022-08-30 长春市大众物流装配有限责任公司 Tray identification and pose estimation method based on multiple neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989366A (en) * 2021-12-27 2022-01-28 机科发展科技股份有限公司 Tray positioning method and device
CN114170521A (en) * 2022-02-11 2022-03-11 杭州蓝芯科技有限公司 Forklift pallet butt joint identification positioning method
CN114694134A (en) * 2022-03-23 2022-07-01 成都睿芯行科技有限公司 Tray identification and positioning method based on depth camera point cloud data
CN114972968A (en) * 2022-05-19 2022-08-30 长春市大众物流装配有限责任公司 Tray identification and pose estimation method based on multiple neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
徐斌 et al., "Pallet positioning system integrating image and point cloud processing", Manufacturing Automation, pages 2-3 *
问夏, "YOLOv7: fast and accurate, a major work by the YOLOv4 team", zhuanlan.zhihu.com/p/554769215?UTM_ID=0, pages 3-9 *
陈学坤, "Research on sphere-feature-based point cloud registration technology in part accuracy inspection", Wanfang Master's Degree Thesis Database, pages 42-44 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612357A (en) * 2023-07-11 2023-08-18 睿尔曼智能科技(北京)有限公司 Method, system and storage medium for constructing unsupervised RGBD multi-mode data set
CN116612357B (en) * 2023-07-11 2023-11-24 睿尔曼智能科技(北京)有限公司 Method, system and storage medium for constructing unsupervised RGBD multi-mode data set

Similar Documents

Publication Publication Date Title
US10198623B2 (en) Three-dimensional facial recognition method and system
US8374422B2 (en) Face expressions identification
KR102667740B1 (en) Device and method for matching image
US9928405B2 (en) System and method for detecting and tracking facial features in images
US8467596B2 (en) Method and apparatus for object pose estimation
CN105740780B (en) Method and device for detecting living human face
Keller et al. A new benchmark for stereo-based pedestrian detection
Zhu et al. Discriminative 3D morphable model fitting
US20220157047A1 (en) Feature Point Detection
WO2019071664A1 (en) Human face recognition method and apparatus combined with depth information, and storage medium
US20130202161A1 (en) Enhanced face detection using depth information
US20110227923A1 (en) Image synthesis method
US9767383B2 (en) Method and apparatus for detecting incorrect associations between keypoints of a first image and keypoints of a second image
US20150324659A1 (en) Method for detecting objects in stereo images
JP2010267231A (en) Device and method for estimating positional orientation
US11380010B2 (en) Image processing device, image processing method, and image processing program
KR101865253B1 (en) Apparatus for age and gender estimation using region-sift and discriminant svm classifier and method thereof
CN110458041A (en) A kind of face identification method and system based on RGB-D camera
CN116310622A (en) Method and system for accurately identifying tray based on deep learning
CN109919128A (en) Acquisition methods, device and the electronic equipment of control instruction
Segundo et al. Real-time scale-invariant face detection on range images
CN110956664A (en) Real-time camera position repositioning method for handheld three-dimensional scanning system
CN112084840A (en) Finger vein identification method based on three-dimensional NMI
CN108694348B (en) Tracking registration method and device based on natural features
CN110070490A (en) Image split-joint method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination