CN116061187B - Method for identifying, positioning and grabbing goods on goods shelves by composite robot - Google Patents

Method for identifying, positioning and grabbing goods on goods shelves by composite robot

Info

Publication number
CN116061187B
CN116061187B (application CN202310206998.XA)
Authority
CN
China
Prior art keywords
goods
commodity
image
shelf
target detection
Prior art date
Legal status
Active
Application number
CN202310206998.XA
Other languages
Chinese (zh)
Other versions
CN116061187A (en)
Inventor
Wu Bo (吴波)
Zhang Chunsheng (张春生)
Dong Qinpeng (董芹鹏)
Zheng Suibing (郑随兵)
Current Assignee
Ruiman Intelligent Technology Jiangsu Co ltd
Original Assignee
Ruiman Intelligent Technology Jiangsu Co ltd
Priority date
Filing date
Publication date
Application filed by Ruiman Intelligent Technology Jiangsu Co ltd filed Critical Ruiman Intelligent Technology Jiangsu Co ltd
Priority to CN202310206998.XA
Publication of CN116061187A
Application granted
Publication of CN116061187B
Legal status: Active
Anticipated expiration


Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 - Programme-controlled manipulators
    • B25J 9/16 - Programme controls
    • B25J 9/1602 - Programme controls characterised by the control system, structure, architecture
    • B25J 9/161 - Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B25J 9/1628 - Programme controls characterised by the control loop
    • B25J 9/163 - Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • B25J 9/1656 - Programme controls characterised by programming, planning systems for manipulators
    • B25J 9/1661 - Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • B25J 18/00 - Arms
    • B25J 19/00 - Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J 19/02 - Sensing devices
    • B25J 19/04 - Viewing devices
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention relates to a method for identifying, positioning and grabbing goods on a goods shelf by a composite robot, and belongs to the technical field of robot control. Commodity images of the shelf are collected by a binocular structured-light infrared camera mounted at the end of a mechanical arm, and each specification of each brand of each commodity is treated as one category. A target detection network is built with a deep learning method and, after training, performs real-time inference on the images collected by the camera; its predictions are verified with a mean hash algorithm and a three-histogram algorithm. When the target commodity is found, its position coordinates obtained from the RGB image are combined with its depth value from the depth image, converted into the world coordinate system, and transmitted to the mechanical arm, which executes the target grabbing task. The method effectively identifies and detects shelf goods of many kinds placed densely, is verified to be insensitive to illumination changes, and achieves high detection accuracy for shelf goods with small appearance differences.

Description

Method for identifying, positioning and grabbing goods on goods shelves by composite robot
Technical Field
The invention belongs to the technical field of robot control, and particularly relates to a method for identifying, positioning and grabbing goods on a goods shelf by a composite robot.
Background
Grabbing a target object is a common action in production and a basic capability required of a robot; correct identification and accurate positioning of the target object are the preconditions of a successful grab. For identification and positioning, conventional methods grab objects in a fixed area according to a program flow written in advance: the object types are few, and when the actual placement of an object deviates from the position set in the program, the grab easily fails.
Goods on shelves come in many varieties and are densely arranged; goods of the same brand with different flavors or capacities differ very little in appearance, with only the flavor and capacity noted on the package. Existing robot grabbing methods cannot accurately grab such varied shelf goods whose placement positions are not fixed.
Disclosure of Invention
To address this, the invention provides a method for a composite robot to identify, position and grab goods on shelves, used to accurately grab shelf goods of many types whose placement positions are not fixed.
In order to achieve the above purpose, the present invention provides the following technical solution: a method for identifying, positioning and grabbing goods on a goods shelf by a composite robot, comprising the following steps:
S1: a binocular structured-light infrared camera and a two-finger gripper are mounted at the tail end of the mechanical arm of the composite robot; the camera acquires shelf commodity image datasets covering different commodity types, light environments and viewing angles, from which a training set and a test set are generated;
each sample comprises an RGB image of the shelf commodity, a depth image and the image annotation; the annotation of each RGB image is stored as a txt file in which each row contains the category code of the target commodity, the u coordinate of its center point, the v coordinate of its center point, and the proportions of its horizontal length and vertical height in the image; each specification of each brand of each commodity is taken as one category with a unique category code;
s2: building a target detection network by using a deep learning method and training;
the input of the target detection network is an RGB image of a goods shelf commodity, and the class code and the commodity position of the commodity are output, wherein the commodity position is the coordinates (u, v) of a commodity center in the image;
the target detection network is a yolov5 network, with the Resize function that changes the commodity size deleted from its dataset-loading class DataLoader;
S3: building a target detection inference framework, and performing inference with the trained target detection network;
the inference framework connects the input of the binocular structured-light infrared camera with the target detection network loaded with the optimal trained weights; the RGB image and depth image acquired by the camera are obtained in real time, and the RGB image is input to the network to predict the category and position of the shelf commodity;
S4: verifying the prediction result of the target detection network with a mean hash algorithm and a three-histogram algorithm; when both checks pass, the prediction is accurate and the method continues to the next step; if either check fails, the prediction is wrong and the method returns to S3 to continue detecting the next frame of image;
a front-view RGB image is acquired in advance for each commodity category as its template;
mean hash verification means using a mean hash algorithm to calculate the Hamming distance between the current RGB image and the template of the shelf commodity category predicted by the target detection network; when the Hamming distance is smaller than 4 the prediction is correct, otherwise it is wrong;
three-histogram verification means using a three-histogram algorithm to calculate the similarity between the current RGB image and the template of the predicted category, taking the mean of the Bhattacharyya coefficients of the three channels as the similarity value; when the similarity is greater than 0.8 the prediction is correct, otherwise it is wrong;
S5: judging whether the current shelf commodity is the target commodity required by the user; if not, returning to S3 to detect the next frame of image; if so, acquiring the depth value Z of the shelf commodity from the current depth image, converting the coordinates of the commodity in the image into the world coordinate system, and transmitting them to the mechanical arm of the composite robot, whose two-finger gripper executes the target commodity grabbing task.
In step S3, the composite robot obtains in advance the types of goods placed on each layer of each shelf. When it receives a user's commodity demand, it determines the corresponding shelf and the layer where the commodity is located according to the target commodity type, moves to that shelf, moves its mechanical arm, and uses the binocular structured-light infrared camera to photograph the commodity images in the shelf layer where the target commodity is located, searching for the target commodity.
In the step S2, when training the target detection network, the loss function is set to be composed of three parts: a bounding box regression loss, a target confidence loss and a category loss; the bounding box regression loss CIoULoss is calculated with the CIoU Loss function, the target confidence loss BCELoss with the binary cross entropy loss function, and the category loss FocalLoss with the Focal Loss function; the total loss function Loss is: Loss = CIoULoss + BCELoss + FocalLoss.
In the step S5, the internal parameters $f_x$, $f_y$, $c_x$, $c_y$ of the binocular structured-light infrared camera are obtained in advance; the coordinates (u, v) of the shelf commodity in the image are then converted into the camera coordinate system to obtain the coordinates (X, Y, Z), where Z comes from the acquired depth value, using the conversion matrix:

$$Z\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}f_x&0&c_x\\0&f_y&c_y\\0&0&1\end{bmatrix}\begin{bmatrix}X\\Y\\Z\end{bmatrix}$$

The coordinates (X, Y, Z) of the shelf commodity in the camera coordinate system are then converted into the world coordinate system and transmitted to the mechanical arm; the origin of the world coordinate system is set at the center of the loading plane of the mechanical arm, with the z-axis perpendicular to the loading plane pointing outwards and the y-axis perpendicular to the horizontal plane pointing upwards.
In summary, the invention has the following advantages:
according to the method, goods on the goods shelf are grabbed based on machine vision, the coordinates of the target goods in the images are output by combining the deep learning method and the traditional image processing method, the depth value of the target goods in the images is output by using the depth images, the accurate coordinates of the target goods in the world coordinate system are output to be grabbed by the mechanical arm through coordinate conversion, the situation that grabbing fails when the actual placement position of the goods deviates from the programmed position is effectively avoided, and the grabbing accuracy of the mechanical arm is greatly improved;
according to the method, aiming at the characteristics of the goods to be identified, the target identification is carried out by combining the deep learning method and the traditional image processing method, so that goods with various goods on shelves and dense placement can be effectively identified and detected, the goods on the shelves are insensitive to illumination change and have high detection accuracy aiming at small appearance difference;
according to the method, the composite robot lifting mobile platform is used for loading the mechanical arm, so that the tasks of lifting of the mechanical arm in a long stroke, placing and operating of the mechanical arm in a horizontal direction after moving, steering and grabbing to a target position can be realized, and the working efficiency is further improved. The tail end grabbing mechanism simulates a device for grabbing objects by two fingers of a human body, adopts an independent high-precision motor to drive and feed back grabbing force, can grab goods on shelves of different sizes and shapes and is not harmful to the goods.
Drawings
FIG. 1 is a schematic diagram of a compound robot of the present invention performing merchandise capture;
FIG. 2 is a schematic view of a compound robotic positioning and gripping mechanism of the present invention;
FIG. 3 is a schematic diagram of a process for detecting and positioning a commodity by the composite robot;
FIG. 4 is a schematic diagram of the relationship of a pixel coordinate system, a camera coordinate system and a world coordinate system used in the present invention.
In the figure: 1. a goods shelf; 2. a composite robot lifting platform; 3. a composite robot moving platform; 4. a mechanical arm; 5. a binocular structured light infrared camera; 6. a two-finger hand grip.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
As shown in fig. 1 and fig. 2, the composite robot that performs identification, positioning and grabbing of goods on the goods shelf 1 comprises a lifting platform 2, a moving platform 3 and a mechanical arm 4. The mechanical arm 4 is mounted on the lifting platform 2, which moves it up and down in space; the lifting platform 2 is carried on the moving platform 3, which moves it forward, backward, left and right. A binocular structured-light infrared camera 5 and a two-finger gripper 6 are mounted at the tail end of the mechanical arm 4.
The method for identifying, positioning and grabbing goods on the goods shelf by the composite robot follows the main process shown in fig. 3 and is described below in 5 steps.
Step 1: acquire shelf commodity image datasets of different commodity types, light environments and viewing angles in advance with the binocular infrared camera.
The binocular infrared camera used in this embodiment is the D435i from Intel's RealSense series.
The shelf commodity image datasets of different types, light environments and viewing angles are collected as follows: the Intel RealSense D435i camera captures two front-view, one side-view and one back-view RGB image, with corresponding depth images, of each kind of shelf commodity, under both sufficient and insufficient light. LabelImg labeling software is used to draw boxes around the different shelf commodities in the RGB images and assign them the numbers 0, 1, 2, ..., n-1, where n is the total number of commodity categories; the labeling result of each RGB image is exported as a txt file. Each specification of each brand of each commodity is treated as one category and given its own code. Each row of the txt file represents one commodity in the image and contains, in order: the category code of the commodity, the u coordinate of its center point, the v coordinate of its center point, the proportion of its horizontal length in the image, and the proportion of its vertical height in the image. The (u, v) coordinates of a commodity are obtained from the RGB image, and its z coordinate from the depth image.
In this embodiment, each shelf layer holds different brands of the same kind of goods, and different layers hold different kinds. 10 shelf commodities were sampled; for each, 20 items of the same brand and specification were selected, and for each item 5 RGB images and 5 depth images were captured under sufficient- and insufficient-light environments, giving a shelf commodity dataset of 2000 images, split into a training set of 1800 images and a test set of 200 images. During image acquisition the camera was kept 30 cm from the commodity horizontally and 15 cm above the plane of each shelf layer vertically.
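As a concrete illustration of the annotation format above, the minimal sketch below parses one such txt file; the file name and the 1280 x 720 image size are assumptions for the example, not values fixed by the invention:

```python
from pathlib import Path

def parse_label_file(txt_path):
    """Parse a YOLO-style annotation file in which each row holds:
    class_id, u_center, v_center, width_ratio, height_ratio
    (the last four values normalized to [0, 1])."""
    boxes = []
    for line in Path(txt_path).read_text().splitlines():
        if not line.strip():
            continue
        class_id, u, v, w, h = line.split()
        boxes.append({
            "class_id": int(class_id),  # unique code per brand and specification
            "u_center": float(u),       # horizontal center / image width
            "v_center": float(v),       # vertical center / image height
            "w_ratio": float(w),        # box width / image width
            "h_ratio": float(h),        # box height / image height
        })
    return boxes

# Example: recover pixel coordinates of each commodity center in a 1280x720 image
for box in parse_label_file("shelf_0001.txt"):
    u_px, v_px = box["u_center"] * 1280, box["v_center"] * 720
    print(box["class_id"], (round(u_px), round(v_px)))
```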
Step 2: the target detection network is built and trained using a deep learning method.
In this embodiment, the shelf commodity image dataset made in step 1 is input directly to the target detection network for training; the network is yolov5, version 6.0. The RGB images are the network input (the depth images are retained for the later positioning step), and the outputs are the category code and position of the shelf commodity, the position being the coordinates (u, v) of the commodity center in the image.
Because convolutional neural networks have the characteristic of scale invariance, while different commodities often share the same appearance and differ only in capacity, the Resize function that rescales commodities in the training data is deleted from the dataset-loading class DataLoader of the target detection network, which improves the model's detection accuracy for commodities of different sizes. During training, the critical point of model overfitting is found from the accuracy-loss curves recorded by the visualization tool TensorBoard, and the weights saved at that point are stored as the optimal weights for the inference process of step 3.
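To make the data-loading change concrete, here is a minimal, hypothetical PyTorch-style sketch of a dataset with no Resize step; it is not the actual yolov5 DataLoader, whose internals differ:

```python
import cv2
import torch
from torch.utils.data import Dataset

class NativeResolutionShelfDataset(Dataset):
    """Illustrative dataset that applies no Resize: images keep their captured
    resolution, so size differences between otherwise identical packages
    (e.g. two bottle capacities of one brand) remain visible to the detector."""
    def __init__(self, image_paths):
        self.image_paths = image_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = cv2.imread(self.image_paths[idx])   # BGR, native size
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        return torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
```

Because every image comes from the same camera at a fixed shooting distance, all samples share one resolution, which is what makes batching without resizing workable.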
In this embodiment, when training the target detection network, the loss function consists of three parts: a bounding box regression loss, a target confidence loss and a category loss. The bounding box regression loss is calculated with the CIoU loss function and the target confidence loss with the binary cross entropy loss function. Because shelf goods span a huge number of categories, many of which look similar and are hard to distinguish, the method uses the Focal Loss function instead of the binary cross entropy loss function for the category loss, so that training focuses on hard-to-distinguish samples and the overall performance of the model improves. Focal Loss is calculated as follows:
$$L_{fl}=\begin{cases}-\alpha\,(1-p)^{\gamma}\log(p), & y=1\\ -(1-\alpha)\,p^{\gamma}\log(1-p), & y=0\end{cases}$$

Focal Loss can also be expressed as:

$$L_{fl}=-\alpha_t\,(1-p_t)^{\gamma}\log(p_t),\qquad p_t=\begin{cases}p, & y=1\\ 1-p, & y=0\end{cases}$$

where $L_{fl}$ is the Focal Loss value; $p$ is the predicted probability; $y$ is the label, with $y=0$ marking a negative sample and $y=1$ a positive sample in binary classification; $\alpha$ is the category weight that balances positive and negative samples, and adjusting $\alpha$ suppresses their number imbalance; $\gamma$ is the hard-sample weight that trades off hard and easy samples, and adjusting $\gamma$ controls the imbalance between easy and hard samples; the probability $p_t$ reflects closeness to the true class $y$, and the larger $p_t$ is, the more accurate the classification.
The total Loss function Loss of the invention is as follows:
Loss=CIoULoss+BCELoss+FocalLoss
wherein CIoULoss is the bounding box regression loss, BCELoss the target confidence loss, and FocalLoss the category loss.
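The compact PyTorch sketch below shows one standard way to implement the focal loss defined above; the defaults alpha = 0.25 and gamma = 2 come from the original Focal Loss paper, not from this patent:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: L_fl = -alpha_t * (1 - p_t)**gamma * log(p_t),
    where logits are raw scores and targets are 0/1 labels of the same shape."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class weight alpha_t
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()        # (1-p_t)^gamma damps easy samples
```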
In this embodiment, training runs for 90 epochs in total with a batch size of 16 and the Adam optimizer. Warmup training is used at the start so that an overly large initial learning rate does not damage the original weights, keeping the model stable. Specifically: over the first 5 epochs the learning rate of the bias parameters drops rapidly from 0.1 to 0.01 while that of the other parameters rises slowly from 0 to 0.01; from the 6th epoch the learning rate is updated with a cosine annealing schedule, so it follows a cosine curve.
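A rough sketch of such a schedule for the non-bias parameters follows; the exact ramp shapes inside yolov5 differ, so this only illustrates the warmup-then-cosine idea:

```python
import math

def lr_at(epoch, total_epochs=90, warmup_epochs=5, base_lr=0.01, final_lr=0.0):
    """Warmup followed by cosine annealing: ramp up to base_lr over the first
    warmup_epochs, then decay along a cosine curve (bias parameters would
    instead start high, at 0.1, and decay to base_lr during warmup)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs   # linear warmup
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))

# Example: inspect the schedule for the first 10 of the 90 epochs
for e in range(10):
    print(e, round(lr_at(e), 5))
```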
Step 3: build a target detection inference framework and perform inference with the trained target detection network.
In this embodiment, the target detection inference framework is built on the detection script of yolov5; the Python SDK for Intel's RealSense D435i camera is downloaded and connected to the framework, so that the framework can obtain the RGB and depth images collected by the camera in real time.
Inference with the trained target detection network proceeds as follows: the optimal weights obtained in step 2 are loaded into the inference framework, the RGB images collected by the binocular camera are fed to it directly in real time, and the network's inference yields the predicted category of the current shelf commodity and its two-dimensional coordinates (u, v) in the RGB image.
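Assuming the pyrealsense2 package as the camera's Python SDK, the real-time frame-grabbing loop of the inference framework could look roughly like this; `model` and the detection tuple format are placeholders for the trained yolov5 detector:

```python
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)   # align depth to color so (u, v) indexes both

try:
    while True:
        frames = align.process(pipeline.wait_for_frames())
        color = np.asanyarray(frames.get_color_frame().get_data())
        depth = frames.get_depth_frame()
        for cls_id, u, v in model(color):            # hypothetical (class, u, v) tuples
            z = depth.get_distance(int(u), int(v))   # depth in meters at (u, v)
            print(cls_id, u, v, z)
finally:
    pipeline.stop()
```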
In this embodiment, each shelf layer holds different brands and specifications of the same kind of goods; for example, a mineral water layer holds mineral water of different brands and sizes. The composite robot obtains in advance the type of goods placed on each layer of each shelf, e.g. the first layer of a given shelf holds mineral water, the second carbonated drinks, and so on. When the robot receives a user's commodity demand, it first determines the shelf and layer where the commodity is located from the commodity type, moves to that shelf, and uses the binocular structured-light infrared camera 5 to photograph each commodity image on that layer. For each shot the camera 5 is brought to the set shooting pose: 30 cm from the goods horizontally and 15 cm above the plane of the photographed shelf layer vertically.
Step 4: verify the detection result of the target detection network with traditional image processing methods. The detection result is correct if and only if it passes the verification of both the mean hash algorithm and the three-histogram algorithm; if only the mean hash check passes, or neither check passes, the detection result is wrong and the system continues to detect the next frame of image.
Convolution is translation invariant, but a shelf commodity rotated by some angle, or seen after the light color has changed, may no longer be identified by the network; the detection result of the target detection network is therefore verified with traditional image processing methods, giving the network model higher robustness.
In this embodiment, after step 2 is finished, one front-view RGB image is collected for every shelf commodity as its template, and the similarity between the template image and the current RGB image of the commodity is the main basis for judging whether the detection result is correct. During network model inference, when the target detection network identifies the category of a shelf commodity, a mean hash algorithm is used to calculate the similarity between the commodity's RGB image and the corresponding template image: when the Hamming distance is smaller than 4 the prediction is correct, otherwise it is wrong. The mean hash algorithm is calculated as follows: first scale both images to 8 x 8 pixels and convert them to grayscale; calculate the mean pixel value of each grayscale image; mark each pixel 1 if its value is greater than or equal to the mean and 0 otherwise; then count how many bits of the two images differ, which gives the Hamming distance.
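A straightforward OpenCV implementation of this mean hash check might look as follows; the function names are illustrative:

```python
import cv2
import numpy as np

def average_hash(image_bgr):
    """Mean (average) hash: shrink to 8x8, convert to grayscale, then mark
    each pixel 1 if it is >= the image mean, giving a 64-bit fingerprint."""
    small = cv2.resize(image_bgr, (8, 8), interpolation=cv2.INTER_AREA)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    return (gray >= gray.mean()).flatten()

def hamming_distance(hash_a, hash_b):
    return int(np.count_nonzero(hash_a != hash_b))

def ahash_check(detected_crop, template, threshold=4):
    """Acceptance rule from the text: the prediction passes when the detected
    commodity's crop is within Hamming distance 4 of its class template."""
    return hamming_distance(average_hash(detected_crop),
                            average_hash(template)) < threshold
```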
In this embodiment, during network model inference, after a shelf commodity passes the mean hash check, its similarity to the corresponding template photo is calculated with a three-histogram algorithm; when the similarity of the two images is greater than 0.8, the prediction is correct, i.e. the network's detection result has passed both the mean hash and the three-histogram verification. Similarity is calculated with the Bhattacharyya coefficient ρ:
$$\rho(H_1,H_2)=\sum_{i=1}^{N}\sqrt{H_1(i)\,H_2(i)}$$

where $H_1$ and $H_2$ are the histogram data of the source image and the candidate image, $i$ indexes the histogram bins, and $N$ is the total number of bins. Multiplying the two histograms bin by bin, taking the square root of each product and summing gives the Bhattacharyya similarity value, which ranges between 0 and 1.
The three-histogram algorithm is calculated as follows: separate the RGB channels of the two images, build the histogram of each channel, calculate the Bhattacharyya coefficient of the two images' histograms for each channel, and take the mean of the three channels' Bhattacharyya coefficients as the similarity value of the two images.
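An OpenCV sketch of this three-histogram comparison is given below; the 256-bin histograms are an assumption, since the bin count is not stated:

```python
import cv2
import numpy as np

def three_histogram_similarity(img_a, img_b, bins=256):
    """Split both images into their three color channels, histogram each
    channel, compute the per-channel Bhattacharyya coefficient and return
    the mean of the three coefficients as the similarity value."""
    scores = []
    for ch in range(3):
        ha = cv2.calcHist([img_a], [ch], None, [bins], [0, 256]).ravel()
        hb = cv2.calcHist([img_b], [ch], None, [bins], [0, 256]).ravel()
        ha /= ha.sum() or 1.0   # normalize so the coefficient lies in [0, 1]
        hb /= hb.sum() or 1.0
        scores.append(float(np.sum(np.sqrt(ha * hb))))  # Bhattacharyya coefficient
    return sum(scores) / 3.0

# Acceptance rule from the text: the prediction passes when similarity > 0.8
```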
Step 5: if the shelf commodity identified by the current target detection network is the target commodity required by the user, carry out coordinate conversion; otherwise return to step 3 and photograph the next commodity image with the binocular structured-light infrared camera 5. The coordinate conversion turns the two-dimensional image coordinates and depth value of the shelf commodity into coordinates in the world coordinate system and transmits them to the mechanical arm; the two-finger gripper 6 then executes the target commodity grabbing task.
As shown in fig. 4, the application scene of the invention involves three coordinate systems: the pixel coordinate system, the camera coordinate system and the world coordinate system. The pixel coordinate system is established in the image taken by the camera; its origin is at the upper left corner of the image, with the u axis horizontal to the right and the v axis vertical downwards. The camera coordinate system is the three-dimensional rectangular coordinate system O-XYZ established with the focusing center of the binocular structured-light infrared camera as the origin O and the optical axis as the Z axis. The origin o of the world coordinate system is set at the center of the loading plane of the mechanical arm, with the z-axis perpendicular to the loading plane pointing outwards, the y-axis perpendicular to the horizontal plane pointing upwards, and the x-axis completing a right-handed system. The world coordinate system defines objective positions in three-dimensional space and serves as the reference for measuring other points or other coordinate systems.
The commodity center position identified by the target detection network in the RGB image gives the coordinates (u, v) in the pixel coordinate system, and the depth value of the target commodity is obtained from the depth image. The conversion between the pixel coordinate system and the camera coordinate system is:

$$u=f_x\frac{X}{Z}+c_x,\qquad v=f_y\frac{Y}{Z}+c_y$$

or, written in matrix form:

$$Z\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}f_x&0&c_x\\0&f_y&c_y\\0&0&1\end{bmatrix}\begin{bmatrix}X\\Y\\Z\end{bmatrix}$$

where (X, Y, Z) are the coordinates of the target shelf commodity in the camera coordinate system, $f_x$ is the scaling factor of the pixel coordinates on the $u$ axis, $f_y$ is the scaling factor on the $v$ axis, and $f_x$, $f_y$, $c_x$, $c_y$ are all camera intrinsics. The coordinate Z in the camera coordinate system is obtained directly from the acquired depth value.
The coordinates (u, v) of the target shelf commodity in the pixel coordinate system are thus converted into the coordinates (X, Y, Z) in the camera coordinate system, which are then converted into the world coordinate system; the converted coordinates are transmitted to the mechanical arm to execute the target grabbing task.
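Both conversions can be sketched in a few lines of Python; the intrinsic values and the camera-to-world pose below are placeholders, not the patent's calibration:

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z (meters) into the camera frame,
    inverting u = fx*X/Z + cx and v = fy*Y/Z + cy."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def camera_to_world(p_cam, R_wc, t_wc):
    """Apply the camera-to-world pose: p_world = R_wc @ p_cam + t_wc.
    As the text notes, the rotation R_wc and translation t_wc would be read
    in real time from the arm's built-in functions; values here are dummies."""
    return R_wc @ p_cam + t_wc

# Example with assumed intrinsics and a dummy pose
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
p_cam = pixel_to_camera(u=350, v=200, z=0.30, fx=fx, fy=fy, cx=cx, cy=cy)
p_world = camera_to_world(p_cam, R_wc=np.eye(3), t_wc=np.array([0.0, 0.1, 0.2]))
print(p_cam, p_world)
```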
When the grasping pose of the shelf commodity is not of concern, the transformation matrix from the camera coordinate system to the world coordinate system is ${}^{A}T_{B}$, given by:

$${}^{A}P={}^{A}T_{B}\,{}^{B}P,\qquad {}^{A}T_{B}=\begin{bmatrix}{}^{A}R_{B} & {}^{A}t_{B}\\ 0 & 1\end{bmatrix}$$

When the grasping pose of the shelf commodity is of concern, the target commodity is regarded as carrying an object coordinate system C, and the required transformation matrix ${}^{A}T_{C}$ is the product of the camera-to-world transformation and the object-to-camera transformation:

$${}^{A}T_{C}={}^{A}T_{B}\,{}^{B}T_{C}$$

where A denotes the world coordinate system, B the camera coordinate system and C the object coordinate system; ${}^{A}P$ are the coordinates of the shelf commodity in the world coordinate system and ${}^{B}P$ its coordinates in the camera coordinate system (written homogeneously); ${}^{A}R_{B}$ is the pose (rotation) matrix of the camera coordinate system relative to the world coordinate system and ${}^{A}t_{B}$ its position (translation) vector; ${}^{B}T_{C}$ is the coordinate transformation matrix from the object coordinate system to the camera coordinate system. In this embodiment the pose matrices ${}^{A}R_{B}$, ${}^{B}R_{C}$ and position vectors ${}^{A}t_{B}$, ${}^{B}t_{C}$ are updated in real time with functions built into the mechanical arm.
The test platform and experimental environment of this embodiment are: Windows 10 Professional, an NVIDIA GeForce RTX 3060 Ti GPU with 8 GB of video memory, an Intel Core i5-12400 CPU, CUDA 11.3.1, PyTorch 1.12.0 and Python 3.8.1.
To verify its effectiveness, the method was tested, alongside a conventional method, a traditional-image-processing-only method and a deep-learning-only method, on a scene where 10 different types of commodities were placed at unfixed positions; the performance indices of the different methods are compared in Table 1 below.
Table 1. Comparison of the identification effect of different methods

Method                                Identification accuracy/%   FPS    Time/s
Conventional method                   20                          —      —
Traditional image processing only     40                          58.6   0.018
Deep learning only                    80                          44.2   0.026
Method herein                         99                          30.4   0.043
As the table shows, the method herein, which combines a deep learning method with a traditional image processing method, achieves the highest identification accuracy: all 10 commodity types in the test scene were identified correctly. Because it chains the two methods, its recognition speed is lower than the others': compared with the deep-learning-only method, FPS drops by 13.8 and the time to process each image grows by 17 ms. Given that accurate grabbing of many kinds of shelf goods at unfixed positions is achieved, the added time is short and does not affect the user experience.
Although embodiments of the invention have been shown and described, the above detailed description is illustrative only and not limiting. The particular features, structures, materials or characteristics described may be combined in any suitable manner in one or more embodiments or examples, and those skilled in the art may make modifications, substitutions and variations to the embodiments without departing from the principles and spirit of the invention, provided such modifications fall within the scope of the appended claims.

Claims (5)

1. A method for identifying, positioning and grabbing goods on a goods shelf by a composite robot, characterized by comprising the following steps:
S1: a binocular structured-light infrared camera and a two-finger gripper are mounted at the tail end of the mechanical arm of the composite robot; the camera acquires shelf commodity image datasets of different commodity types, light environments and viewing angles, from which a training set and a test set are generated;
each sample comprises an RGB image of the shelf commodity, a depth image and the image annotation; the annotation of each RGB image is stored as a txt file in which each row contains the category code of a shelf commodity, the u coordinate of its center point, the v coordinate of its center point, and the proportions of its horizontal length and vertical height in the image; each specification commodity of each brand of each commodity is taken as one category with a unique category code;
s2: building a target detection network by using a deep learning method and training;
the input of the target detection network is an RGB image of a goods shelf commodity, and the class code and the commodity position of the commodity are output, wherein the commodity position is the coordinates (u, v) of a commodity center in the image;
the target detection network is a yolov5 network, with the Resize function that changes the commodity size deleted from its dataset-loading class DataLoader;
S3: building a target detection inference framework, and performing inference with the trained target detection network;
the inference framework connects the input of the binocular structured-light infrared camera with the target detection network loaded with the optimal trained weights; the RGB image and depth image acquired by the camera are obtained in real time, and the RGB image is input into the target detection network to predict the category and position of the shelf commodity;
S4: verifying the prediction result of the target detection network with a mean hash algorithm and a three-histogram algorithm; when both checks pass verification, the prediction is accurate and the method continues to the next step; if either check fails, the prediction is wrong and the method returns to S3 to continue detecting the next frame of image;
a front-view RGB image is acquired in advance for each commodity category as its template;
the mean hash algorithm verification means using a mean hash algorithm to calculate the Hamming distance between the current RGB image and the shelf commodity template of the category predicted by the target detection network; when the Hamming distance is smaller than 4 the prediction is correct, otherwise the prediction is wrong;
the three-histogram algorithm verification means using a three-histogram algorithm to calculate the similarity of the current RGB image and the shelf commodity template of the category predicted by the target detection network, taking the mean of the Bhattacharyya coefficients of the three channels as the similarity value; when the similarity is greater than 0.8 the prediction is correct, otherwise the prediction is wrong;
S5: judging whether the current shelf commodity is the target commodity required by the user, and if not, returning to S3 to detect the next frame of image; if so, acquiring the depth value Z of the shelf commodity from the current depth image, converting the coordinates of the commodity in the image into the world coordinate system, and transmitting them to the mechanical arm of the composite robot, the two-finger gripper executing the target commodity grabbing task.
2. The method according to claim 1, wherein when the binocular structured-light infrared camera collects images, the camera is kept 30 cm from the commodity horizontally and 15 cm above the plane of the shelf layer currently photographed vertically.
3. The method according to claim 1 or 2, wherein in the step S3, the composite robot obtains in advance the type of commodity placed on each layer of each shelf; when it receives a user's commodity demand, it first determines the corresponding shelf and the layer where the commodity is located according to the type of the target commodity, moves to that shelf, moves its mechanical arm, and uses the binocular structured-light infrared camera to capture the commodity images in the shelf layer where the target commodity is located.
4. The method according to claim 1, wherein in the step S2, when training the target detection network, the loss function is set to be composed of three parts: a bounding box regression loss, a target confidence loss and a category loss; the bounding box regression loss CIoULoss is calculated with the CIoU Loss function, the target confidence loss BCELoss with the binary cross entropy loss function, and the category loss FocalLoss with the Focal Loss function; the total loss function Loss is: Loss = CIoULoss + BCELoss + FocalLoss.
5. The method according to claim 1 or 2, wherein in step S5, the internal parameters $f_x$, $f_y$, $c_x$, $c_y$ of the binocular structured-light infrared camera are obtained in advance; the coordinates (u, v) of the goods on the goods shelf in the image are then converted into the camera coordinate system to obtain the coordinates (X, Y, Z), wherein Z is derived from the acquired depth value, and the conversion matrix is as follows:

$$Z\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}f_x&0&c_x\\0&f_y&c_y\\0&0&1\end{bmatrix}\begin{bmatrix}X\\Y\\Z\end{bmatrix}$$

the coordinates (X, Y, Z) of the goods on the shelf under the camera coordinate system are converted to the world coordinate system and transmitted to the mechanical arm; the origin of the world coordinate system is set at the center of the loading plane of the mechanical arm, with the z-axis perpendicular to the loading plane pointing outwards and the y-axis perpendicular to the horizontal plane pointing upwards.
CN202310206998.XA 2023-03-07 2023-03-07 Method for identifying, positioning and grabbing goods on goods shelves by composite robot Active CN116061187B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310206998.XA CN116061187B (en) 2023-03-07 2023-03-07 Method for identifying, positioning and grabbing goods on goods shelves by composite robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310206998.XA CN116061187B (en) 2023-03-07 2023-03-07 Method for identifying, positioning and grabbing goods on goods shelves by composite robot

Publications (2)

Publication Number Publication Date
CN116061187A CN116061187A (en) 2023-05-05
CN116061187B true CN116061187B (en) 2023-06-16

Family

ID=86176963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310206998.XA Active CN116061187B (en) 2023-03-07 2023-03-07 Method for identifying, positioning and grabbing goods on goods shelves by composite robot

Country Status (1)

Country Link
CN (1) CN116061187B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152540B (en) * 2023-10-27 2024-01-09 浙江由由科技有限公司 Intelligent pricing method for fresh goods considering display position, sales volume and classification precision

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
JP2018106268A (en) * 2016-12-22 2018-07-05 東芝テック株式会社 Image processing device and image processing method
CN109685141A (en) * 2018-12-25 2019-04-26 哈工大机器人(合肥)国际创新研究院 A kind of robotic article sorting visible detection method based on deep neural network
CN110026987A (en) * 2019-05-28 2019-07-19 广东工业大学 Generation method, device, equipment and the storage medium of a kind of mechanical arm crawl track
CN110377033A (en) * 2019-07-08 2019-10-25 浙江大学 A kind of soccer robot identification based on RGBD information and tracking grasping means
CN111914921A (en) * 2020-07-24 2020-11-10 山东工商学院 Similarity image retrieval method and system based on multi-feature fusion
CN112170233A (en) * 2020-09-01 2021-01-05 燕山大学 Small part sorting method and system based on deep learning
CN112476434A (en) * 2020-11-24 2021-03-12 新拓三维技术(深圳)有限公司 Visual 3D pick-and-place method and system based on cooperative robot
AU2021101646A4 (en) * 2021-03-30 2021-05-20 Tianjin Sino-German University Of Applied Sciences Man-machine cooperative safe operation method based on cooperative trajectory evaluation
CN112927297A (en) * 2021-02-20 2021-06-08 华南理工大学 Target detection and visual positioning method based on YOLO series

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI20195670A1 (en) * 2019-08-12 2021-02-13 3R Cycle Oy Method and device for disassembling electronics
US11854255B2 (en) * 2021-07-27 2023-12-26 Ubkang (Qingdao) Technology Co., Ltd. Human-object scene recognition method, device and computer-readable storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018106268A (en) * 2016-12-22 2018-07-05 東芝テック株式会社 Image processing device and image processing method
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN109685141A (en) * 2018-12-25 2019-04-26 哈工大机器人(合肥)国际创新研究院 A kind of robotic article sorting visible detection method based on deep neural network
CN110026987A (en) * 2019-05-28 2019-07-19 广东工业大学 Generation method, device, equipment and the storage medium of a kind of mechanical arm crawl track
CN110377033A (en) * 2019-07-08 2019-10-25 浙江大学 A kind of soccer robot identification based on RGBD information and tracking grasping means
CN111914921A (en) * 2020-07-24 2020-11-10 山东工商学院 Similarity image retrieval method and system based on multi-feature fusion
CN112170233A (en) * 2020-09-01 2021-01-05 燕山大学 Small part sorting method and system based on deep learning
CN112476434A (en) * 2020-11-24 2021-03-12 新拓三维技术(深圳)有限公司 Visual 3D pick-and-place method and system based on cooperative robot
CN112927297A (en) * 2021-02-20 2021-06-08 华南理工大学 Target detection and visual positioning method based on YOLO series
AU2021101646A4 (en) * 2021-03-30 2021-05-20 Tianjin Sino-German University Of Applied Sciences Man-machine cooperative safe operation method based on cooperative trajectory evaluation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on target positioning and robot planning system based on machine vision; Yang Sanyong; Zeng Bi; Computer Measurement & Control (12); full text *
Shelf commodity recognition method based on deep neural network; Liu Zhaobang; Yuan Minghui; Packaging Engineering (01); full text *
Autonomous grasping robot system for logistics sorting tasks; Ma Zhuoming; Zhu Xiaoxiao; Sun Mingjing; Cao Qixin; Machine Design & Research (No. 06); full text *

Also Published As

Publication number Publication date
CN116061187A (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN110314854B (en) Workpiece detecting and sorting device and method based on visual robot
WO2020177432A1 (en) Multi-tag object detection method and system based on target detection network, and apparatuses
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
CN106530297A (en) Object grabbing region positioning method based on point cloud registering
Kasaei et al. Interactive open-ended learning for 3d object recognition: An approach and experiments
US10713530B2 (en) Image processing apparatus, image processing method, and image processing program
CN110929795B (en) Method for quickly identifying and positioning welding spot of high-speed wire welding machine
CN110610210B (en) Multi-target detection method
CN116061187B (en) Method for identifying, positioning and grabbing goods on goods shelves by composite robot
CN111428731A (en) Multi-class target identification and positioning method, device and equipment based on machine vision
CN113222982A (en) Wafer surface defect detection method and system based on improved YOLO network
CN112801988A (en) Object grabbing pose detection method based on RGBD and deep neural network
Hu et al. Trajectory image based dynamic gesture recognition with convolutional neural networks
RU2361273C2 (en) Method and device for identifying object images
CN113505629A (en) Intelligent storage article recognition device based on light weight network
CN111598172B (en) Dynamic target grabbing gesture rapid detection method based on heterogeneous depth network fusion
CN111240195A (en) Automatic control model training and target object recycling method and device based on machine vision
Shi et al. A fast workpiece detection method based on multi-feature fused SSD
Schwan et al. A three-step model for the detection of stable grasp points with machine learning
WO2018135326A1 (en) Image processing device, image processing system, image processing program, and image processing method
CN115319739A (en) Workpiece grabbing method based on visual mechanical arm
Moreno et al. Learning to grasp from point clouds
CN110728222B (en) Pose estimation method for target object in mechanical arm grabbing system
Daqi et al. An industrial intelligent grasping system based on convolutional neural network
CN112200762A (en) Diode glass bulb defect detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant