CN114495109A - Grabbing robot based on matching of target and scene characters and grabbing method and system - Google Patents

Grabbing robot based on matching of target and scene characters and grabbing method and system

Info

Publication number
CN114495109A
CN114495109A
Authority
CN
China
Prior art keywords
target
detection
grabbing
text
coordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210081494.5A
Other languages
Chinese (zh)
Inventor
许庆阳
刘志超
丁凯旋
宋勇
李贻斌
张承进
袁宪锋
庞豹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202210081494.5A priority Critical patent/CN114495109A/en
Publication of CN114495109A publication Critical patent/CN114495109A/en
Pending legal-status Critical Current

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of intelligent robots and provides a grabbing robot based on matching of a target and scene characters, and a grabbing method and system. From the image of the target to be grabbed acquired by the camera and the target detection model, a CNN performs feature extraction and regresses the classification results and bounding boxes of the targets to be grabbed. For targets with the same classification result, a text detection and recognition model extracts and recognizes the characters in each target detection-box region, and an initial three-dimensional coordinate is obtained after the character recognition result is successfully matched with the specific target. A target tracking algorithm then locates the detection box of the specific target to be grabbed to obtain the final grabbing coordinates, and the chassis motion and mechanical arm motion are controlled according to the grabbing coordinates to grab the specific target.

Description

Grabbing robot based on matching of target and scene characters and grabbing method and system
Technical Field
The invention belongs to the field of intelligent robots, and particularly relates to a grabbing robot based on matching of a target and scene characters, and a grabbing method and a grabbing system of the grabbing robot.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In the prior art, most robot grasp detection algorithms either perform grasp detection directly on a single object, or distinguish multiple objects with complex neural networks through segmentation, classification and labelling. However, when a grabbing scene contains a large number of objects whose appearance and colour are consistent or which belong to the same category, these detection algorithms cannot finely distinguish the objects, which directly affects the grabbing decision of the robot and results in insufficient grabbing precision.
Disclosure of Invention
In order to solve at least one technical problem in the background technology, the invention provides a grabbing robot based on matching of a target and scene characters, a grabbing method and a grabbing system.
In order to achieve the purpose, the invention adopts the following technical scheme:
the first aspect of the invention provides a grabbing robot based on matching of a target and scene characters, which comprises: a depth camera, a chassis, a mechanical arm and a controller;
the controller comprises a preliminary detection module of an object to be grabbed, a text detection and identification module and an object grabbing module;
the preliminary detection module of the object to be grabbed is configured to: according to the image of the target to be grabbed acquired by the camera and the target detection model, perform feature extraction using a CNN, and regress the classification result and bounding box of the target to be grabbed;
the text detection recognition module is configured to: for targets with the same classification result, extracting characters in a target detection box area by adopting a text detection and identification model for detection and identification, and obtaining an initial three-dimensional coordinate after the character identification result is successfully matched with a specific target;
the object grabbing module is configured to: and positioning the specific grabbing target detection frame by using a target tracking algorithm to obtain a final grabbing coordinate, and controlling the chassis motion and the mechanical arm motion to grab the specific target according to the grabbing coordinate.
The second aspect of the present invention provides a capture method based on matching of a target and a scene text, including the following steps:
acquiring an image of a target to be captured;
performing feature extraction by using CNN according to the image of the target to be captured and the target detection model, and performing regression to obtain a classification result and a boundary frame of the target to be captured;
for targets with the same classification result, extracting characters in a target detection box area by adopting a text detection and identification model for detection and identification, and obtaining an initial three-dimensional coordinate after the character identification result is successfully matched with a specific target;
and positioning the specific grabbing target detection frame by using a target tracking algorithm to obtain a final grabbing coordinate, and controlling the chassis motion and the mechanical arm motion to grab the specific target according to the grabbing coordinate.
The third aspect of the present invention provides a capture system based on matching of a target and a scene text, including:
the robot comprises a preliminary detection module of an object to be grabbed, a text detection and identification module and an object grabbing module;
the preliminary detection module of the target to be grabbed is used for acquiring an image of the target to be grabbed; performing feature extraction by using CNN according to the image of the target to be captured and the target detection model, and performing regression to obtain a classification result and a boundary frame of the target to be captured;
the text detection and identification module is used for extracting characters in a target detection frame area for detection and identification by adopting a text detection and identification model for targets with the same classification result, and obtaining an initial three-dimensional coordinate after the character identification result is successfully matched with a specific target;
the target grabbing module is used for positioning the specific grabbed target detection frame by using a target tracking algorithm to obtain a final grabbing coordinate, and controlling the chassis motion and the mechanical arm motion to grab the specific target according to the grabbing coordinate.
Compared with the prior art, the invention has the beneficial effects that:
according to the method, the lightweight target detection model NanoDet is used for carrying out target detection on the object to be grabbed, and then the image in the detection frame area is subjected to enhancement processing, so that adverse factors such as the target area is too small are overcome. And carrying out character detection and recognition on the enhanced detection box area by utilizing a character detection and recognition model PP-OCR, and extracting character information. And fusing the target information provided by the two models to realize matching of the character recognition result and the object target detection frame, and finishing accurate positioning of the object to be grabbed. The real-time tracking of a specific target is realized through a KCF tracking algorithm, so that the accurate grabbing control of the robot is realized.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are included to illustrate an exemplary embodiment of the invention and not to limit the invention.
FIG. 1 is a schematic overall flow chart of a target capture monitoring and positioning method according to an embodiment of the present invention;
FIG. 2 is a structure diagram of a NanoDet according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a process of enhancing an image of a detection frame region according to an embodiment of the present invention;
FIG. 4 is a schematic view of a PP-OCR detection flow according to an embodiment of the present invention;
FIG. 5 is a diagram of a CRNN structure according to an embodiment of the present invention;
FIGS. 6(a)-6(b) illustrate an IOU calculation process according to an embodiment of the present invention;
FIGS. 7(a)-7(d) illustrate a target tracking process according to an embodiment of the present invention;
FIGS. 8(a)-8(c) illustrate a depth camera calibration and registration process according to an embodiment of the present invention;
FIGS. 9(a)-9(c) illustrate a robot grasping action according to an embodiment of the present invention;
FIGS. 10(a)-10(e) are diagrams illustrating the character detection and recognition effect according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
The invention provides a method that uses the character information on the objects to be grabbed and fuses the related information provided by the target detection and text detection and recognition algorithms to build an accurate detection system for a specific grabbing target, realizing accurate recognition and positioning of the target object. Lightweight models are adopted to guarantee the real-time performance of the system and make the grabbing task easy to deploy on the robot controller, solving the problem that current grasp detection algorithms cannot carefully distinguish similar objects.
As shown in fig. 1, the present embodiment provides a grabbing robot based on matching of a target and scene characters, including a depth camera, a chassis, a mechanical arm, and a controller;
the controller comprises a preliminary detection module of an object to be grabbed and a text detection and identification module;
the depth camera is used for capturing an image of an object to be grabbed, and the preliminary detection module of the object to be grabbed is configured to: and performing feature extraction by using the CNN according to the image of the target to be captured and the target detection model, and regressing to obtain the classification result category and the boundary frame of the target to be captured.
The text detection and recognition module is configured to: for targets with the same classification result, extract the characters in the target detection-box region with the text detection and recognition model, obtain an initial three-dimensional coordinate after the character recognition result is successfully matched with the specific target, locate the detection box of the specific grabbing target with a target tracking algorithm to obtain real-time grabbing coordinates, and control the chassis motion and mechanical arm motion to grab the specific target according to the grabbing coordinates.
In this embodiment, the target detection model is NanoDet, a high-speed, lightweight anchor-free target detection model that provides performance close to the YOLO series and is also convenient to train and port.
The network structure of the target detection model is shown in fig. 2. NanoDet is an FCOS (Fully Convolutional One-Stage Object Detection) style detection network, and the model can be divided into three parts: a backbone network, a feature fusion layer and a detection head. To keep the number of model parameters as small as possible, the backbone network adopts ShuffleNetV2, the last convolution layer of ShuffleNetV2 is removed, and the 8x, 16x and 32x downsampled features are extracted and fed into the PAN for multi-scale feature fusion.
The feature fusion layer adopts a PAN augmented with a bottom-up path: the low-level feature maps are downsampled and the downsampled result is added to the higher-level features. The detection head of NanoDet adopts two 96-channel convolution layers, and the same set of convolutions is used for box regression and classification.
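As an illustration of this fusion scheme, the sketch below shows a minimal PAN-style layer in PyTorch: the top-down path upsamples higher-level features and adds them to lower levels, and the added bottom-up path downsamples the low-level maps and adds the result back to the higher levels. This is an assumption for exposition, not the NanoDet source; the feature sizes and channel width are hypothetical.

```python
# Minimal PAN-style fusion sketch (illustrative; sizes are assumptions,
# not the actual NanoDet configuration).
import torch
import torch.nn.functional as F

def pan_fuse(c3, c4, c5):
    """c3, c4, c5: backbone features at 8x, 16x, 32x downsampling,
    already projected to the same channel width."""
    # Top-down path: upsample higher-level maps and add to lower levels.
    p5 = c5
    p4 = c4 + F.interpolate(p5, size=c4.shape[-2:], mode="bilinear", align_corners=False)
    p3 = c3 + F.interpolate(p4, size=c3.shape[-2:], mode="bilinear", align_corners=False)
    # Bottom-up path: downsample lower-level maps and add to higher levels.
    n3 = p3
    n4 = p4 + F.interpolate(n3, size=p4.shape[-2:], mode="bilinear", align_corners=False)
    n5 = p5 + F.interpolate(n4, size=p5.shape[-2:], mode="bilinear", align_corners=False)
    return n3, n4, n5

# Example with dummy 96-channel feature maps for a 320x320 input.
c3 = torch.randn(1, 96, 40, 40)   # 8x downsampled
c4 = torch.randn(1, 96, 20, 20)   # 16x downsampled
c5 = torch.randn(1, 96, 10, 10)   # 32x downsampled
n3, n4, n5 = pan_fuse(c3, c4, c5)
```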
The target detection module obtains the detection boxes of the objects in real time and classifies the image target in each detection box, thereby locating the two-dimensional position of every object in the image. For objects with the same classification result, the detection system further distinguishes them by extracting the character information on the objects.
Because the camera captures the whole scene, the region where the characters are located is small and is often affected by other factors such as illumination. If the whole image were input to the text detection network, the features of the character region could not be fully extracted, and the text region detection effect would be poor.
In order to improve the accuracy of subsequent text detection and recognition, character area image enhancement operation is performed before character detection.
As shown in fig. 3, the controller further includes a text region image enhancement module configured to perform image cropping, image enlargement and padding, grayscale processing and image sharpening.
(1) Image cropping: each target object is cropped from the whole image according to the target bounding box generated by target detection;
(2) Image enlargement and padding: because each cropped target region is small, the cropped region is enlarged to twice its original size by bicubic interpolation, the enlarged image is border-padded, and every detection-box region is padded into a square image so that all regions share the same aspect ratio;
(3) Grayscale processing: the enlarged and padded picture is converted to grayscale to remove the influence of variables such as colour and illumination, and histogram equalization is then applied to the grayscale picture to increase the contrast of the text region;
(4) Image sharpening: finally, an image sharpening method is adopted to enhance the edges of the characters in the image and make the characters clearer.
Through the above processing, an enhanced picture of identical shape is obtained for each target detection-box region and used as the input of the text detection and recognition model.
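A minimal OpenCV sketch of this enhancement pipeline is given below; the scale factor, padding value and sharpening kernel are assumptions for illustration rather than the exact parameters of the embodiment.

```python
# Sketch of the detection-box image enhancement steps (crop, enlarge, pad,
# grayscale + histogram equalization, sharpen). Parameter values are assumed.
import cv2
import numpy as np

def enhance_box_region(image, box):
    x1, y1, x2, y2 = box                              # bounding box from target detection
    crop = image[y1:y2, x1:x2]                        # (1) image cropping
    crop = cv2.resize(crop, None, fx=2.0, fy=2.0,
                      interpolation=cv2.INTER_CUBIC)  # (2) enlarge 2x with bicubic interpolation
    h, w = crop.shape[:2]
    side = max(h, w)                                  # pad the region into a square
    top, bottom = (side - h) // 2, side - h - (side - h) // 2
    left, right = (side - w) // 2, side - w - (side - w) // 2
    crop = cv2.copyMakeBorder(crop, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)     # (3) grayscale
    gray = cv2.equalizeHist(gray)                     #     histogram equalization
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    return cv2.filter2D(gray, -1, kernel)             # (4) sharpening
```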
As shown in fig. 4, the text detection and recognition model includes a text detection module, a detection box correction module, and a text recognition module.
The text detection module is configured to:
locate the region of the image where the text lies. DB-Net is used as the text detector. DB-Net moves the binarization step inside the segmentation network, replacing the standard binarization

B_{i,j} = 1 if P_{i,j} >= t, and B_{i,j} = 0 otherwise

with the differentiable binarization function

B̂_{i,j} = 1 / (1 + e^{-k(P_{i,j} - T_{i,j})})

In the two formulae, B_{i,j} and B̂_{i,j} are the binary maps, P_{i,j} is the probability map, t and T_{i,j} are the thresholds (T_{i,j} being the adaptive threshold map predicted by the network), and k is an amplification factor.

Differentiable binarization solves the problem that the gradient of standard binarization is not differentiable during training. To further improve efficiency, PP-OCR adopts six strategies to slim down DB-Net.
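The difference between the two binarizations can be illustrated with a short NumPy sketch; the amplification factor k = 50 and the constant threshold map used here are assumptions for the example.

```python
import numpy as np

def standard_binarize(prob_map, t=0.3):
    """Hard threshold: not differentiable, so it cannot be trained end to end."""
    return (prob_map >= t).astype(np.float32)

def differentiable_binarize(prob_map, thresh_map, k=50.0):
    """Approximate binarization B_hat = 1 / (1 + exp(-k * (P - T))),
    differentiable with respect to both the probability map P and threshold map T."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

P = np.random.rand(4, 4)      # probability map from the segmentation head
T = np.full_like(P, 0.3)      # adaptive threshold map (constant here for illustration)
print(standard_binarize(P))
print(differentiable_binarize(P, T))
```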
The detection frame rectification module is configured to:
Before the text in a detection box is recognized, the detection box needs to be rectified. PP-OCR provides a text direction classifier: the text detection box is first converted into a horizontal rectangular box by a geometric transformation, the direction of the converted text is then judged, and if the text box is upside down it is further flipped. Four strategies are meanwhile adopted to enhance the model capability and reduce the model volume.
The text recognition module is configured to: use CRNN as the text recognizer. The CRNN network structure is shown in fig. 5; CRNN integrates feature extraction and sequence modelling and performs sequence alignment with the CTC (Connectionist Temporal Classification) loss. To enhance the text recognition capability and reduce the model volume, nine strategies are adopted.
In this embodiment, the text detection and recognition model adopts the ultra-lightweight PP-OCR text detection and recognition network, which is easier to deploy on the mobile end. Through PP-OCR text detection and recognition, the character information contained on all target objects found by target detection is obtained, and even if the target detection module classifies the objects into the same class, the objects can be further distinguished according to the recognized character information.
Obtaining the initial three-dimensional coordinates is configured to: determine whether the given character information contained in the recognition result text belongs to the corresponding actual object in the detection box; if so, the matching between the characters and the target object is completed, and the coordinates of the matched target bounding box are combined with the camera depth information to obtain the initial three-dimensional coordinates of the target to be grabbed.
The preliminary detection module and the text detection and recognition module operate on a per-frame basis. Let the current image frame be F_i. Through the above process, n regions D_1, D_2, …, D_n, each taking one target detection box as a unit, are extracted, and the corresponding actual objects in the detection boxes are d_1, d_2, …, d_n. Given the specific character information t to be matched in the task, text detection and recognition is performed on the n detection-box regions, and the recognition results are [T_1, T_2, …, T_n]. If a certain recognition result text T_t contains the given character information t, the following determination is made:

t ⊆ T_t  ⟹  t ∈ d_t

By the above relation, the character information t is determined to belong to object d_t, the characters are matched with the target object, and the detection position of the target object at this moment is combined with the depth information to obtain the initial three-dimensional coordinates of the target to be grabbed.
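A minimal sketch of this matching step is shown below; the substring test stands in for the matching rule, and the example strings and helper name are hypothetical.

```python
# Match the given character information t against the OCR results of the
# n detection-box regions; return the index of the matched box (or None).
def match_text_to_box(recognized_texts, given_text):
    """recognized_texts: list [T_1, ..., T_n] of OCR strings, one per detection box.
    given_text: the specific character information t to be matched."""
    for idx, text in enumerate(recognized_texts):
        if given_text in text:      # t ⊆ T_t  =>  t belongs to object d_t
            return idx
    return None

# Example: boxes recognized as patient names on medicine bottles (hypothetical data).
texts = ["Zhang San 2x daily", "Li Si 1x daily", "Wang Wu 3x daily"]
matched = match_text_to_box(texts, "Li Si")   # -> 1, i.e. the second detection box
```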
According to the physical characteristics of the mobile service robot, the robot takes the initial three-dimensional coordinates as the first coordinate input, moves the chassis and prepares the grabbing action. Movement of the robot chassis changes the picture captured by the camera in real time, so the detection box located from the character recognition result of the previous moment becomes relatively offset, and the robot must receive the new coordinate position of the object in real time. If character detection and recognition were performed on the new detection-box region in every frame, the network computation would be enormous, the real-time performance of the whole system would deteriorate, and the grabbing efficiency of the robot would be directly affected. Therefore, a tracking algorithm is adopted to track the target in real time.
In order to solve the problem of target position changes caused by robot movement, the KCF (Kernel Correlation Filter) tracking algorithm is introduced, and is configured to:
A circulant matrix is constructed from the collected image block to represent dense samples of the target and its background, thereby constructing a large training set. After the first image frame has passed the two-stage detection, the detection box of the target object to be grabbed is located; the KCF algorithm then tracks the located detection-box region in real time, and training the tracker amounts to finding the filter ω that minimizes the objective function.
The steps for solving ω are as follows:
(1) A ridge regression equation is constructed:

min_ω Σ_t ( f(x_t) - y_t )² + λ‖ω‖²

where x_t is a single acquired training sample, y_t is the corresponding confidence label, and λ is the regularization parameter that prevents overfitting of the regression.

The cyclic shifts of a single training sample x = [x_1, x_2, …, x_n] constitute the sample set X, which is the following circulant matrix:

X = C(x) =
[ x_1  x_2  …  x_n
  x_n  x_1  …  x_{n-1}
  …    …        …
  x_2  x_3  …  x_1 ]

(2) In the ridge regression equation, f(X) = ω^T X; taking the derivative of the equation with respect to ω and setting it to zero gives

ω = (X^T X + λI)^{-1} X^T y

where X^T is the transpose of the sample matrix X, I is the identity matrix, and y is the column vector formed by the labels y_t.

The circulant matrix X has the property of being diagonalizable in Fourier space:

X = F diag(x̂) F^H

where F is the discrete Fourier matrix, x̂ is the Fourier transform of the first row x of X, and F^H is the conjugate transpose of F.

Substituting the Fourier diagonalization into the ridge regression solution and simplifying, the filter in the Fourier domain is obtained:

ω̂ = (x̂* ⊙ ŷ) / (x̂* ⊙ x̂ + λ)

where x̂* is the complex conjugate of x̂ and ⊙ denotes element-wise multiplication.

According to the Fourier space transformation,

ω = F^{-1}(ω̂)

where F^{-1} denotes the inverse Fourier transform.
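A minimal NumPy sketch of this closed-form solution in the Fourier domain (linear kernel, single channel, which is a simplification of the full KCF tracker) might look like the following; the training patch, label response and regularization value are assumed inputs.

```python
import numpy as np

def train_linear_correlation_filter(x, y, lam=1e-4):
    """x: 2-D image patch (the single training sample whose cyclic shifts form X),
    y: desired Gaussian-shaped response of the same size, lam: regularization λ.
    Returns the filter ω = F^{-1}( x̂* ⊙ ŷ / (x̂* ⊙ x̂ + λ) )."""
    x_hat = np.fft.fft2(x)
    y_hat = np.fft.fft2(y)
    w_hat = (np.conj(x_hat) * y_hat) / (np.conj(x_hat) * x_hat + lam)
    return np.real(np.fft.ifft2(w_hat))

def detect(w, z):
    """Correlation response of filter w on a new patch z; the peak location
    indicates the target's cyclic shift relative to the training patch."""
    response = np.real(np.fft.ifft2(np.fft.fft2(w) * np.fft.fft2(z)))
    return np.unravel_index(np.argmax(response), response.shape)
```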
Positioning the specific grabbed target detection box by using a target tracking algorithm comprises:
While the robot moves to execute the grabbing task, and considering that the KCF tracking algorithm drifts as errors accumulate over a long time, this embodiment computes the intersection over union (IOU) between the tracking box in each image frame and all current target detection boxes and takes the detection box with the largest value, so that the bounding box of the target to be grabbed can be located in every frame.
The IOU calculation is illustrated in FIGS. 6(a)-6(b), and the IOU calculation formula is:

IOU(A, B) = area(A ∩ B) / area(A ∪ B)
suppose that n detection frames with the same classification label result are generated in the target detection at this time, and are respectively A1,A2,……,An
Through character detection and recognition, the object in detection box A_t is located as containing the specific character information, that is, the object in box A_t is the target to be grabbed. The KCF tracking algorithm then samples the target in A_t to generate a tracking box T and tracks the target in real time while the robot moves. Throughout the process, the IOU between T and each A_i (i = 1, 2, …, n) is calculated at every moment, and the detection box with the maximum IOU is taken as A_t:

A_t = argmax_{i = 1, …, n} IOU(T, A_i)
FIGS. 7(a)-7(d) show the complete process of target detection box tracking from time T0 to time T3.
By finding and tracking the detection box with the largest IOU, the detection box of the grabbing target is located in real time and the real-time position of the grabbing target is updated, and the robot completes the grabbing task according to this grabbing position. This positioning strategy requires only one pass of character detection and recognition to keep the detection box located in real time, which reduces the overall computation and guarantees real-time performance.
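The box-association step can be sketched as follows; the [x1, y1, x2, y2] box format and helper names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, xb - xa) * max(0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter) if inter > 0 else 0.0

def locate_grasp_box(tracking_box, detection_boxes):
    """Return the detection box A_i with the largest IOU against the KCF
    tracking box T; this box is taken as the target to be grabbed."""
    return max(detection_boxes, key=lambda box: iou(tracking_box, box))
```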
As shown in FIGS. 8(a)-8(c), the depth camera calibration and registration process is configured to:
using 8 multiplied by 11 checkerboard to calibrate RGB and depth maps of the depth camera by Zhang-Zhengyou calibration method, and obtaining that the internal reference matrixes of the RGB camera and the depth camera are respectively HrgbAnd HirThe external reference matrix consists of a rotation matrix and a translation vectorAre each Rrgb、TrgbAnd Rir、Tir
Let P_rgb and P_ir be the spatial coordinates of the same point in the RGB camera coordinate system and the depth camera coordinate system respectively. Because the two coordinate systems differ, the two sets of coordinates are related by a rotation matrix and a translation vector:

P_rgb = R · P_ir + T
By computational derivation, the rotation matrix R and the translation vector T can be expressed as:

R = R_rgb · R_ir^{-1}

T = T_rgb - R · T_ir
and performing camera coordinate conversion by using the rotation matrix and the translation vector obtained by calculation to align the RGB-D images, and manually fine-tuning the translation vector between the two cameras according to the actual alignment condition to obtain a better alignment effect.
As shown in FIGS. 9(a)-9(c), after the specific grabbing target detection box has been located, the mechanical arm grabbing action is configured to:
and using the two-dimensional coordinates of the central area of the positioned grabbing detection frame and the depth information of the area corresponding to the registered depth map as original three-dimensional coordinate information of the grabbing object, calculating a transformation matrix of a camera coordinate system and a mechanical arm coordinate system, and mapping the three-dimensional coordinates obtained by the camera to the mechanical arm coordinate system, namely the grabbing coordinates of the robot. The robot moves to the reach range of the mechanical arm according to the grabbing coordinates received in real time, the mechanical arm executes grabbing actions, and the robot finishes grabbing tasks.
The invention fuses the two detection algorithms of target detection and text detection and recognition: character information is integrated on top of the position information provided by the target detection algorithm, realizing accurate detection of a specific target object. The detection system is built with lightweight deep learning models, is easy to deploy on the robot controller, and runs in real time on the robot. Designed experiments on a hospital medicine-bottle grabbing scene show the feasibility of the method: by recognizing the specific character information on a medicine bottle, the robot completes real-time detection and positioning of the specific target and realizes an intelligent medicine-bottle grabbing task in the hospital scene.
Example two
The embodiment provides a capture method based on matching of a target and scene characters, which comprises the following steps:
step 1: acquiring an image of a target to be captured;
step 2: performing feature extraction by using CNN according to the image of the target to be captured and the target detection model, and performing regression to obtain a classification result and a boundary frame of the target to be captured;
step 3: for targets with the same classification result, extracting characters in a target detection frame area by adopting a text detection and recognition model for detection and recognition, and obtaining an initial three-dimensional coordinate after the character recognition result is successfully matched with a specific target;
step 4: positioning the specific grabbing target detection frame by using a target tracking algorithm to obtain a final grabbing coordinate, and controlling the chassis motion and the mechanical arm motion to grab the specific target according to the grabbing coordinate.
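Read end to end, these four steps form the grabbing loop sketched below. The detector, OCR, tracker and robot interfaces are hypothetical stand-ins for the NanoDet, PP-OCR and KCF components of embodiment one, and the helpers enhance_box_region, match_text_to_box and locate_grasp_box refer to the sketches given earlier.

```python
# High-level sketch of the grabbing loop (hypothetical interfaces; the real
# detector, OCR model, tracker and robot driver are described in embodiment one).
def grab_specific_target(camera, detector, ocr, tracker, robot, given_text):
    frame = camera.read()
    boxes = detector.detect(frame)                       # step 2: candidate detection boxes
    texts = [ocr.recognize(enhance_box_region(frame, b)) for b in boxes]
    idx = match_text_to_box(texts, given_text)           # step 3: character matching
    if idx is None:
        return False                                     # target not in view
    tracker.init(frame, boxes[idx])
    grasp_box = boxes[idx]
    while not robot.target_in_reach():                   # step 4: track while the chassis moves
        frame = camera.read()
        track_box = tracker.update(frame)
        grasp_box = locate_grasp_box(track_box, detector.detect(frame))
        robot.move_chassis_towards(camera.to_3d(grasp_box))
    robot.arm_grab(camera.to_3d(grasp_box))              # final grabbing coordinates
    return True
```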
EXAMPLE III
The embodiment provides a grabbing system based on matching of a target and scene text, which includes a robot configured to receive a grabbing instruction issued by a terminal;
the robot comprises a preliminary detection module of an object to be grabbed, a text detection and identification module and an object grabbing module;
the preliminary detection module of the target to be grabbed is used for acquiring an image of the target to be grabbed; performing feature extraction by using CNN according to the image of the target to be captured and the target detection model, and performing regression to obtain a classification result and a boundary frame of the target to be captured;
the text detection and identification module is used for extracting characters in a target detection frame area for detection and identification by adopting a text detection and identification model for targets with the same classification result, and obtaining an initial three-dimensional coordinate after the character identification result is successfully matched with a specific target;
the target grabbing module is used for positioning the specific grabbed target detection frame by using a target tracking algorithm to obtain a final grabbing coordinate, and controlling the chassis motion and the mechanical arm motion to grab the specific target according to the grabbing coordinate.
Taking a grabbing scene of a service robot in a medical environment as an example, firstly, a command for grabbing a medicine bottle is given to the robot, namely, name information of a specific patient is sent to the robot. The target detection module detects and frames all medicine bottles in the robot visual field to obtain the position of a boundary frame of each medicine bottle, then the image enhancement operation extracts a target image in the detection frame area and performs enhancement processing, the enhanced image is sent to the character detection and identification module for character detection and identification, and finally the given patient name information is matched according to the character identification result. The character detection effect is shown in fig. 10(a) to 10 (e).
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A grabbing robot based on matching of a target and scene characters, characterized by comprising: a depth camera, a chassis, a mechanical arm and a controller; the controller comprises a preliminary detection module for the target to be grabbed, a text detection and recognition module and a target grabbing module;
the preliminary detection module of the object to be grabbed is configured to: according to the target image to be captured obtained by the depth camera and the target detection model, perform feature extraction by using a CNN, and perform regression to obtain a classification result and a bounding box of the target to be captured;
the text detection recognition module is configured to: for targets with the same classification result, extracting characters in a target detection box area by adopting a text detection and identification model for detection and identification, and obtaining an initial three-dimensional coordinate after the character identification result is successfully matched with a specific target;
the object grabbing module is configured to: and positioning the specific grabbing target detection frame by using a target tracking algorithm to obtain a final grabbing coordinate, and controlling the chassis motion and the mechanical arm motion to grab the specific target according to the grabbing coordinate.
2. The grabbing robot based on matching of a target and scene characters of claim 1, wherein the target tracking algorithm is configured to: introduce the KCF tracking algorithm based on a kernel correlation filter, construct a circulant matrix from the acquired image block to represent samples that densely sample the target and its background, thereby construct a large training set, carry out training, and search for the filter that minimizes the objective function.
3. The grabbing robot based on matching of a target and scene characters of claim 1, wherein the target detection model is configured to: adopt a NanoDet network comprising a backbone network, a feature fusion layer and a detection head, wherein the backbone network adopts ShuffleNetV2 and the feature fusion layer adopts PAN.
4. The grabbing robot based on matching of a target and scene characters of claim 1, wherein the text detection and recognition model employs a PP-OCR text detection and recognition network.
5. The grabbing robot based on matching of a target and scene characters according to claim 1, wherein locating the specific grabbing target detection box is configured to: take the two-dimensional coordinates of the central area of the located grabbing detection box and the depth information of the corresponding area of the registered depth map as the original three-dimensional coordinate information of the object to be grabbed, calculate a transformation matrix between the camera coordinate system and the mechanical arm coordinate system, and map the three-dimensional coordinates obtained by the camera into the mechanical arm coordinate system to obtain the final grabbing coordinates.
6. The grabbing robot based on matching of a target and scene characters of claim 5, wherein the depth map is obtained by a depth camera calibration and registration process configured to:
calibrating RGB and a depth map of the depth camera by using a Zhang Zhengyou calibration method to obtain an internal reference matrix and an external reference matrix of the RGB camera and the depth camera, wherein the external reference matrix consists of a rotation matrix and a translation vector;
and performing camera coordinate conversion according to the obtained rotation matrix and translation vector to obtain a depth map.
7. The grabbing robot based on matching of a target and scene characters of claim 1, wherein the controller further comprises a text region image enhancement module configured to: perform image cropping, image enlargement and padding, grayscale processing and image sharpening on the image of the target to be grabbed.
8. The grabbing robot based on matching of a target and scene characters of claim 1, wherein the matching of the character recognition result with a specific target comprises: comparing a plurality of regions, each taking a target detection box as a unit, with the corresponding actual objects in the detection boxes according to the given specific character information to be matched, judging whether the given character information contained in the recognition result text belongs to the corresponding actual object in the detection box, and if so, completing the matching of the characters with the target object.
9. A grabbing method based on matching of a target and scene characters, characterized in that the method is applied to a robot and comprises the following steps:
acquiring an image of a target to be captured;
according to the image of the target to be grabbed and the target detection model, performing feature extraction by using CNN (convolutional neural network), and regressing to obtain a classification result and a bounding box of the target to be grabbed;
for targets with the same classification result, extracting characters in a target detection box area by adopting a text detection and identification model for detection and identification, and obtaining an initial three-dimensional coordinate after the character identification result is successfully matched with a specific target;
and positioning the specific grabbing target detection frame by using a target tracking algorithm to obtain a final grabbing coordinate, and controlling the chassis motion and the mechanical arm motion to grab the specific target according to the grabbing coordinate.
10. A grabbing system based on matching of a target and scene characters, characterized in that the system is applied to a robot and comprises: a preliminary detection module for the target to be grabbed, a text detection and recognition module and a target grabbing module;
the preliminary detection module of the target to be grabbed is used for acquiring an image of the target to be grabbed; according to the image of the target to be grabbed and the target detection model, performing feature extraction by using CNN (convolutional neural network), and regressing to obtain a classification result and a bounding box of the target to be grabbed;
the text detection and identification module is used for extracting characters in a target detection frame area for detection and identification by adopting a text detection and identification model for targets with the same classification result, and obtaining an initial three-dimensional coordinate after the character identification result is successfully matched with a specific target;
the target grabbing module is used for positioning the specific grabbed target detection frame by using a target tracking algorithm to obtain a final grabbing coordinate, and controlling the chassis motion and the mechanical arm motion to grab the specific target according to the grabbing coordinate.
CN202210081494.5A 2022-01-24 2022-01-24 Grabbing robot based on matching of target and scene characters and grabbing method and system Pending CN114495109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210081494.5A CN114495109A (en) 2022-01-24 2022-01-24 Grabbing robot based on matching of target and scene characters and grabbing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210081494.5A CN114495109A (en) 2022-01-24 2022-01-24 Grabbing robot based on matching of target and scene characters and grabbing method and system

Publications (1)

Publication Number Publication Date
CN114495109A true CN114495109A (en) 2022-05-13

Family

ID=81474528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210081494.5A Pending CN114495109A (en) 2022-01-24 2022-01-24 Grabbing robot based on matching of target and scene characters and grabbing method and system

Country Status (1)

Country Link
CN (1) CN114495109A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019202A (en) * 2022-05-26 2022-09-06 北京化工大学 Step-by-step grabbing detection method applied to service type mobile mechanical arm
CN115219852A (en) * 2022-09-19 2022-10-21 国网江西省电力有限公司电力科学研究院 Intelligent fault studying and judging method for distribution line of unmanned aerial vehicle

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145200A (en) * 2007-10-26 2008-03-19 浙江工业大学 Inner river ship automatic identification system of multiple vision sensor information fusion
CN107967473A (en) * 2016-10-20 2018-04-27 南京万云信息技术有限公司 Based on picture and text identification and semantic robot autonomous localization and navigation
CN108256523A (en) * 2018-01-11 2018-07-06 上海展扬通信技术有限公司 Recognition methods, device and computer readable storage medium based on mobile terminal
CN109599105A (en) * 2018-11-30 2019-04-09 广州富港万嘉智能科技有限公司 Dish method, system and storage medium are taken based on image and the automatic of speech recognition
CN109822561A (en) * 2018-11-30 2019-05-31 广州富港万嘉智能科技有限公司 It is a kind of that dish method, system and storage medium are taken based on speech recognition automatically
CN109948416A (en) * 2018-12-31 2019-06-28 上海眼控科技股份有限公司 A kind of illegal occupancy bus zone automatic auditing method based on deep learning
CN110992422A (en) * 2019-11-04 2020-04-10 浙江工业大学 Medicine box posture estimation method based on 3D vision
CN111482967A (en) * 2020-06-08 2020-08-04 河北工业大学 Intelligent detection and capture method based on ROS platform
CN111823236A (en) * 2020-07-25 2020-10-27 湘潭大学 Library management robot and control method thereof
CN112258161A (en) * 2020-11-03 2021-01-22 苏州市龙测智能科技有限公司 Intelligent software testing system and testing method based on robot
WO2021076205A1 (en) * 2019-10-14 2021-04-22 UiPath Inc. Systems and methods of activity target selection for robotic process automation
CN113220818A (en) * 2021-05-27 2021-08-06 南昌智能新能源汽车研究院 Automatic mapping and high-precision positioning method for parking lot
CN113344967A (en) * 2021-06-07 2021-09-03 哈尔滨理工大学 Dynamic target identification tracking method under complex background
CN113450408A (en) * 2021-06-23 2021-09-28 中国人民解放军63653部队 Irregular object pose estimation method and device based on depth camera
CN113555087A (en) * 2021-07-19 2021-10-26 吉林大学第一医院 Artificial intelligence film reading method based on convolutional neural network algorithm
US20220016766A1 (en) * 2020-07-14 2022-01-20 Vicarious Fpc, Inc. Method and system for grasping an object

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145200A (en) * 2007-10-26 2008-03-19 浙江工业大学 Inner river ship automatic identification system of multiple vision sensor information fusion
CN107967473A (en) * 2016-10-20 2018-04-27 南京万云信息技术有限公司 Based on picture and text identification and semantic robot autonomous localization and navigation
CN108256523A (en) * 2018-01-11 2018-07-06 上海展扬通信技术有限公司 Recognition methods, device and computer readable storage medium based on mobile terminal
CN109599105A (en) * 2018-11-30 2019-04-09 广州富港万嘉智能科技有限公司 Dish method, system and storage medium are taken based on image and the automatic of speech recognition
CN109822561A (en) * 2018-11-30 2019-05-31 广州富港万嘉智能科技有限公司 It is a kind of that dish method, system and storage medium are taken based on speech recognition automatically
CN109948416A (en) * 2018-12-31 2019-06-28 上海眼控科技股份有限公司 A kind of illegal occupancy bus zone automatic auditing method based on deep learning
WO2021076205A1 (en) * 2019-10-14 2021-04-22 UiPath Inc. Systems and methods of activity target selection for robotic process automation
CN110992422A (en) * 2019-11-04 2020-04-10 浙江工业大学 Medicine box posture estimation method based on 3D vision
CN111482967A (en) * 2020-06-08 2020-08-04 河北工业大学 Intelligent detection and capture method based on ROS platform
US20220016766A1 (en) * 2020-07-14 2022-01-20 Vicarious Fpc, Inc. Method and system for grasping an object
CN111823236A (en) * 2020-07-25 2020-10-27 湘潭大学 Library management robot and control method thereof
CN112258161A (en) * 2020-11-03 2021-01-22 苏州市龙测智能科技有限公司 Intelligent software testing system and testing method based on robot
CN113220818A (en) * 2021-05-27 2021-08-06 南昌智能新能源汽车研究院 Automatic mapping and high-precision positioning method for parking lot
CN113344967A (en) * 2021-06-07 2021-09-03 哈尔滨理工大学 Dynamic target identification tracking method under complex background
CN113450408A (en) * 2021-06-23 2021-09-28 中国人民解放军63653部队 Irregular object pose estimation method and device based on depth camera
CN113555087A (en) * 2021-07-19 2021-10-26 吉林大学第一医院 Artificial intelligence film reading method based on convolutional neural network algorithm

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ZHICHAO LIU ET AL.: "Scene images and text information‐based object location of robot grasping", 《IET CYBER‐SYSTEMS AND ROBOTICS》, 28 April 2022 (2022-04-28), pages 116 - 130 *
付纪元 et al.: "Research on grasping with a closed-loop visual servo system for a home service robot", Journal of Beijing Information Science and Technology University, vol. 35, no. 3, 15 June 2020 (2020-06-15), pages 19-25 *
卢振利; 谢亚飞; 周立志; 单长考; 波罗瓦茨·布朗尼斯拉夫; 李斌: "A machine-vision-based system for robot identification and sorting of boxed cigarettes", High Technology Letters, no. 06, 15 June 2016 (2016-06-15)
穆玉理: "Object detection by deep learning with the Pascal VOC object detection data", Telecom World, no. 05, 25 May 2018 (2018-05-25)
龙慧; 朱定局; 田娟: "A review of research on the application of deep learning in intelligent robots", Computer Science, no. 2, 15 November 2018 (2018-11-15)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019202A (en) * 2022-05-26 2022-09-06 北京化工大学 Step-by-step grabbing detection method applied to service type mobile mechanical arm
CN115219852A (en) * 2022-09-19 2022-10-21 国网江西省电力有限公司电力科学研究院 Intelligent fault studying and judging method for distribution line of unmanned aerial vehicle
CN115219852B (en) * 2022-09-19 2023-03-24 国网江西省电力有限公司电力科学研究院 Intelligent fault studying and judging method for distribution line of unmanned aerial vehicle

Similar Documents

Publication Publication Date Title
CN113330490B (en) Three-dimensional (3D) assisted personalized home object detection
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN108304873B (en) Target detection method and system based on high-resolution optical satellite remote sensing image
EP3499414B1 (en) Lightweight 3d vision camera with intelligent segmentation engine for machine vision and auto identification
CN108090435B (en) Parking available area identification method, system and medium
WO2020042419A1 (en) Gait-based identity recognition method and apparatus, and electronic device
CN104680508B (en) Convolutional neural networks and the target object detection method based on convolutional neural networks
CN107953329B (en) Object recognition and attitude estimation method and device and mechanical arm grabbing system
CN105574527B (en) A kind of quick object detecting method based on local feature learning
CN111553949B (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN114495109A (en) Grabbing robot based on matching of target and scene characters and grabbing method and system
CN111862201A (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN111402331B (en) Robot repositioning method based on visual word bag and laser matching
CN112784712B (en) Missing child early warning implementation method and device based on real-time monitoring
CN110543817A (en) Pedestrian re-identification method based on posture guidance feature learning
CN112396036A (en) Method for re-identifying blocked pedestrians by combining space transformation network and multi-scale feature extraction
CN118189959A (en) Unmanned aerial vehicle target positioning method based on YOLO attitude estimation
CN114332814A (en) Parking frame identification method and device, electronic equipment and storage medium
Qureshi et al. Highway traffic surveillance over uav dataset via blob detection and histogram of gradient
CN113850195A (en) AI intelligent object identification method based on 3D vision
CN115797397B (en) Method and system for all-weather autonomous following of robot by target personnel
CN116912763A (en) Multi-pedestrian re-recognition method integrating gait face modes
CN110458177B (en) Method for acquiring image depth information, image processing device and storage medium
CN116977824A (en) Point cloud position identification method based on overlapping area
Yang et al. Target position and posture recognition based on RGB-D images for autonomous grasping robot arm manipulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination