CN111178250B - Object identification positioning method and device and terminal equipment - Google Patents


Info

Publication number
CN111178250B
CN111178250B
Authority
CN
China
Prior art keywords: dimensional, target object, target, geometric shape, area
Prior art date
Legal status: Active
Application number
CN201911380815.6A
Other languages
Chinese (zh)
Other versions
CN111178250A (en)
Inventor
刘培超
徐培
郎需林
刘主福
Current Assignee
Shenzhen Yuejiang Technology Co Ltd
Original Assignee
Shenzhen Yuejiang Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Yuejiang Technology Co Ltd filed Critical Shenzhen Yuejiang Technology Co Ltd
Priority to CN201911380815.6A priority Critical patent/CN111178250B/en
Publication of CN111178250A publication Critical patent/CN111178250A/en
Application granted granted Critical
Publication of CN111178250B publication Critical patent/CN111178250B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • G06V20/647Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention is applicable to the technical field of machine vision, and provides an object identification and positioning method, an object identification and positioning device and terminal equipment, wherein the method comprises the following steps: acquiring a two-dimensional image and point cloud data of a region to be detected; detecting the two-dimensional image through a pre-trained deep learning model, and identifying a two-dimensional target area and a geometric shape type corresponding to a target object in the two-dimensional image; mapping the two-dimensional target area to the point cloud data, and determining a first three-dimensional area of the target object according to a mapping result; and determining a second three-dimensional area of the target object according to the geometric shape type and the first three-dimensional area, and positioning the target object. The embodiment of the invention can improve the efficiency and accuracy of 3D object identification and positioning.

Description

Object identification positioning method and device and terminal equipment
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to an object identification and positioning method, an object identification and positioning device and terminal equipment.
Background
In industrial or robotic applications, it is often necessary to identify and locate objects by machine vision for subsequent gripping or other processing steps.
To identify an existing three-dimensional (3D) object, a 3D model matching algorithm is generally adopted: the object to be detected is matched against a 3D model of that object constructed in advance, and the object is thereby identified. However, existing 3D model matching algorithms are not robust to occlusion and noisy backgrounds and are prone to mismatches, so the efficiency of three-dimensional object recognition is low.
Disclosure of Invention
In view of this, the embodiments of the present invention provide an object identification and positioning method, device and terminal device, so as to solve the problem in the prior art of how to improve the efficiency and accuracy of 3D object identification and positioning.
A first aspect of an embodiment of the present invention provides an object identifying and positioning method, including:
acquiring a two-dimensional image and point cloud data of a region to be detected;
detecting the two-dimensional image through a pre-trained deep learning model, and identifying a two-dimensional target area and a geometric shape type corresponding to a target object in the two-dimensional image;
mapping the two-dimensional target area to the point cloud data, and determining a first three-dimensional area of the target object according to a mapping result;
and determining a second three-dimensional area of the target object according to the geometric shape type and the first three-dimensional area, and positioning the target object.
A second aspect of an embodiment of the present invention provides an object recognition positioning device, including:
the first acquisition unit is used for acquiring a two-dimensional image and point cloud data of the region to be detected;
the identification unit is used for detecting the two-dimensional image through a pre-trained deep learning model and identifying a two-dimensional target area and a geometric shape type corresponding to a target object in the two-dimensional image;
the rough segmentation unit is used for mapping the two-dimensional target area to the point cloud data and determining a first three-dimensional area of the target object according to a mapping result;
and the positioning unit is used for determining a second three-dimensional area of the target object according to the geometric shape type and the first three-dimensional area and positioning the target object.
A third aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above object identification and positioning method when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the object identification and positioning method described above.
A fifth aspect of the embodiments of the present invention provides a computer program product which, when run on a terminal device, causes the terminal device to perform the object identification and positioning method described in the first aspect.
Compared with the prior art, the embodiments of the invention have the following beneficial effects: recognizing the target object from a two-dimensional image is more robust than recognizing it from a 3D model. Therefore, the two-dimensional target area of the target object is first recognized in the two-dimensional image and the geometric shape type corresponding to the target object is determined; the two-dimensional target area is then mapped into the three-dimensional point cloud space to determine the first three-dimensional area of the target object, completing a preliminary recognition of the target object. After this preliminary recognition, the second three-dimensional area of the target object can be accurately determined according to the geometric shape type and the first three-dimensional area, and the positioning of the target object is completed, which further improves the accuracy of object identification and positioning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an implementation of a first object identification and positioning method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an implementation flow of a second object identification and positioning method according to an embodiment of the present invention;
FIG. 3 is a schematic view of an object recognition positioning device according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when", "once", "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting the [described condition or event]" or "in response to detecting the [described condition or event]".
In addition, in the description of the present application, the terms "first," "second," "third," etc. are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Embodiment one:
fig. 1 shows a flowchart of a first object identifying and positioning method according to an embodiment of the present application, which is described in detail below:
in S101, a two-dimensional image of a region to be measured and point cloud data are acquired.
The region to be detected is a region containing objects to be identified, and may contain a plurality of objects to be identified at the same time. For convenience of description, an object to be identified is hereinafter referred to as a target object. The two-dimensional image of the region to be detected can be acquired by a depth camera capable of capturing two-dimensional images (such as an RGBD camera), or by an ordinary camera; the two-dimensional image contains the two-dimensional shape information of the objects in the region to be detected. The point cloud data of the region to be detected can be acquired directly by the depth camera, or obtained by converting a depth map acquired by the depth camera into the corresponding point cloud space. The point cloud data contains the three-dimensional information of the region to be detected, and the three-dimensional information of each object in the region (such as three-dimensional coordinate information and three-dimensional direction information) can be reflected by the point cloud data. Preferably, the two-dimensional image of the region to be detected is a color image, because a color image includes color information of the target object in addition to its two-dimensional shape information, so the target object can be recognized better. Optionally, the two-dimensional image and the point cloud data are acquired simultaneously by one depth camera that can capture both a color map and a depth map (for example, an RGBD camera), or acquired separately by a color camera and a depth camera.
Optionally, the acquiring the two-dimensional image and the point cloud data of the area to be measured includes:
acquiring a color image and a depth image of a region to be detected, wherein the color image is a two-dimensional image of the region to be detected;
and generating point cloud data of the region to be detected according to the color map and the depth map.
An RGBD image of the region to be detected is acquired by an RGBD depth camera; the RGBD image comprises a color image (RGB map) and a depth image (Depth map) with the same resolution. After an alignment operation, points on the color image correspond one-to-one to points on the depth image, and the points on the depth image can be converted into the corresponding spatial position coordinates according to the parameters of the camera. Therefore, the color map can serve as the two-dimensional image carrying the two-dimensional information of the region to be detected, and point cloud data reflecting the three-dimensional information of the region to be detected can be generated from the depth map.
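As an illustration of this conversion (not part of the patent text), an aligned depth map can be back-projected into point cloud data with a pinhole camera model; the intrinsics fx, fy, cx, cy and the depth scale below are hypothetical calibration values.

```python
# Sketch: convert an aligned HxW depth image into point cloud data (assumed pinhole model).
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Return an Nx3 array of 3D points (metres) from a raw depth image."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel grid
    z = depth.astype(np.float32) * depth_scale       # depth in metres
    x = (u - cx) * z / fx                            # back-project along X
    y = (v - cy) * z / fy                            # back-project along Y
    return np.stack((x, y, z), axis=-1).reshape(-1, 3)

# Example with hypothetical intrinsics of a 640x480 RGBD camera:
# cloud = depth_to_point_cloud(depth_img, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
```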
In S102, the two-dimensional image is detected by a pre-trained deep learning model, and a two-dimensional target area and a geometric shape type corresponding to a target object in the two-dimensional image are identified.
In the embodiment of the present application, for convenience of description, the image area where a target object is located in the two-dimensional image is referred to as a two-dimensional target area. The pre-trained deep learning model is a target detection model trained in advance; the two-dimensional image is detected by this model, and the two-dimensional target area corresponding to the target object in the two-dimensional image and the geometric shape type corresponding to the target object are identified. The geometric shape type may include a plane, a sphere, a cylinder, a cuboid, an irregularly shaped body and the like. Because target objects with different three-dimensional shapes produce different characteristic information when projected onto a two-dimensional image, the three-dimensional geometric shape type of the target object can be preliminarily determined from the two-dimensional image. Specifically, the pre-trained target detection model contains learned geometric shape feature extraction parameters: the two-dimensional image is input into the model, all the geometric feature information in the two-dimensional image is extracted through the geometric feature extraction parameters, and the two-dimensional target area division and the geometric shape type determination are carried out according to the geometric feature information.
Optionally, the target detection model includes a color feature information extraction network layer, a texture feature information extraction network layer, a geometric feature information extraction network layer and a discriminator; correspondingly, the detecting the two-dimensional image through the pre-trained deep learning model, and identifying the two-dimensional target area and the geometric shape type corresponding to the target object in the two-dimensional image, including:
s1021: inputting the two-dimensional image into a target detection model, and extracting the color characteristic information of the two-dimensional image through the color characteristic information extraction network layer; extracting texture feature information of the two-dimensional image through the texture feature information extraction network layer; extracting the network layer through the geometric feature to obtain the geometric feature information of the two-dimensional image;
s1022: according to the color feature information, texture feature information and geometric feature information of the two-dimensional image, carrying out region division on the two-dimensional image to obtain a two-dimensional target region in the two-dimensional image;
s1023: and determining the geometric shape type of the target object according to the information of the two-dimensional target area and the discriminator.
In S1021, the two-dimensional image is first subjected to preprocessing such as denoising or contrast enhancement and is then input into the target detection model, where the color feature information, the texture feature information and the geometric feature information of the two-dimensional image are extracted by the color feature information extraction network layer, the texture feature information extraction network layer and the geometric feature extraction network layer, respectively.
In S1022, specifically, region division is performed according to the color feature information of the two-dimensional image, so as to obtain a first image; dividing the region according to the texture feature information of the two-dimensional image to obtain a second image; performing region division according to the geometric characteristic information of the two-dimensional image to obtain a third image; then, the intersection of the areas of the first image, the second image and the third image is obtained, and a two-dimensional target area in the two-dimensional image is determined.
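A minimal sketch of this intersection step (an assumption about the data representation, not the patent's network): if each feature branch yields a binary mask of candidate target pixels, the two-dimensional target region can be taken as the bounding rectangle of their intersection.

```python
# Sketch: fuse the colour, texture and geometry region divisions by mask intersection.
import numpy as np

def fuse_region_masks(color_mask, texture_mask, geometry_mask):
    """Each mask is an HxW boolean array marking candidate target pixels."""
    fused = color_mask & texture_mask & geometry_mask     # region intersection
    ys, xs = np.nonzero(fused)
    if ys.size == 0:
        return None                                        # no two-dimensional target area found
    # Bounding rectangle (ROI) of the intersected region: (x_min, y_min, x_max, y_max)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```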
In S1023, the geometric feature information of the two-dimensional target areas determined in step S1022 is acquired and input into a discriminator, and the geometric shape type of the target object corresponding to each two-dimensional target area in the two-dimensional image is determined. Wherein a two-dimensional target area corresponds to a target object.
In the embodiment of the application, when the two-dimensional target area is identified through the deep learning model, the color characteristic information, the texture characteristic information and the geometric characteristic information of the two-dimensional image can be respectively extracted, the three characteristic information are combined to divide the two-dimensional target area and finally determine the geometric shape type of the target object, so that the two-dimensional target area and the corresponding geometric shape type of the target object can be more accurately identified, and the accuracy of object identification and positioning is improved.
Optionally, when the two-dimensional target area is identified, the two-dimensional target area is selected in the two-dimensional image by a rectangular ROI (Region of Interest) frame, and the geometric shape type information of the target object corresponding to the two-dimensional target area is marked.
Optionally, before the step S102, the method further includes:
acquiring a two-dimensional sample image, wherein the two-dimensional sample image contains two-dimensional image information of a predetermined number of target objects;
selecting a two-dimensional target area corresponding to a target object in the two-dimensional sample image, and identifying a corresponding geometric shape type label;
and training the two-dimensional sample image serving as a training sample through a target detection algorithm to obtain the pre-trained deep learning model.
A two-dimensional sample image is acquired; in practice a sample set of one or more two-dimensional sample images is acquired, and the sample set is required to contain two-dimensional image information of the various target objects. A single two-dimensional sample image may contain two-dimensional image instances of several target objects to be identified, or only one or several instances of a single target object to be identified. In the sample set, the total number of two-dimensional image instances of each target object to be identified needs to reach a predetermined number, for example at least 500 instances per object; the larger the predetermined number, the higher the accuracy of the finally trained deep learning model.
In the two-dimensional sample image, a frame selection instruction of a user is received, two-dimensional target areas corresponding to all target objects are selected by a frame, and corresponding category labels are marked. The class label may contain information about the name, attribute characteristics, geometry type, etc. of the target object.
And taking the two-dimensional sample image which is subjected to frame selection and identification as a training sample, inputting the training sample into a convolutional neural network, adjusting training parameters, and training through a target detection algorithm to obtain a pre-trained deep learning model. The target detection algorithm can be any one of an RCNN (Region-based Convolutional Neural Network) target detection algorithm, a Fast-RCNN target detection algorithm, a YOLO (You Only Look Once) target detection algorithm and a SSD (Single Shot MultiBox Detector) target detection algorithm, and specifically can be selected according to the requirements on training speed and training accuracy. Preferably, the pre-trained deep learning model is obtained through training of an SSD target detection algorithm or a Faster-RCNN target detection algorithm, so that training accuracy is ensured, and meanwhile training speed is improved.
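As a hedged illustration of this training step (the patent names the candidate algorithms but not an implementation), a Faster-RCNN-style detector from torchvision could be trained on the framed sample images; the class list (background plus the five geometry types) and the torchvision version (0.13 or later for the `weights` argument) are assumptions.

```python
# Sketch: train a stand-in Faster R-CNN detector on framed 2D target areas and geometry labels.
import torch
import torchvision

NUM_CLASSES = 6  # background + {plane, sphere, cylinder, cuboid, irregular} (assumed classes)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None,
                                                             num_classes=NUM_CLASSES)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_one_batch(images, targets):
    """images: list of CxHxW float tensors; targets: list of dicts with
    'boxes' (float32 [N, 4]) and 'labels' (int64 [N]) for the framed areas."""
    model.train()
    loss_dict = model(images, targets)     # detection + classification losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```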
In S103, the two-dimensional target area is mapped to the point cloud data, and a first three-dimensional area of the target object is determined according to the mapping result.
If the two-dimensional image and the point cloud data are acquired by the same depth camera, the coordinate mapping relation between the two-dimensional image and the point cloud data is already calibrated, so the two-dimensional target area in the two-dimensional image can be mapped into the point cloud data through a conversion algorithm, the three-dimensional area corresponding to the target object in the point cloud data is determined, and the rough segmentation of the target object in the point cloud data is completed. For ease of illustration, the three-dimensional region obtained by mapping the two-dimensional target region is referred to as the first three-dimensional region of the target object. If the two-dimensional image and the point cloud data are acquired by two different cameras, the coordinate mapping relation between them is calibrated according to the positional relation of the two cameras, and the two-dimensional target area in the two-dimensional image is then mapped into the point cloud data to obtain the first three-dimensional area of the target object.
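A simple sketch of this mapping under the assumption that the point cloud is organized and pixel-aligned with the color image (the patent does not fix a data layout): the first three-dimensional area is then just the cloud indexed by the same ROI.

```python
# Sketch: map a 2D target ROI into an organized, pixel-aligned point cloud.
import numpy as np

def map_roi_to_cloud(organized_cloud, roi):
    """organized_cloud: HxWx3 array of XYZ points aligned with the colour image.
    roi: (x_min, y_min, x_max, y_max) two-dimensional target area."""
    x_min, y_min, x_max, y_max = roi
    region = organized_cloud[y_min:y_max + 1, x_min:x_max + 1, :].reshape(-1, 3)
    # Drop invalid points (zero depth) so the first three-dimensional area
    # only contains measured surface points.
    return region[region[:, 2] > 0]
```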
In S104, a second three-dimensional region of the target object is determined and the target object is located according to the geometry type and the first three-dimensional region.
According to the determined geometric shape type of the target object, a more accurate second segmentation is performed within the first three-dimensional region by a fitting algorithm or a model matching method to obtain the three-dimensional region corresponding to the target object; for illustration and distinction, the three-dimensional region obtained by this second segmentation is called the second three-dimensional region of the target object. After the second three-dimensional region is obtained, the coordinates of all target objects are expressed as position coordinates in the world coordinate system (i.e., the coordinate system of the real world in which the objects are located) according to the mapping relations between the second three-dimensional region in the point cloud data, the depth camera coordinate system and the world coordinate system, which completes the positioning of the target objects.
Specifically, the step S104 includes:
s1041: if the geometric shape type is a regular geometric shape, determining a second three-dimensional area of the target object from the first three-dimensional area through a fitting algorithm and positioning the target object;
s1042: if the geometric shape type is irregular geometric shape, determining a second three-dimensional area of the target object from the first three-dimensional area through a 3D model matching method and positioning the target object.
In this embodiment of the present application, the geometric shape types at least include a plane, a sphere, a cylinder, a cuboid, a free-form (odd-shaped) body and the like, where the plane, the sphere, the cylinder and the cuboid are regular geometric shapes, and the free-form body is an irregular geometric shape.
In S1041, when the geometry type is a regular geometry, a second accurate three-dimensional region division may be performed in the first three-dimensional region by a corresponding fitting algorithm thereof, a second three-dimensional region of the target object may be determined, and an actual position of the target object may be located according to the second three-dimensional region. For example, if the geometry type of one target object in the region to be measured determined in step S102 is a sphere, a sphere fitting algorithm is established in simulation software by combining a spherical equation and a least square method, a sphere is fitted in the first three-dimensional region of the target object, and the sphere is the second three-dimensional region of the target object, and the target object is positioned according to the spherical center coordinates and the radius of the sphere.
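A minimal sketch of such a sphere fit, assuming the standard algebraic least-squares formulation (the patent only states that a spherical equation and the least square method are combined):

```python
# Sketch: least-squares sphere fit over the first three-dimensional area of a "sphere" target.
# Uses |p - c|^2 = r^2 rewritten as a linear system in (cx, cy, cz, r^2 - |c|^2).
import numpy as np

def fit_sphere(points):
    """points: Nx3 array. Returns (center, radius) of the best-fit sphere."""
    A = np.hstack((2.0 * points, np.ones((points.shape[0], 1))))
    f = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, f, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center.dot(center))
    return center, radius
```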
In S1042, since the second three-dimensional region of an irregularly shaped body cannot be determined by establishing a fitting algorithm, in the embodiment of the present application a corresponding 3D template is established and pre-stored in advance for each irregularly shaped object. When the geometric shape type of the target object is detected to be an irregular geometric shape, the first three-dimensional area is matched with the pre-stored 3D template through a 3D model matching method, the second three-dimensional area of the target object is accurately determined from the first three-dimensional area, and the actual position of the target object is located according to the second three-dimensional area.
In the embodiment of the application, when the target object is in a regular geometric shape, a fitting algorithm which is simple and has low operation resource consumption is adopted to accurately determine the corresponding second three-dimensional region from the first three-dimensional region; and when the target object is in an irregular geometric shape, accurately determining the second three-dimensional area by adopting a 3D template matching algorithm. The corresponding second three-dimensional region determining method can be selected according to the geometric shape type characteristics of the target object, so that operation resources can be saved on the premise of ensuring accurate region division, and the object identification and positioning efficiency and accuracy are improved.
Optionally, after the second three-dimensional regions in the point cloud data are determined, all the second three-dimensional regions are framed by cuboid ROIs or marked with different colors, and attribute tags are marked for the second three-dimensional regions, where an attribute tag can include information such as the name and the geometric attribute (such as plane, cylinder, sphere, cuboid, irregular shape) of the target object.
In the embodiment of the invention, the robustness of the method for recognizing the target object by the two-dimensional image is better than that of the method for recognizing the target object by the 3D model, so that the two-dimensional target area of the target object is recognized by the two-dimensional image, and then the two-dimensional target area is mapped to the three-dimensional point cloud space to determine the first three-dimensional area of the target object, so that the target object can be primarily recognized and the geometrical shape type corresponding to the target object can be determined, and compared with the object recognition method for directly performing 3D model matching with poor robustness, the object recognition efficiency and the object accurate recognition probability can be improved; and after the target object is primarily identified by determining the first three-dimensional area, the second three-dimensional area of the target object can be further accurately determined according to the geometric shape type of the target object and the first three-dimensional area, and the positioning of the target object is completed, so that the accuracy of object identification and positioning is further improved.
Embodiment two:
fig. 2 is a schematic flow chart of a second object identifying and positioning method according to an embodiment of the present application, which is described in detail below:
in S201, a two-dimensional image of a region to be measured and point cloud data are acquired.
In this embodiment, S201 is the same as S101 in the previous embodiment, and please refer to the description related to S101 in the previous embodiment, which is not repeated here.
In S202, the two-dimensional image is detected by a pre-trained deep learning model, and a two-dimensional target region and a geometry type corresponding to a target object in the two-dimensional image are identified.
In this embodiment, S202 is the same as S102 in the previous embodiment, and detailed descriptions of S102 in the previous embodiment are omitted here.
In S203, the two-dimensional target area is mapped to the point cloud data, and a first three-dimensional area of the target object is determined according to the mapping result.
In this embodiment, S203 is the same as S103 in the previous embodiment, and please refer to the description related to S103 in the previous embodiment, which is not repeated here.
In S204, a second three-dimensional region of the target object is determined and the target object is located according to the geometry type and the first three-dimensional region.
In this embodiment, S204 is the same as S104 in the previous embodiment, and detailed description of S104 in the previous embodiment is omitted here.
In S205, the target object is grabbed according to the geometry type and the position of the target object.
According to the geometric shape type of the target object determined from the two-dimensional image and the position of the target object obtained after locating its second three-dimensional area, the grabbing point of the target object (which may comprise the position coordinates and the grabbing direction of the grabbing point) is located through a fitting algorithm or a 3D matching algorithm, and the grabbing apparatus is moved to grab the target object.
Optionally, the step S205 includes:
and positioning a grabbing point and grabbing the target object according to the geometric shape type, the position of the target object and the type of grabbing instrument.
In this embodiment, the grabbing apparatus may include a suction cup type apparatus and a manipulator type apparatus. When the grabbing apparatus is of the suction cup type, the center point of one plane of the target object is determined as the coordinate position of the grabbing point from the position and the geometric shape type of the target object (if the geometric shape type is a sphere, any point on the sphere can be used as the grabbing point position), and the normal direction of the plane passing through the grabbing point is determined as the grabbing direction. When the grabbing apparatus is of the manipulator type, a plurality of edge contour coordinates of the target object are determined as the grabbing point positions corresponding to the manipulator according to the position and the geometric shape type of the target object, and the grabbing direction is along the axial direction of the manipulator arm.
According to the method and the device for determining the grabbing points, corresponding grabbing points can be determined according to the geometric shape type of the target object and the grabbing instrument type, so that grabbing of the target object is more stable and efficient.
Specifically, the step S205 includes:
if the geometric shape type is a regular geometric shape, the grabbing point of the target object is calculated through a fitting algorithm according to the position of the target object and the type of the grabbing apparatus, and the target object is grabbed.
If the geometric shape type is a regular geometric shape, the position of the corresponding grabbing point is calculated through a fitting algorithm. For example, if the target object is a cylinder and the grabbing apparatus is of the suction cup type, a plane fitting algorithm is used to determine either of the two planes of the cylinder (the upper or the lower surface) from the cylindrical second three-dimensional region of the target object, and the coordinate of the center point of that plane is determined as the grabbing point position.
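As an illustrative sketch of such a plane-based grabbing point (the fitting details are an assumption, not quoted from the patent), the grabbing point can be taken as the centroid of the fitted face and the grabbing direction as its normal:

```python
# Sketch: suction-cup grasp point = centroid of a fitted plane, direction = plane normal (SVD).
import numpy as np

def plane_grasp_point(plane_points):
    """plane_points: Nx3 points sampled from one face of the target
    (e.g. the top surface of a cylinder). Returns (grasp_point, grasp_direction)."""
    centroid = plane_points.mean(axis=0)
    _, _, vh = np.linalg.svd(plane_points - centroid, full_matrices=False)
    normal = vh[-1]                      # least-variance direction = plane normal
    normal /= np.linalg.norm(normal)
    return centroid, normal              # approach the point along +/- normal
```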
In the embodiment of the application, when the geometric shape type of the target object is a regular geometric shape, the grabbing points of the target object are simply and accurately obtained through a fitting algorithm, so that the accurate grabbing of the target object is realized.
Specifically, the step S205 includes:
if the geometric shape type is an irregular geometric shape, matching the second three-dimensional area with a pre-stored 3D template to obtain 6D pose information of the target object, wherein the pre-stored 3D template comprises a preset grabbing point;
and grabbing the target object according to the 6D pose information and the matched pre-stored 3D template.
If the target object has an irregular geometric shape, the second three-dimensional area is matched with the pre-stored 3D template to obtain the 6D pose information of the target object. The 6D pose information consists of the three-dimensional position coordinate information and the three-dimensional direction information of the target object, both of which take a pre-calibrated three-dimensional coordinate system XYZ as the reference system. The three-dimensional direction information may be represented by Euler angles, which comprise a first included angle alpha, a second included angle beta and a third included angle gamma. Let xyz be the coordinate system fixed to the target object (i.e., the coordinate system that rotates with the target object); then the first included angle alpha represents the angle between the Z axis of the reference system and the intrinsic z axis of the target object, the second included angle beta represents the angle between the Y axis of the reference system and the intrinsic y axis of the target object, and the third included angle gamma represents the angle between the X axis of the reference system and the intrinsic x axis of the target object. The pre-stored 3D template is template point cloud data of the target object containing preset grabbing point information, namely the three-dimensional position information and the grabbing direction information of the preset grabbing point.
Optionally, after the second three-dimensional region is matched with the pre-stored 3D template to obtain the 6D pose information of the target object, the 6D pose information is refined through the Iterative Closest Point (ICP) algorithm. A rough initial pose p0 of the target object is obtained through matching with the pre-stored 3D template; then, with the initial pose p0 as the initial input, the pose is further refined using the ICP algorithm to obtain a more accurate result p, and the refined 6D pose information p is finally output.
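A hedged sketch of this refinement step using Open3D's ICP registration (the library choice is an assumption; the patent only names the ICP algorithm). Here p0 is the rough 4x4 template-to-scene transform and the point sets are Nx3 arrays:

```python
# Sketch: refine the rough pose p0 with point-to-point ICP (Open3D assumed available).
import numpy as np
import open3d as o3d

def refine_pose_icp(template_pts, scene_pts, p0, max_corr_dist=0.005):
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(template_pts))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(scene_pts))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_corr_dist, p0,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation          # refined 6D pose p as a 4x4 matrix
```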
According to the finally obtained 6D pose information of the target object and the position mapping relation between the pre-stored 3D template matched with the second three-dimensional area and the actual object space, the position of the preset grabbing point in the pre-stored 3D template is converted into the corresponding grabbing point in the actual object space, the grabbing point is located, and the target object is grabbed.
In the embodiment of the present application, when the geometric shape type of the target object is an irregular geometric shape, the position and the grabbing direction of the grabbing point of the target object can be accurately determined according to the preset grabbing point of the pre-stored 3D template, so that the target object can be accurately grabbed.
Optionally, before the target object is grabbed according to the second three-dimensional area in the point cloud data, the method further includes:
establishing a 3D template of the target object by adopting a point pair feature (PPF) algorithm;
and determining a preset grabbing point in the 3D template and storing the preset grabbing point to obtain the pre-stored 3D template.
A three-dimensional model of the target object is generated from a three-dimensional Computer Aided Design (CAD) model, or the three-dimensional information of the target object is obtained by shooting it directly with a depth camera; the three-dimensional model or the three-dimensional information is then input into a point pair feature (PPF) detector, and the 3D template of the target object is established through the PPF algorithm. Optionally, before the three-dimensional information of the target object is input into the PPF detector, the sampling step of the PPF detector relative to the model diameter, the discretization step relative to the model diameter, the angle discretization value and the like may be set.
A designation instruction is received on the established 3D template, a preset grabbing point is designated and stored, and the pre-stored 3D template containing the preset grabbing point is obtained. Specifically, the preset grabbing point includes the position information of the preset grabbing point and the preset grabbing direction information, and the preset grabbing point and the preset grabbing direction are designated according to the characteristics of the grabbing apparatus (such as a suction head, a suction cup, a manipulator, etc.). For example, if the grabbing apparatus is a suction cup, the position of the preset grabbing point is designated as the center point of a certain plane of the target object, and the direction of the preset grabbing point is the normal direction at the position of the preset grabbing point.
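As a hedged sketch of building such a template (assuming the OpenCV contrib surface_matching module as one possible PPF implementation; the file name, sampling parameters and grasp-point values are hypothetical):

```python
# Sketch: train a PPF 3D template and store a preset grasp point alongside it.
import cv2
import numpy as np

# Nx6 float32 model cloud: XYZ + normals, e.g. exported from a CAD model or a scan.
model_cloud = cv2.ppf_match_3d.loadPLYSimple("target_model.ply", 1)

detector = cv2.ppf_match_3d_PPF3DDetector(0.025, 0.05)   # sampling / distance steps
detector.trainModel(model_cloud)                           # build the PPF 3D template

# Preset grabbing point stored with the template: position and grabbing direction,
# both expressed in the template's coordinate frame (hypothetical values).
preset_grasp = {"position": np.array([0.0, 0.0, 0.05]),
                "direction": np.array([0.0, 0.0, -1.0])}
```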
In the embodiment of the present application, the pre-stored 3D template containing the preset grabbing point is accurately established through the point pair feature algorithm, which provides an accurate basis for subsequently locating the target object and determining its grabbing point, and improves the accuracy and efficiency of identifying, locating and grabbing the target object.
In the embodiment of the invention, after the target object is positioned, the grabbing point of the target object can be determined according to the geometric shape type of the target object and the position of the target object, so that the target object can be accurately and efficiently grabbed.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation process of the embodiments of the present invention.
Embodiment III:
fig. 3 is a schematic structural diagram of an object identifying and positioning device according to an embodiment of the present application, and for convenience of explanation, only a portion related to the embodiment of the present application is shown.
The object recognition positioning device comprises: a first acquisition unit 31, an identification unit 32, a rough segmentation unit 33, a positioning unit 34. Wherein:
the first acquiring unit 31 is configured to acquire a two-dimensional image of an area to be measured and point cloud data.
Optionally, the first acquisition unit includes a first acquisition module and a point cloud data generation module:
the first acquisition module is used for acquiring a color image and a depth image of the region to be detected, wherein the color image is a two-dimensional image of the region to be detected;
and the point cloud data generation module is used for generating point cloud data of the region to be detected according to the depth map.
And the identifying unit 32 is configured to detect the two-dimensional image through a pre-trained deep learning model, and identify a two-dimensional target area corresponding to the target object in the two-dimensional image.
Optionally, the object recognition positioning device further includes a deep learning model training unit, where the deep learning model training unit specifically includes a second acquisition module, a first identification module, and a training module:
a second acquisition module for acquiring a two-dimensional sample image, wherein the two-dimensional sample image contains a predetermined number of target objects;
the first identification module is used for selecting a two-dimensional target area corresponding to the target object in the two-dimensional sample image in a frame mode and identifying a corresponding geometric shape type label;
and the training module is used for taking the two-dimensional sample image as a training sample and obtaining the pre-trained deep learning model through training of a target detection algorithm.
And the rough segmentation unit 33 is configured to map the two-dimensional target area to the point cloud data, and determine a first three-dimensional area of the target object according to a mapping result.
And the positioning unit 34 is used for determining a second three-dimensional area of the target object according to the geometric shape type and the first three-dimensional area and positioning the target object.
Optionally, the positioning unit 34 includes a first positioning module and a second positioning module:
the first positioning module is used for determining a second three-dimensional area of the target object from the first three-dimensional area through a fitting algorithm and positioning the target object if the geometric shape type is a regular geometric shape;
and the second positioning module is used for determining a second three-dimensional area of the target object from the first three-dimensional area through a 3D model matching method and positioning the target object if the geometric shape type is irregular geometric shape.
Optionally, the object identifying and positioning device further includes:
and the grabbing unit is used for grabbing the target object according to the geometric shape type and the position of the target object.
Optionally, the grabbing unit includes:
and the first grabbing module is used for calculating grabbing points of the target object by adopting a fitting algorithm according to the position of the target object and the type of grabbing instruments if the geometric shape type is a regular geometric shape, and grabbing the target object.
Optionally, the grabbing unit includes a matching module and a second grabbing module:
the matching module is used for matching the second three-dimensional area with a pre-stored 3D template to obtain 6D gesture information of the target object if the geometric shape type is an irregular geometric shape, wherein the pre-stored 3D template comprises a preset grabbing point;
and the second grabbing module is used for grabbing the target object according to the 6D gesture information and the matched pre-stored 3D template.
Optionally, the object identifying and positioning device further includes a template establishing unit, where the template establishing unit includes an establishing module and a preset grabbing point determining module:
the template building unit is used for building a 3D template of the target object by adopting a point-to-point characteristic PPF algorithm;
the preset grabbing point determining module is used for determining preset grabbing points in the 3D template and storing the preset grabbing points to obtain the pre-stored 3D template.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Embodiment four:
fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 4, the terminal device 4 of this embodiment includes: a processor 40, a memory 41 and a computer program 42, such as an object recognition positioning program, stored in said memory 41 and executable on said processor 40. The steps of the above-described respective object recognition positioning method embodiments, such as steps S101 to S104 shown in fig. 1, are implemented when the processor 40 executes the computer program 42. Alternatively, the processor 40, when executing the computer program 42, performs the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 31-34 shown in fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 42 in the terminal device 4. For example, the computer program 42 may be divided into a first acquisition unit, an identification unit, a rough segmentation unit and a positioning unit, each unit functioning in particular as follows:
The first acquisition unit is used for acquiring the two-dimensional image and the point cloud data of the area to be detected.
The identification unit is used for detecting the two-dimensional image through a pre-trained deep learning model and identifying a two-dimensional target area corresponding to a target object in the two-dimensional image.
And the rough segmentation unit is used for mapping the two-dimensional target area to the point cloud data and determining a first three-dimensional area of the target object according to a mapping result.
And the positioning unit is used for determining a second three-dimensional area of the target object according to the geometric shape type and the first three-dimensional area and positioning the target object.
The terminal device 4 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The terminal device may include, but is not limited to, a processor 40, a memory 41. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the terminal device 4 and does not constitute a limitation of the terminal device 4, and may include more or less components than illustrated, or may combine certain components, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.
The processor 40 may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing the computer program as well as other programs and data required by the terminal device. The memory 41 may also be used for temporarily storing data that has been output or is to be output.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when the computer program is executed by a processor, the steps of each of the method embodiments described above can be implemented. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be included within the scope of the present invention.

Claims (9)

1. An object identification positioning method, comprising:
acquiring a two-dimensional image and point cloud data of a region to be detected;
detecting the two-dimensional image through a pre-trained deep learning model, and identifying a two-dimensional target area and a geometric shape type corresponding to a target object in the two-dimensional image, comprising: inputting the two-dimensional image into a target detection model; extracting color feature information of the two-dimensional image through a color feature information extraction network layer; extracting texture feature information of the two-dimensional image through a texture feature information extraction network layer; extracting geometric feature information of the two-dimensional image through a geometric feature information extraction network layer; performing region division on the two-dimensional image according to the color feature information, texture feature information and geometric feature information of the two-dimensional image, to obtain the two-dimensional target area in the two-dimensional image; and determining the geometric shape type of the target object according to the information of the two-dimensional target area and a discriminator;
mapping the two-dimensional target area to the point cloud data, and determining a first three-dimensional area of the target object according to a mapping result;
determining a second three-dimensional area of the target object according to the geometric shape type and the first three-dimensional area, and positioning the target object, comprising: if the geometric shape type is a regular geometric shape, determining the second three-dimensional area of the target object from the first three-dimensional area through a fitting algorithm and positioning the target object; if the geometric shape type is an irregular geometric shape, determining the second three-dimensional area of the target object from the first three-dimensional area through a 3D model matching method and positioning the target object.
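The following is an illustrative Python/numpy sketch, not part of the claims, of one way the method of claim 1 could be realized: the two-dimensional target area is mapped onto an organized point cloud (assumed here to be an (H, W, 3) array registered pixel-for-pixel with the two-dimensional image) to obtain the first three-dimensional area, and a least-squares sphere fit stands in for the fitting algorithm applied when the geometric shape type is regular. All names and the organized-cloud assumption are hypothetical.

import numpy as np

def crop_first_3d_region(organized_cloud, bbox):
    """Map the 2D target area (x0, y0, x1, y1) onto the point cloud to get the first 3D area."""
    x0, y0, x1, y1 = bbox
    region = organized_cloud[y0:y1, x0:x1].reshape(-1, 3)
    return region[np.isfinite(region).all(axis=1)]          # drop pixels with invalid depth

def fit_sphere(points):
    """Least-squares sphere fit: one possible fitting algorithm for a regular shape."""
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius

def locate_target(organized_cloud, bbox, shape_type):
    region = crop_first_3d_region(organized_cloud, bbox)    # first 3D area
    if shape_type == "regular":                              # e.g. a sphere-like object
        center, radius = fit_sphere(region)                  # second 3D area and position
        return {"center": center, "radius": radius}
    # irregular shape: handled by 3D model (template) matching, sketched after claim 6
    return {"points": region}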
2. The object identification positioning method according to claim 1, further comprising, before the detecting the two-dimensional image through a pre-trained deep learning model and identifying a two-dimensional target area and a geometric shape type corresponding to a target object in the two-dimensional image:
acquiring a two-dimensional sample image, wherein the two-dimensional sample image contains two-dimensional image information of a predetermined number of target objects;
selecting a two-dimensional target area corresponding to a target object in the two-dimensional sample image, and marking a corresponding geometric shape type label;
and training with the two-dimensional sample image as a training sample through a target detection algorithm, to obtain the pre-trained deep learning model.
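A minimal sketch of the sample preparation described in claim 2, assuming annotations are kept as JSON; the file paths, the label set and the downstream detector named in the final comment are illustrative assumptions, not part of the claims.

import json

SHAPE_LABELS = {"regular_sphere": 0, "regular_box": 1, "irregular": 2}   # hypothetical label set

def make_annotation(image_path, boxes_with_shapes):
    """boxes_with_shapes: list of ((x0, y0, x1, y1), shape_type) selected on the sample image."""
    return {
        "image": image_path,
        "objects": [
            {"bbox": list(box), "shape_label": SHAPE_LABELS[shape]}
            for box, shape in boxes_with_shapes
        ],
    }

annotations = [
    make_annotation("samples/scene_000.png", [((120, 80, 260, 210), "regular_sphere")]),
]
with open("train_annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
# these annotations would then be fed to an off-the-shelf target detection algorithm
# (an SSD- or Faster R-CNN-style detector, for example) to obtain the pre-trained model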
3. The object recognition positioning method according to claim 1, further comprising, after the determining a second three-dimensional area of the target object according to the geometric shape type and the first three-dimensional area and positioning the target object:
and grabbing the target object according to the geometric shape type and the position of the target object.
4. The object recognition positioning method according to claim 3, wherein the grabbing the target object according to the geometric shape type and the position of the target object comprises:
and positioning a grabbing point and grabbing the target object according to the geometric shape type, the position of the target object and the type of grabbing instrument.
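A hedged sketch of the grab-point selection in claim 4: the rule set below, keyed on geometric shape type, fitted position and gripper type, is purely illustrative; the claims do not specify how gripper types map to grabbing points, so every rule here is an assumption.

import numpy as np

def pick_grab_point(shape_type, fit_result, gripper_type):
    """Return a grabbing point (3,) and an approach direction (3,) in camera coordinates."""
    if shape_type == "regular" and gripper_type == "suction":
        # suction cup: aim at the top of the fitted sphere, approach along +Z toward the object
        center = np.asarray(fit_result["center"], dtype=float)
        return center - np.array([0.0, 0.0, fit_result["radius"]]), np.array([0.0, 0.0, 1.0])
    if shape_type == "regular" and gripper_type == "two_finger":
        # parallel-jaw gripper: close on the sphere equator through its centre
        return np.asarray(fit_result["center"], dtype=float), np.array([0.0, 0.0, 1.0])
    # irregular shape: fall back to the preset grabbing point of the matched 3D template (claim 5)
    return (np.asarray(fit_result["preset_grab_point"], dtype=float),
            np.asarray(fit_result["approach"], dtype=float))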
5. The object recognition positioning method according to claim 3, wherein the grabbing the target object according to the geometric shape type and the position of the target object comprises:
if the geometric shape type is an irregular geometric shape, matching the second three-dimensional area with a pre-stored 3D template to obtain 6D pose information of the target object, wherein the pre-stored 3D template comprises preset grabbing points;
and grabbing the target object according to the 6D pose information and the matched pre-stored 3D template.
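An illustrative sketch of claim 5, assuming the 6D pose information is expressed as a 4x4 homogeneous transform from template coordinates to scene coordinates (an assumption; the claims do not fix a representation). The preset grabbing point stored with the pre-stored 3D template is carried into the scene by that transform.

import numpy as np

def grab_point_in_scene(pose_4x4, preset_grab_point):
    """Transform a grabbing point defined in template coordinates into scene coordinates."""
    p = np.append(np.asarray(preset_grab_point, dtype=float), 1.0)   # homogeneous coordinates
    return (np.asarray(pose_4x4, dtype=float) @ p)[:3]

# usage with a hypothetical pose returned by template matching
pose = np.eye(4)
pose[:3, 3] = [0.10, -0.02, 0.55]                                    # translation in metres
print(grab_point_in_scene(pose, [0.0, 0.0, 0.03]))                   # -> [0.1, -0.02, 0.58]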
6. The object recognition positioning method according to claim 5, further comprising, before the grabbing the target object according to the geometric shape type and the position of the target object:
establishing a 3D template of the target object by using a point pair feature (PPF) algorithm;
and determining a preset grabbing point in the 3D template and storing the preset grabbing point to obtain the pre-stored 3D template.
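For claim 6, the sketch below computes the classical four-component point pair feature F = (||d||, ang(n1, d), ang(n2, d), ang(n1, n2)) that describes oriented point pairs of the model when building a PPF template. Hashing, voting and pose clustering, which a complete PPF matching pipeline would also need, are omitted, and the sampling step and quantization are assumptions.

import numpy as np

def _angle(u, v):
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    """Four-component PPF of an oriented point pair."""
    d = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    return np.array([np.linalg.norm(d), _angle(n1, d), _angle(n2, d), _angle(n1, n2)])

def build_ppf_table(points, normals, step=5):
    """Quantized PPFs of sampled point pairs of the model; a stored template would keep
    this table together with the preset grabbing point of claim 6."""
    table = {}
    for i in range(0, len(points), step):
        for j in range(0, len(points), step):
            if i == j:
                continue
            f = point_pair_feature(points[i], normals[i], points[j], normals[j])
            table.setdefault(tuple(np.round(f, 2)), []).append((i, j))
    return table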
7. An object recognition positioning device, characterized by comprising:
the first acquisition unit is used for acquiring a two-dimensional image and point cloud data of the region to be detected;
the identification unit is used for detecting the two-dimensional image through a pre-trained deep learning model and identifying a two-dimensional target area and a geometric shape type corresponding to a target object in the two-dimensional image, including: inputting the two-dimensional image into a target detection model; extracting color feature information of the two-dimensional image through a color feature information extraction network layer; extracting texture feature information of the two-dimensional image through a texture feature information extraction network layer; extracting geometric feature information of the two-dimensional image through a geometric feature information extraction network layer; performing region division on the two-dimensional image according to the color feature information, texture feature information and geometric feature information of the two-dimensional image, to obtain the two-dimensional target area in the two-dimensional image; and determining the geometric shape type of the target object according to the information of the two-dimensional target area and a discriminator;
the rough segmentation unit is used for mapping the two-dimensional target area to the point cloud data and determining a first three-dimensional area of the target object according to a mapping result;
the positioning unit is used for determining a second three-dimensional area of the target object according to the geometric shape type and the first three-dimensional area and positioning the target object;
the positioning unit comprises a first positioning module and a second positioning module;
the first positioning module is used for determining a second three-dimensional area of the target object from the first three-dimensional area through a fitting algorithm and positioning the target object if the geometric shape type is a regular geometric shape;
the second positioning module is used for determining a second three-dimensional area of the target object from the first three-dimensional area through a 3D model matching method and positioning the target object if the geometric shape type is an irregular geometric shape.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.
CN201911380815.6A 2019-12-27 2019-12-27 Object identification positioning method and device and terminal equipment Active CN111178250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911380815.6A CN111178250B (en) 2019-12-27 2019-12-27 Object identification positioning method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911380815.6A CN111178250B (en) 2019-12-27 2019-12-27 Object identification positioning method and device and terminal equipment

Publications (2)

Publication Number Publication Date
CN111178250A CN111178250A (en) 2020-05-19
CN111178250B true CN111178250B (en) 2024-01-12

Family

ID=70652185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911380815.6A Active CN111178250B (en) 2019-12-27 2019-12-27 Object identification positioning method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN111178250B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709923B (en) * 2020-06-10 2023-08-04 中国第一汽车股份有限公司 Three-dimensional object detection method, three-dimensional object detection device, computer equipment and storage medium
CN114078331B (en) * 2020-08-19 2023-02-17 北京万集科技股份有限公司 Overspeed detection method, overspeed detection device, visual sensor and storage medium
CN114252024B (en) * 2020-09-20 2023-09-01 浙江四点灵机器人股份有限公司 Single-measurement-module multi-working-mode workpiece three-dimensional measurement device and method
CN112215861A (en) * 2020-09-27 2021-01-12 深圳市优必选科技股份有限公司 Football detection method and device, computer readable storage medium and robot
CN112541428B (en) * 2020-12-11 2024-01-16 深圳市优必选科技股份有限公司 Football recognition method, football recognition device and robot
CN112529948A (en) * 2020-12-25 2021-03-19 南京林业大学 Mature pomegranate positioning method based on Mask R-CNN and 3-dimensional sphere fitting
CN112802105A (en) * 2021-02-05 2021-05-14 梅卡曼德(北京)机器人科技有限公司 Object grabbing method and device
CN112802107A (en) * 2021-02-05 2021-05-14 梅卡曼德(北京)机器人科技有限公司 Robot-based control method and device for clamp group
CN113111712A (en) * 2021-03-11 2021-07-13 稳健医疗用品股份有限公司 AI identification positioning method, system and device for bagged product
CN112883494B (en) * 2021-03-17 2022-07-19 清华大学 Bicycle three-dimensional model reconstruction method and device
CN113808096B (en) * 2021-09-14 2024-01-30 成都主导软件技术有限公司 Non-contact bolt loosening detection method and system
CN113516660B (en) * 2021-09-15 2021-12-07 江苏中车数字科技有限公司 Visual positioning and defect detection method and device suitable for train
CN114354618A (en) * 2021-12-16 2022-04-15 浙江大华技术股份有限公司 Method and device for detecting welding seam
CN114612536B (en) * 2022-03-22 2022-11-04 北京诺亦腾科技有限公司 Method, device and equipment for identifying three-dimensional model of object and readable storage medium
CN116309442B (en) * 2023-03-13 2023-10-24 北京百度网讯科技有限公司 Method for determining picking information and method for picking target object
CN116985141B (en) * 2023-09-22 2023-11-24 深圳市协和传动器材有限公司 Industrial robot intelligent control method and system based on deep learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106530276A (en) * 2016-10-13 2017-03-22 中科金睛视觉科技(北京)有限公司 Manipulator positioning method and system for grabbing of non-standard component
CN109102547A (en) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 Robot based on object identification deep learning model grabs position and orientation estimation method
CN109176521A (en) * 2018-09-19 2019-01-11 北京因时机器人科技有限公司 A kind of mechanical arm and its crawl control method and system
CN110174056A (en) * 2019-06-18 2019-08-27 上海商米科技集团股份有限公司 A kind of object volume measurement method, device and mobile terminal
CN110472534A (en) * 2019-07-31 2019-11-19 厦门理工学院 3D object detection method, device, equipment and storage medium based on RGB-D data
WO2019227705A1 (en) * 2018-05-28 2019-12-05 平安科技(深圳)有限公司 Image entry method, server and computer storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10025922A1 (en) * 2000-05-27 2001-12-13 Robert Massen Automatic photogrammetric digitization of bodies and objects
US20160163057A1 (en) * 2014-12-04 2016-06-09 Google Inc. Three-Dimensional Shape Capture Using Non-Collinear Display Illumination

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106530276A (en) * 2016-10-13 2017-03-22 中科金睛视觉科技(北京)有限公司 Manipulator positioning method and system for grabbing of non-standard component
WO2019227705A1 (en) * 2018-05-28 2019-12-05 平安科技(深圳)有限公司 Image entry method, server and computer storage medium
CN109102547A (en) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 Robot based on object identification deep learning model grabs position and orientation estimation method
CN109176521A (en) * 2018-09-19 2019-01-11 北京因时机器人科技有限公司 A kind of mechanical arm and its crawl control method and system
CN110174056A (en) * 2019-06-18 2019-08-27 上海商米科技集团股份有限公司 A kind of object volume measurement method, device and mobile terminal
CN110472534A (en) * 2019-07-31 2019-11-19 厦门理工学院 3D object detection method, device, equipment and storage medium based on RGB-D data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Automatic and accurate full-view registration method for 3D scanning system; Pei Xu et al.; Optical Measurement Systems for Industrial Inspection XI; 1-10 *
Preliminary study of a manipulator depalletizing device based on binocular vision positioning (双目视觉定位的机械臂卸垛装置的初步研究); Xu Pei (徐培); China Masters' Theses Full-text Database (中国优秀硕士学位论文全文数据库); 1-76 *

Also Published As

Publication number Publication date
CN111178250A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
CN111178250B (en) Object identification positioning method and device and terminal equipment
JP6681729B2 (en) Method for determining 3D pose of object and 3D location of landmark point of object, and system for determining 3D pose of object and 3D location of landmark of object
WO2018120038A1 (en) Method and device for target detection
CN112528831B (en) Multi-target attitude estimation method, multi-target attitude estimation device and terminal equipment
CN108734058B (en) Obstacle type identification method, device, equipment and storage medium
CN108573471B (en) Image processing apparatus, image processing method, and recording medium
WO2016068869A1 (en) Three dimensional object recognition
CN112308916B (en) Target pose recognition method based on image target
CN110163087B (en) Face gesture recognition method and system
WO2022170844A1 (en) Video annotation method, apparatus and device, and computer readable storage medium
CN111191582B (en) Three-dimensional target detection method, detection device, terminal device and computer readable storage medium
JP2019057227A (en) Template creation device, object recognition processing device, template creation method, and program
CN108090486A (en) Image processing method and device in a kind of game of billiards
KR20240042143A (en) System and method for finding and classifying patterns in an image with a vision system
CN109767431A (en) Accessory appearance defect inspection method, device, equipment and readable storage medium storing program for executing
Cupec et al. Object recognition based on convex hull alignment
CN110832542A (en) Recognition processing device, recognition processing method, and program
CN110007764B (en) Gesture skeleton recognition method, device and system and storage medium
CN110207702B (en) Target positioning method and device
US11816857B2 (en) Methods and apparatus for generating point cloud histograms
CN108960246B (en) Binarization processing device and method for image recognition
CN108629219B (en) Method and device for identifying one-dimensional code
CN116309643A (en) Face shielding score determining method, electronic equipment and medium
CN114638891A (en) Target detection positioning method and system based on image and point cloud fusion
CN109815791B (en) Blood vessel-based identity recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant