CN113191174A - Article positioning method and device, robot and computer readable storage medium - Google Patents


Info

Publication number
CN113191174A
Authority
CN
China
Prior art keywords
article
target
quadrangles
candidate
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010039632.4A
Other languages
Chinese (zh)
Other versions
CN113191174B (en)
Inventor
刘伟峰
万保成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Qianshi Technology Co Ltd filed Critical Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN202010039632.4A
Publication of CN113191174A
Application granted
Publication of CN113191174B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture
    • B25J9/161 Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20164 Salient point detection; Corner detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30108 Industrial image inspection
    • G06T2207/30164 Workpiece; Machine component

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Mechanical Engineering (AREA)
  • Artificial Intelligence (AREA)
  • Robotics (AREA)
  • Automation & Control Theory (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an article positioning method, an article positioning device, a robot, and a computer-readable storage medium. The article positioning method comprises the following steps: acquiring a two-dimensional image related to the surface of an article and three-dimensional position information related to the surface of the article; extracting an edge image from the two-dimensional image; extracting corner data from the edge image; processing and analyzing the extracted corner data to obtain a target polygon representing the edges of the article surface; determining three-dimensional position information of the target polygon from the three-dimensional position information related to the surface of the article; and determining the position of the article surface using the three-dimensional position information of the target polygon.

Description

Article positioning method and device, robot and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and more particularly, to an article positioning method, an article positioning device, a robot, and a computer-readable storage medium.
Background
With advances in science and technology, the degree of mechanization keeps rising, and robots are often required to grasp and carry objects in production and logistics. For example, during order fulfillment, workers pick goods from shelves according to a shopping order and collect them on trays to form stacks of articles. The trays are then transported to a prescribed position, and a robot, with visual assistance, picks the articles one by one from the stack. At this point the goods are still in their primary packaging, that is, each has a carton as its outer package, so the stack is formed of articles of varying sizes.
In the process of implementing the concept disclosed herein, the inventors found that in scenarios where a robot unstacks articles, the types and sizes of the articles in a stack are generally unknown, so existing vision-assisted systems cannot use known sizes or templates to assist positioning, which makes precise positioning of the articles very difficult. Without precise positioning, an article may fail to be picked up or may be dropped during the subsequent grasping and carrying processes.
Disclosure of Invention
In view of the above, the present disclosure provides an article positioning method, an article positioning apparatus, a robot, and a computer-readable storage medium.
One aspect of the present disclosure provides an article positioning method, including: acquiring a two-dimensional image related to the surface of an article and three-dimensional position information related to the surface of the article; extracting an edge image from the two-dimensional image; extracting corner data from the edge image; processing and analyzing the extracted corner data to obtain a target polygon representing the edges of the article surface; determining three-dimensional position information of the target polygon from the three-dimensional position information related to the surface of the article; and determining the position of the article surface using the three-dimensional position information of the target polygon.
According to the embodiment of the disclosure, the article is one article in a stack formed by closely arranging a plurality of articles, and the surfaces of the plurality of articles in at least one direction of the stack are polygonal surfaces with the same number of sides.
According to an embodiment of the present disclosure, the article is a rectangular parallelepiped article, and the processing and analyzing the extracted corner data to obtain a target polygon representing an edge of the article surface includes: clustering the angular point data to reduce the number of angular points; extracting a plurality of quadrangles formed by connecting angular points; and screening the target rectangle from the plurality of quadrangles.
According to an embodiment of the present disclosure, screening the target rectangle from the plurality of quadrangles includes: screening, from the plurality of quadrangles, quadrangles whose sizes are within a target size range; screening one or more candidate quadrangles according to the degree of coincidence between the quadrangles whose sizes are within the target size range; generating, for each of the one or more candidate quadrangles, a convolution kernel corresponding to the size of that candidate quadrangle, the convolution kernel having an annular rectangular frame region; and matching each candidate quadrangle using the annular rectangular frame region of its corresponding convolution kernel to obtain the target rectangle.
According to an embodiment of the present disclosure, matching each candidate quadrangle using the annular rectangular frame region of its corresponding convolution kernel to obtain the target rectangle includes: assigning 1 to the annular rectangular frame region and 0 to the region outside it; convolving the corresponding candidate quadrangle with the convolution kernel to obtain a convolution value; judging, from the convolution value, whether the matching degree between the candidate quadrangle and the annular rectangular frame region is greater than a predetermined value; and if the matching degree is greater than the predetermined value, determining the candidate quadrangle as the target rectangle.
According to an embodiment of the present disclosure, if several candidate quadrangles have a matching degree with their corresponding annular rectangular frame regions greater than the predetermined value, the candidate quadrangle with the highest matching degree is determined as the target rectangle.
According to the embodiments of the present disclosure, when a plurality of quadrangles connected by corner points are extracted, the angles of the four corners of the extracted quadrangles are each in the range of 80 ° to 100 °.
According to the embodiment of the present disclosure, after determining the position of the surface of the article, the method further includes: calculating a target picking pose for picking the article according to the information on the position of the article surface; and sending the target picking pose to the robot so that the robot picks the article according to the target picking pose.
Another aspect of the present disclosure provides an article positioning device, including: an acquisition module for acquiring a two-dimensional image related to the surface of an article and three-dimensional position information related to the surface of the article; an edge image extraction module for extracting an edge image from the two-dimensional image; a corner data extraction module for extracting corner data from the edge image; a processing and analyzing module for processing and analyzing the extracted corner data to obtain a target polygon representing the edges of the article surface; a target polygon position determining module for determining three-dimensional position information of the target polygon from the three-dimensional position information related to the article surface; and an article surface position determining module for determining the position of the article surface using the three-dimensional position information of the target polygon.
According to the embodiment of the disclosure, the article is one article in a stack formed by closely arranging a plurality of articles, and the surfaces of the plurality of articles in at least one direction of the stack are polygonal surfaces with the same number of sides.
According to an embodiment of the present disclosure, the article is a rectangular parallelepiped article, and the processing and analyzing module includes: a corner clustering submodule for clustering the corner data to reduce the number of corner points; a quadrangle extraction submodule for extracting a plurality of quadrangles formed by connecting corner points; and a target rectangle screening submodule for screening the target rectangle from the plurality of quadrangles.
According to an embodiment of the present disclosure, the target rectangle screening submodule includes: a quadrangle preliminary selection unit for screening, from the plurality of quadrangles, quadrangles whose sizes are within a target size range; a candidate quadrangle screening unit for screening one or more candidate quadrangles according to the degree of coincidence between the quadrangles whose sizes are within the target size range; a convolution kernel generation unit for generating, for each of the one or more candidate quadrangles, a convolution kernel corresponding to the size of that candidate quadrangle, the convolution kernel having an annular rectangular frame region; and a convolution kernel matching unit for matching each candidate quadrangle using the annular rectangular frame region of its corresponding convolution kernel to obtain the target rectangle.
According to an embodiment of the present disclosure, the convolution kernel matching unit includes: an assignment subunit for assigning 1 to the annular rectangular frame region and 0 to the region outside it; a convolution subunit for convolving the corresponding candidate quadrangle with the convolution kernel to obtain a convolution value; a judging subunit for judging, from the convolution value, whether the matching degree between the candidate quadrangle and the annular rectangular frame region is greater than a predetermined value; and a determining subunit for determining the candidate quadrangle as the target rectangle if the matching degree is greater than the predetermined value.
According to an embodiment of the present disclosure, if several candidate quadrangles have a matching degree with their corresponding annular rectangular frame regions greater than the predetermined value, the determining subunit determines the candidate quadrangle with the highest matching degree as the target rectangle.
According to the embodiment of the present disclosure, the angles of the four corners of the quadrangles extracted by the quadrangle extraction submodule are all in the range of 80° to 100°.
According to an embodiment of the present disclosure, the article positioning device further comprises: a target picking pose calculation module for calculating a target picking pose for picking the article according to the information on the position of the article surface; and a sending module for sending the target picking pose to the robot so that the robot picks the article according to the target picking pose.
Another aspect of the present disclosure provides a robot including: one or more processors; a memory for storing one or more instructions, wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing executable instructions that, when executed by a processor, cause the processor to implement the above-described method.
According to the embodiments of the present disclosure, relatively accurate positioning of an article can be achieved, without special equipment, by applying a series of processing steps to the acquired two-dimensional image related to the article surface together with the three-dimensional position information related to the article surface. This at least partly solves the problem that existing vision-assisted systems require known sizes and templates to assist positioning, and achieves relatively accurate positioning of articles in a simple way.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario of an article positioning method and device according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an item location method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of an item location method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of an item location method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of an item location method according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a manner in which values are assigned to the annular rectangular frame region of a convolution kernel, in accordance with an embodiment of the present disclosure;
FIGS. 7-16 schematically illustrate one particular embodiment of an article positioning method according to an embodiment of the present disclosure;
FIG. 17 schematically illustrates a block diagram of an article positioning device, in accordance with an embodiment of the present disclosure; and
FIG. 18 schematically shows a block diagram of a robot according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, such a construction is in general intended in the sense one having skill in the art would understand the convention (e.g., "a device having at least one of A, B, and C" would include but not be limited to devices having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together). Where a convention analogous to "at least one of A, B, or C, etc." is used, such a construction is in general intended in the sense one having skill in the art would understand the convention (e.g., "a device having at least one of A, B, or C" would include but not be limited to devices having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
Embodiments of the present disclosure provide an article positioning method. The article positioning method comprises the following steps: acquiring a two-dimensional image related to the surface of an article and three-dimensional position information related to the surface of the article; extracting an edge image from the two-dimensional image; extracting corner data from the edge image; processing and analyzing the extracted corner data to obtain a target polygon representing the edges of the article surface; determining three-dimensional position information of the target polygon from the three-dimensional position information related to the surface of the article; and determining the position of the article surface using the three-dimensional position information of the target polygon.
Fig. 1 schematically illustrates an application scenario in which the method and apparatus for locating an item may be applied according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of an application scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in FIG. 1, articles of varying sizes in their primary packaging are collected on a tray 140 to form a stack, and the stack includes the article 130. According to an embodiment of the present disclosure, the articles may have a rectangular parallelepiped shape; it should be noted that, in embodiments of the present disclosure, the rectangular parallelepiped shape may include a cube. The tray 140 is placed at a predetermined position, and the photographing device 110 is arranged above that position. The photographing device 110 captures image information of the article 130 and transmits it to the robot 120; the robot 120 locates the article 130 based on the image information and then grasps and transports it. The robot 120 may, for example, suck the upper surface of the article 130 with the pickup mechanism 121 to grasp and transport the article 130. The action of the robot 120 grasping an article 130 from a stack is also referred to as unstacking.
In this application scenario, the articles 130 in the stack have, for example, rectangular surfaces when viewed from above, but they are of various kinds and therefore of various sizes. In other words, the article 130 is one article in a stack of closely arranged articles whose surfaces, in at least one direction of the stack, are polygons with the same number of sides, but the size of the surface of the article 130 is unknown.
In this application scenario, if the article 130 cannot be accurately positioned, it may fail to be picked up or may be dropped during the subsequent grasping and carrying processes. Given the specific requirements of unstacking, when positioning the article 130 it is necessary to know accurately the shape and size of the upper surface of the article 130 in order to calculate its center of gravity, and at the same time to know accurately the corresponding depth of the upper surface in order to determine the lowering height of the pickup mechanism 121.
Current target positioning techniques mainly include template matching, point cloud segmentation, and direct prediction of an article's bounding box through deep learning, but none of these is suitable for the above application scenario, for the following reasons.
(1) Template matching pre-stores templates of the articles that may appear; when an article is to be positioned, it finds the stored template matching the article and locates the article using the template's information. When the type and size of an article are unknown, a template for it cannot be stored in advance. Thus, template matching is not suitable for the above application scenario.
(2) Point cloud segmentation divides a point cloud according to spatial, geometric, texture, and similar features, and treats points with similar features in one partition as an article; it requires a 3D camera of sufficiently high resolution and is computationally expensive. When the articles in a closely arranged stack are at similar heights, it is difficult to segment an individual article directly from the point cloud. Therefore, point cloud segmentation is not suitable for this application scenario either.
(3) Directly predicting an article's bounding box through deep learning requires collecting data for all articles in storage to train a model, which is a huge workload; moreover, the stored articles are updated frequently, so the model must be updated frequently to keep up, which hinders practical application. Thus, this technique is also not suitable for the above application scenario.
Aiming at this technical pain point of target positioning in mixed-article unstacking scenarios, the present disclosure provides an effective article positioning method that overcomes the difficulty traditional template matching (which depends on templates) and point cloud segmentation have in separating same-layer targets, and also avoids the frequent model updates required by direct bounding-box prediction through deep learning. The technical scheme of the present disclosure can achieve relatively accurate positioning of an article without depending on the article's size or a template.
Fig. 2 schematically illustrates a flow chart of an item location method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S260.
In operation S210, a two-dimensional image related to a surface of an article and three-dimensional position information related to the surface of the article are acquired.
This operation S210 is explained in detail with the scenario example of FIG. 1. As shown in FIG. 1, the surface of the article may be, for example, the upper surface of the article 130, and the acquisition of the two-dimensional image related to the article surface and the three-dimensional position information related to the article surface may be performed, for example, by the photographing device 110. The photographing device 110 may be provided with a 2D camera and a 3D camera. The 2D camera and the 3D camera are jointly configured using extrinsic calibration so that, for the same point in a spatial Cartesian coordinate system, the planar position information captured by the 2D camera coincides with that captured by the 3D camera. That is, because the 2D camera and the 3D camera are extrinsically calibrated to the same coordinate system, every point photographed by the 2D camera has definite three-dimensional position information, and its planar position and depth can both be determined. Thus, the two-dimensional image can be obtained, for example, by shooting with the 2D camera, and the three-dimensional position information can be obtained, for example, by shooting with the 3D camera. The 2D camera and the 3D camera may be ordinary, general-purpose cameras.
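As an illustration of how the shared coordinate system can be used, the following minimal Python sketch back-projects any 2D pixel to a 3D point. It assumes a depth map aligned pixel-for-pixel with the 2D image and hypothetical pinhole intrinsics (fx, fy, cx, cy); neither is specified by the disclosure.

```python
# Minimal sketch, assuming an aligned depth map and hypothetical intrinsics.
import numpy as np

fx, fy, cx, cy = 1120.0, 1120.0, 1024.0, 768.0  # hypothetical 2D-camera intrinsics

def pixel_to_3d(u, v, depth_map):
    z = depth_map[v, u]          # depth (metres) from the aligned 3D data
    x = (u - cx) * z / fx        # pinhole back-projection
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```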
The two-dimensional image related to the surface of the article 130 may be a two-dimensional image including the surface of the article captured by a 2D camera, and the two-dimensional image may be an RGB image or a grayscale image. When the two-dimensional image is an RGB image, conversion into a grayscale map is necessary for the convenience of processing described later.
For example, the two-dimensional image related to the surface of the article 130 may be taken with a 4-million-pixel (4 MP) 2D camera over a 1.5 m × 2 m area that includes the surface of the article 130. That is, the two-dimensional image related to the surface of the article 130 contains not only the surface of the article 130 but also the surfaces of other articles in the shooting area, and even the ground; the amount of data is large and it contains interfering data. Thus, the surface of the article 130 cannot necessarily be accurately located from the two-dimensional image alone.
The three-dimensional position information associated with the surface of article 130 may include spatial coordinate information of article 130, from which, for example, a particular spatial position of the surface of article 130 may be determined. The three-dimensional position information may be captured with a 3D camera, but may also be obtained in other ways.
Although the conventional 2D camera can acquire a planar image of an object, the conventional 2D camera cannot directly acquire spatial information of the object. In addition, although the conventional 3D camera can acquire three-dimensional position information of an article, it cannot directly generate a planar image for easy recognition. Thus, it is difficult to achieve more accurate positioning of an object using either a conventional 2D camera or a 3D camera alone.
In operation S220, an edge image is extracted from the two-dimensional image.
A deep learning method can be used to extract the edge image from the two-dimensional image; for example, general-purpose feature extraction networks such as RCF, DeepEdge, and DeepContour can be used. Such general-purpose networks do not need to have their models updated when the stored articles are frequently updated, which makes practical application more convenient. It should be noted that an edge image extracted this way includes not only the edges of the actual article surface but also the edges of any patterns present on the article surface; the amount of data is still large and interfering data remains. Thus, the surface of the article 130 cannot necessarily be accurately located from the edge image alone.
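The disclosure relies on learned edge detectors; as a hedged, dependency-free stand-in for such a network, the following sketch produces a comparable binary edge image with OpenCV's classical Canny detector. The file name and thresholds are illustrative assumptions.

```python
# Hedged stand-in for a learned edge detector: classical Canny edges.
import cv2

gray = cv2.imread("stack_top_view.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
edges = cv2.Canny(gray, threshold1=50, threshold2=150)  # 255 on edges, 0 elsewhere
```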
In operation S230, corner data is extracted from the edge image. Extracting corner data from the edge image may utilize, for example, Harris corner detection techniques.
According to an embodiment of the present disclosure, the article 130 has a polygonal surface. The corner point data is extracted from the edge image in order to determine the plane positions of the corners of the polygonal surface of the article.
Operation S230 may yield on the order of tens of thousands of corner points.
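A minimal sketch of this corner-extraction step using OpenCV's Harris detector, operating on the edge map from the previous sketch; the block size, aperture, and 1% response threshold are illustrative assumptions.

```python
# Sketch of Harris corner extraction on the edge image.
import cv2
import numpy as np

response = cv2.cornerHarris(np.float32(edges), blockSize=2, ksize=3, k=0.04)
ys, xs = np.where(response > 0.01 * response.max())  # keep strong corner responses
corners = np.stack([xs, ys], axis=1)                 # (N, 2) pixel coordinates
```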
In operation S240, the extracted corner data is processed and analyzed to obtain a target polygon representing an edge of the surface of the article.
According to an embodiment of the present disclosure, the article 130 may be one article in a stack of a plurality of articles closely arranged, and the surface of the plurality of articles in at least one direction of the stack is a polygonal surface having the same number of sides. In this way, the polygonal surfaces of a plurality of articles in at least one direction of the stack, although not of the same size, can be determined by connecting the same number of corner points, since they have the same number of sides. The target polygon representing the edge of the surface of the article obtained in operation S240 can reflect the plane position of the polygonal surface of the article with high accuracy.
In operation S250, three-dimensional position information of the target polygon is determined from the three-dimensional position information associated with the surface of the object.
As mentioned above, since the 2D camera and the 3D camera are extrinsically calibrated to the same coordinate system, and the target polygon representing the edges of the article surface is obtained from the two-dimensional image captured by the 2D camera, every point of the target polygon has definite three-dimensional position information. Thus, after the target polygon is determined, its three-dimensional position information can be determined from the three-dimensional position information related to the article surface.
In operation S260, the position of the article surface is determined using the three-dimensional position information of the target polygon. In practice, the three-dimensional position information of the target polygon can be regarded as the position of the article surface and used directly in the next processing step.
Through the series of processing in operations S210 to S260, relatively accurate positioning of the article can be achieved from the acquired two-dimensional image and three-dimensional position information related to the article surface, without special equipment. This at least partly solves the problem that conventional vision-assisted systems require known sizes and templates to assist positioning, and achieves relatively accurate positioning in a simple way. Precise positioning of the article is thus achieved independently of the article's size and of any template.
Fig. 3 schematically illustrates a flow chart of an item location method according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, after determining the position of the surface of the article, the following operations S270 to S280 may be further included.
In operation S270, a target picking pose for picking the article is calculated from the information on the position of the article surface. Once the position of the article surface is determined, the size, height, center of gravity, and so on of the surface can be derived from it. From these, for example, the translation position and lowering height of the pickup mechanism 121 of the robot 120 can be calculated.
In operation S280, the target picking pose is sent to the robot such that the robot picks the item according to the target picking pose.
Through the above operation, the robot can be assisted to pick the article by using the article positioning method of the embodiment of the disclosure.
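A hedged sketch of how a target picking pose as in operation S270 might be derived from the located surface, assuming the pose reduces to a suction point and a lowering height; the function name and pose format are illustrative, not from the disclosure.

```python
# Hedged sketch: suction point = surface centre, lowering height = mean depth.
import numpy as np

def pick_pose(corners_3d):
    # corners_3d: (4, 3) array of the target rectangle's corner points in 3D
    center = corners_3d.mean(axis=0)       # geometric centre of the surface
    z_pick = corners_3d[:, 2].mean()       # lowering height for the picker
    return {"xyz": center, "z_pick": z_pick}
```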
Fig. 4 schematically illustrates a flow chart of an item location method according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the article 130 may be a rectangular parallelepiped article, in which case the article 130 has a rectangular surface. In this case, as shown in fig. 4, operation S240 may further include operations S410 to S430 as follows.
In operation S410, the corner data is clustered to reduce the number of corner points. Many adjacent corner points gather near the corners of the article; if every corner point participated in the computation, the computational load would be excessive with no substantive benefit. Clustering the corner points therefore reduces their number and speeds up the processing described later. Corner clustering may use, for example, the DBSCAN clustering algorithm. Through this operation S410, around one hundred corner points can be selected from tens of thousands.
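A minimal sketch of this clustering step with scikit-learn's DBSCAN, collapsing each dense cluster of raw corners (from the earlier sketch) to its mean; eps (in pixels) and min_samples are illustrative assumptions.

```python
# Sketch of corner clustering with DBSCAN; noise label -1 is dropped.
import numpy as np
from sklearn.cluster import DBSCAN

labels = DBSCAN(eps=5.0, min_samples=3).fit_predict(corners)
clustered = np.array([corners[labels == k].mean(axis=0)
                      for k in set(labels) if k != -1])
```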
In operation S420, a plurality of quadrangles formed by connecting corner points are extracted. Specifically, four corner points are selected at a time and connected pairwise to form a quadrangle. To reduce computational interference, only convex quadrangles formed by connecting corner points may be extracted.
According to the embodiments of the present disclosure, when the plurality of quadrangles formed by connecting corner points are extracted, the angles of the four corners of each extracted quadrangle may all be in the range of 80° to 100°. Because the article 130 has a rectangular surface, restricting the four corner angles to a range close to a right angle extracts only near-rectangular quadrangles, which reduces computational interference. Note that the 80° to 100° range is an example of 90° ± 10%; depending on the actual situation, the range may be, for example, 90° ± 5%.
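The following sketch illustrates operation S420 together with the angle filter above: every combination of four clustered corners is ordered around its centroid and kept only if all four interior angles lie in [80°, 100°]. Brute-force enumeration is shown for clarity; a practical implementation would prune the search.

```python
# Sketch of quadrangle extraction with the near-right-angle filter.
from itertools import combinations
import numpy as np

def order_convex(pts):
    c = pts.mean(axis=0)                              # order CCW around centroid
    return pts[np.argsort(np.arctan2(pts[:, 1] - c[1], pts[:, 0] - c[0]))]

def interior_angles(quad):
    angles = []
    for i in range(4):
        v1, v2 = quad[i - 1] - quad[i], quad[(i + 1) % 4] - quad[i]
        cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        angles.append(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
    return angles

quads = [order_convex(clustered[list(c)])
         for c in combinations(range(len(clustered)), 4)]
quads = [q for q in quads if all(80.0 <= a <= 100.0 for a in interior_angles(q))]
```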
In operation S430, a target rectangle is filtered from the plurality of quadrangles. The target rectangle is a target polygon representing the edge of the surface of the article, and can reflect the plane position of the rectangular surface of the article with high precision.
Through the series of processing in operations S410 to S430, the amount of data can be effectively reduced, and the target rectangle for article positioning can be obtained quickly without knowing the size of the article in advance or storing a template corresponding to the article.
Fig. 5 schematically illustrates a flow chart of an item location method according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, as shown in fig. 5, operation S430 may further include operations S510 to S540 as follows.
In operation S510, a quadrangle having a size within a target size range is screened from among the plurality of quadrangles.
Typically, the sizes of the articles in the stack are not known in advance, but they are neither too large nor too small. Therefore, a target size range is set, and only quadrangles whose sizes fall within it are retained. For example, the target size range may be set to 20 cm × 30 cm to 80 cm × 100 cm. If a quadrangle with side lengths of 50 cm, 60 cm, 50 cm, and 60 cm is found, it is retained for the next processing step; if a quadrangle with side lengths of 5 cm, 6 cm, 5 cm, and 6 cm is found, it is discarded outright to reduce data interference. The target size range may be set appropriately for the actual usage scenario; for example, it may also be set according to the size of the smallest article the robot can pick.
In operation S520, one or more candidate quadrangles are screened out according to the degree of coincidence between quadrangles having size dimensions within the target size range.
In actual processing, several highly coincident quadrangles may appear that all correspond to a single real article surface. In this case, one representative quadrangle can be selected from the highly coincident ones for further processing. For example, if the degree of coincidence between two quadrangles is 80% or more, only one of them is retained. The criterion for deciding which quadrangle to keep may be set as needed; for example, the quadrangle closer to a rectangle may be retained.
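A sketch of operations S510 and S520 under stated assumptions: a hypothetical pixels-per-centimetre scale converts the target size range to pixels, each side is checked against a simplified per-side range, and coincidence is measured as the intersection area over the smaller quadrangle's area, which is one reasonable implementation choice rather than the disclosure's prescribed metric.

```python
# Sketch of size screening (S510) and coincidence screening (S520).
import cv2
import numpy as np

PX_PER_CM = 5.0                                   # hypothetical image scale
LO, HI = 20 * PX_PER_CM, 100 * PX_PER_CM          # simplified per-side bounds

def quad_mask(quad, shape):
    mask = np.zeros(shape, np.uint8)
    cv2.fillPoly(mask, [quad.astype(np.int32)], 1)
    return mask

def coincidence(q1, q2, shape):
    m1, m2 = quad_mask(q1, shape), quad_mask(q2, shape)
    return np.logical_and(m1, m2).sum() / min(m1.sum(), m2.sum())

sized = [q for q in quads
         if all(LO <= np.linalg.norm(q[i] - q[i - 1]) <= HI for i in range(4))]

candidates = []                                   # keep one of each >=80% pair
for q in sized:
    if all(coincidence(q, c, edges.shape) < 0.8 for c in candidates):
        candidates.append(q)
```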
In operation S530, for each of the one or more candidate quadrangles, a convolution kernel corresponding to the size of that candidate quadrangle is generated, the convolution kernel having an annular rectangular frame region. The length and width of the convolution kernel may, for example, coincide with the length and width of the candidate quadrangle. That is, the outer edge of the annular rectangular frame region of the convolution kernel may coincide with the length and width of the candidate quadrangle, and its inner edge may lie several pixels inside the outer edge, so the frame is several pixels wide.
In operation S540, each candidate quadrangle is matched using the annular rectangular frame region of its corresponding convolution kernel to obtain the target rectangle. For example, the values on the convolution kernel may be multiplied by the gray values of the portion of the corresponding candidate quadrangle that falls within the annular rectangular frame region.
FIG. 6 schematically illustrates a manner in which values are assigned to the annular rectangular frame region of a convolution kernel according to an embodiment of the present disclosure.
Operation S540 may specifically include the following procedure according to an embodiment of the present disclosure.
As shown in fig. 6, 1 is assigned to the annular rectangular frame region and 0 is assigned to the region outside the annular rectangular frame region.
The convolution kernel is used to convolve the corresponding candidate quadrangle to obtain a convolution value.
Specifically, the numerical value 1 in the annular rectangular frame region may be used to multiply the gray value of the portion of the corresponding candidate quadrangle that falls within the annular rectangular frame region, and the resultant products may be summed.
The gray values of the edge image and the non-edge image obtained in the foregoing operations differ greatly. For example, when the edges of the two-dimensional image are displayed as black, the gray values of edge pixels are low and the gray values of non-edge pixels are high.
If the candidate quadrangle is a true target rectangle, the candidate quadrangle should correspond to the edge area of the two-dimensional image, and the convolution value is low because the gray value of the edge image is low.
If the candidate quadrangle is not a true target rectangle, at least part of it does not correspond to an edge region of the two-dimensional image, and the convolution value is higher because the gray values of non-edge pixels are higher. The degree to which the candidate quadrangle matches the annular rectangular frame region can thus be characterized by the convolution value.
Therefore, a predetermined value for the convolution value may be set, and it may be determined whether the matching degree of the candidate quadrangle and the frame region of the annular rectangle is greater than the predetermined value using the convolution value.
If the matching degree is greater than the predetermined value, the candidate quadrangle is determined as the target rectangle.
According to the embodiment of the present disclosure, if several candidate quadrangles have a matching degree with their corresponding annular rectangular frame regions greater than the predetermined value, the candidate quadrangle with the highest matching degree is determined as the target rectangle.
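Putting the matching procedure together, a minimal sketch under stated assumptions: the kernel is 1 on a 10-pixel-wide annular rectangular frame region and 0 inside; because edge pixels are dark, a low gray-value sum under the ring means a high matching degree, so the score below inverts the ring's mean gray value. The threshold is illustrative, and `patches` stands for one edge-image crop per candidate quadrangle.

```python
# Sketch of ring-kernel matching: dark ring -> high matching degree.
import numpy as np

def ring_kernel(h, w, band=10):
    k = np.zeros((h, w), np.float32)
    k[:band, :] = k[-band:, :] = 1.0    # top and bottom bands of the ring
    k[:, :band] = k[:, -band:] = 1.0    # left and right bands of the ring
    return k

def matching_degree(patch, band=10):
    k = ring_kernel(*patch.shape, band=band)
    ring_mean = (k * patch).sum() / k.sum()   # mean gray value under the ring
    return 255.0 - ring_mean                  # darker ring -> better match

MATCH_THRESHOLD = 200.0                       # hypothetical predetermined value
scores = [matching_degree(p) for p in patches]
best = int(np.argmax(scores))
target_rectangle = candidates[best] if scores[best] > MATCH_THRESHOLD else None
```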
Fig. 7-16 schematically illustrate one particular embodiment of an article positioning method according to an embodiment of the present disclosure.
Fig. 7 schematically shows a top view of an article 130 and a tray 140 on which the article 130 is placed. The items 130 and trays 140 in fig. 7 correspond to the items 130 and trays 140 in fig. 1.
For ease of illustration, only one item 130 on the tray 140 will be described. Other items on the tray 140 may be treated in the same manner.
In this embodiment, a photographing device composed of one 2D camera and one 3D camera is provided, but illustration of the photographing device is omitted here.
As described above, the range shown in fig. 7 is the shooting area of the 2D camera, and the size is 1.5m × 2 m.
For the shooting area, a planar two-dimensional image is obtained with a 4-million-pixel (4 MP) 2D camera, and three-dimensional position information corresponding to the two-dimensional image is obtained with a 3D camera.
The two-dimensional image includes not only the two-dimensional image of the surface of the object 130 but also other objects, trays, floors, and the like in the shooting area.
Next, an edge image is extracted from the two-dimensional image.
Fig. 8 schematically shows the extracted edge image. In fig. 8, the edge image is displayed in black, and the non-edge image is displayed in white.
The edge image includes not only an image of the edge of the actual article surface but also an image of the edge of some pattern or the like present on the article surface.
Next, corner data is extracted from the edge image using Harris corner detection techniques.
Fig. 9 schematically shows a part of corner points extracted.
It should be noted that at this point there may be several thousand to over ten thousand extracted corner points; in FIG. 9, the main corner-point clusters are indicated by dots.
Then, the corner data is clustered using the DBSCAN clustering algorithm.
Fig. 10 schematically shows a part of corner points after clustering.
About one hundred corner points remain after clustering.
Next, a plurality of quadrangles connected by corner points are extracted.
Fig. 11 schematically shows a part of a quadrangle connected by corner points.
Since the upper surface of the article 130 in this embodiment is rectangular, only quadrangles close to rectangles are extracted, which reduces computational interference. For example, in FIG. 11, only the three quadrangles shown by black solid lines may be extracted.
Next, a quadrangle having a size within a target size range is screened out from the plurality of quadrangles.
The upper-right quadrangle in FIG. 11 is discarded because it is undersized.
And screening one or more candidate quadrangles according to the coincidence degree between the quadrangles with the size within the target size range.
In the actual processing, a plurality of highly coinciding quadrilaterals may appear, which often correspond to only one real object surface. If the coincidence degree of two quadrangles with each other is 80% or more, one of the two quadrangles is retained. Criteria for how to judge which quadrangle to keep may be set as necessary. For example, a quadrilateral may be retained that is closer to a rectangle. This screening by overlap ratio is not shown in FIG. 11.
Fig. 12 schematically shows the candidate quadrangles screened out.
For the candidate quadrangle on the left, the minimum bounding rectangle is obtained from the four vertex coordinates (u0, v0), (u1, v1), (u2, v2), (u3, v3) of the candidate quadrangle, yielding the length L and width W of the minimum bounding rectangle. This operation can be implemented with OpenCV's minAreaRect function. The minAreaRect function yields not only the length L and width W of the minimum bounding rectangle but also its tilt angle, from which it can be determined whether the minimum bounding rectangle is tilted.
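A sketch of this step; the variable `candidate` stands for one screened candidate quadrangle as a (4, 2) vertex array.

```python
# Sketch of the minimum-bounding-rectangle step with OpenCV's minAreaRect.
import cv2
import numpy as np

quad = candidate.astype(np.float32)            # vertices (u0,v0) ... (u3,v3)
center, (w, h), angle = cv2.minAreaRect(quad)  # centre, size L x W, tilt angle
```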
A convolution kernel with an annular rectangular frame region is then designed according to the length L and width W of the minimum bounding rectangle obtained from the candidate quadrangle. The length and width of the convolution kernel coincide with those of the minimum bounding rectangle, and the annular rectangular frame region may be 10 pixels wide. The value on the ring is 1 and the value inside is 0; that is, 1 is assigned to the annular rectangular frame region and 0 to the region other than the frame region. FIG. 13 shows the convolution kernel corresponding to the left candidate quadrangle in FIG. 12.
After the minimum bounding rectangle is obtained, the edge image data corresponding to the candidate quadrangle can be cropped from the two-dimensional image.
If the minimum bounding rectangle is tilted, the image can be rotated with OpenCV's warpAffine function; after the rectangle has been rotated upright, the edge image data is cropped from the two-dimensional image.
FIG. 14 shows, hatched, the edge image data cropped for the left candidate quadrangle in FIG. 12.
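A sketch of this rotate-and-crop step with OpenCV's getRotationMatrix2D and warpAffine, reusing `center`, `(w, h)`, and `angle` from the minAreaRect sketch above.

```python
# Sketch: rotate the edge image upright, then cut the axis-aligned window.
import cv2

M = cv2.getRotationMatrix2D(center, angle, 1.0)        # rotate about the centre
rotated = cv2.warpAffine(edges, M, edges.shape[::-1])  # dsize is (width, height)
x0, y0 = int(center[0] - w / 2), int(center[1] - h / 2)
I = rotated[y0:y0 + int(h), x0:x0 + int(w)]            # cropped edge data I
```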
Denoting the cropped edge image data as I and the convolution kernel designed in the previous step as K, the convolution is computed as

conv = Σ_i Σ_j K(i, j) · I(i, j)
The value of the annular rectangular frame region of the convolution kernel is 1 and the interior value is 0. Therefore, the computation above sums the gray values of the edge image under the 10-pixel-wide annular rectangular frame region.
As described above, the edge image is displayed in black with a lower gray value; the non-edge image appears white with a higher gray value.
For the candidate quadrangle on the left in FIG. 12, there are many black portions within the annular rectangular frame region, so the sum is low, and the candidate quadrangle is determined as the target rectangle.
The central candidate quadrangle in FIG. 12 is judged by the same method as above. FIG. 15 shows the convolution kernel corresponding to the central candidate quadrangle in FIG. 12.
FIG. 16 shows, hatched, the edge image data cropped for the central candidate quadrangle in FIG. 12.
For the central candidate quadrangle in FIG. 12, there are many white portions within the annular rectangular frame region, so the sum is high, and the candidate quadrangle is not determined as the target rectangle.
One specific embodiment of the present disclosure is described above based on fig. 7 to 16, but the present disclosure is of course not limited to the specific embodiment described above.
Another aspect of the present disclosure provides an article positioning device.
FIG. 17 schematically illustrates a block diagram of an article positioning device, according to an embodiment of the disclosure.
As shown in fig. 17, the article positioning apparatus 1700 of the embodiment of the present disclosure may include an acquisition module 1710, an edge image extraction module 1720, a corner point data extraction module 1730, a processing and analysis module 1740, a target polygon position determination module 1750, and an article surface position determination module 1760.
The obtaining module 1710 is configured to obtain a two-dimensional image associated with a surface of an item and three-dimensional position information associated with the surface of the item.
The edge image extraction module 1720 is used to extract an edge image from the two-dimensional image.
The corner data extraction module 1730 is used to extract corner data from the edge image.
The processing and analyzing module 1740 is configured to process and analyze the extracted corner data to obtain a target polygon representing an edge of the surface of the article.
Target polygon position determination module 1750 is used to determine three-dimensional position information for a target polygon from three-dimensional position information associated with a surface of an article.
The item surface location determination module 1760 is used to determine the location of the item surface using the three-dimensional location information of the target polygon.
According to the embodiment of the disclosure, the article is one article in a stack formed by closely arranging a plurality of articles, and the surface of the plurality of articles in at least one direction of the stack is a polygonal surface with the same number of sides.
According to an embodiment of the present disclosure, the article is a rectangular parallelepiped article. In this case, the processing and analyzing module 1740 may include a corner clustering sub-module, a quadrilateral extracting sub-module, and a target rectangle screening sub-module.
The angular point clustering submodule is used for clustering angular point data so as to reduce the number of angular points.
The quadrangle extraction submodule is used for extracting a plurality of quadrangles formed by connecting angular points.
And the target rectangle screening submodule is used for screening a plurality of quadrangles to obtain a target rectangle.
According to an embodiment of the present disclosure, the target rectangle screening submodule may include a quadrangle preliminary selection unit, a candidate quadrangle screening unit, a convolution kernel generation unit, and a convolution kernel matching unit.
The quadrangle primary selection unit is used for screening the quadrangles with the size in the target size range from the plurality of quadrangles.
The candidate quadrangle screening unit is used for screening one or more candidate quadrangles according to the coincidence degree between the quadrangles with the size in the target size range.
The convolution kernel generation unit is used to generate, for each of the one or more candidate quadrangles, a convolution kernel corresponding to the size of that candidate quadrangle, the convolution kernel having an annular rectangular frame region.
The convolution kernel matching unit is used to match each candidate quadrangle using the annular rectangular frame region of its corresponding convolution kernel to obtain the target rectangle.
According to an embodiment of the present disclosure, the convolution kernel matching unit may include an assignment subunit, a convolution subunit, a judgment subunit, and a determination subunit.
The assignment subunit is configured to assign 1 to the annular rectangular frame region and assign 0 to a region outside the annular rectangular frame region.
The convolution subunit is configured to convolve the corresponding candidate quadrangle with the convolution kernel to obtain a convolution value.
The judgment subunit is configured to judge, by using the convolution value, whether the matching degree of the candidate quadrangle and the annular band-shaped rectangular frame region is greater than a predetermined value.
The determination subunit is configured to determine the candidate quadrangle as the target rectangle if the matching degree is greater than the predetermined value.
According to an embodiment of the present disclosure, if there are a plurality of candidate quadrangles whose matching degree with the corresponding annular band-shaped rectangular frame region is greater than the predetermined value, the determination subunit determines the candidate quadrangle with the highest matching degree as the target rectangle.
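Putting the four subunits together: with the kernel valued 1 on the ring and 0 elsewhere, the convolution value over a candidate's image patch counts how many ring pixels land on edge pixels, and normalizing by the ring area gives a matching degree in [0, 1]. The sketch below assumes axis-aligned candidates cropped by bounding box; a production system would presumably warp rotated candidates first.

import numpy as np

def matching_degree(edge_image, bbox, kernel):
    # bbox = (x, y, w, h) of the candidate; kernel built to the same
    # (h, w) size, e.g. by make_frame_kernel(w, h).
    x, y, w, h = bbox
    patch = (edge_image[y:y + h, x:x + w] > 0).astype(np.float32)
    convolution_value = float((patch * kernel).sum())
    return convolution_value / kernel.sum()  # fraction of ring on edges

def pick_target_rectangle(edge_image, candidates, threshold=0.7):
    # candidates: list of (quad, bbox, kernel) triples. Above-threshold
    # matches are accepted; if several pass, the highest degree wins.
    best, best_score = None, threshold
    for quad, bbox, kernel in candidates:
        score = matching_degree(edge_image, bbox, kernel)
        if score > best_score:
            best, best_score = quad, score
    return best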
According to an embodiment of the present disclosure, the angles of the four corners of each quadrangle extracted by the quadrangle extraction submodule are each in the range of 80° to 100°.
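This angle constraint admits near-rectangles while tolerating perspective distortion. A sketch of the check, computing the interior angle at each vertex of an ordered quadrangle:

import numpy as np

def corner_angles(quad):
    # Interior angle, in degrees, at each vertex of an ordered quadrangle.
    angles = []
    for i in range(4):
        v1 = quad[i - 1] - quad[i]
        v2 = quad[(i + 1) % 4] - quad[i]
        cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        angles.append(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))
    return angles

def is_near_rectangle(quad, lo=80.0, hi=100.0):
    # Keep only quadrangles whose four interior angles all fall in
    # the 80-100 degree window described above.
    return all(lo <= a <= hi for a in corner_angles(quad))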
According to an embodiment of the present disclosure, the article positioning device may further include a target picking pose calculation module and a sending module.
The target picking pose calculation module is used to calculate a target picking pose for picking the article according to the position of the surface of the article.
The sending module is used to send the target picking pose to the robot so that the robot can pick the article according to the target picking pose.
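The patent does not fix how the picking pose is derived from the surface position. One common choice for suction picking, sketched below, places the tool at the surface centroid and approaches along the surface normal; the full tool orientation and the sign of the normal (toward the camera) are left out of this illustration.

import numpy as np

def pick_pose_from_surface(vertices_3d):
    # Tool position at the surface centroid; approach axis along the
    # surface normal, estimated from two edge vectors of the rectangle.
    v = np.asarray(vertices_3d, dtype=float)
    center = v.mean(axis=0)
    normal = np.cross(v[1] - v[0], v[3] - v[0])
    normal /= np.linalg.norm(normal)
    return center, normal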
According to an embodiment of the present disclosure, the article positioning device can accurately position the article without relying on a known article size or an article template.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, and sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of integrating or packaging a circuit in hardware or firmware, or in any one of, or a suitable combination of, software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, and sub-units according to embodiments of the present disclosure may be at least partially implemented as a computer program module which, when executed, may perform the corresponding functions.
For example, any of the acquisition module 1710, the edge image extraction module 1720, the corner data extraction module 1730, the processing and analysis module 1740, the target polygon position determination module 1750, and the article surface position determination module 1760 may be combined and implemented in one module/unit/sub-unit, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the acquisition module 1710, the edge image extraction module 1720, the corner data extraction module 1730, the processing and analysis module 1740, the target polygon position determination module 1750, and the article surface position determination module 1760 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of integrating or packaging a circuit in hardware or firmware, or by any one of, or any suitable combination of, software, hardware, and firmware implementations. Alternatively, at least one of these modules may be implemented at least in part as a computer program module which, when executed, may perform the corresponding functions.
It should be noted that the article positioning device in the embodiment of the present disclosure corresponds to the article positioning method in the embodiment of the present disclosure; for details of the device, reference may be made to the description of the method, which is not repeated here.
The robot of embodiments of the present disclosure includes one or more processors and memory for storing one or more instructions that, when executed by the one or more processors, cause the one or more processors to implement the above-described method.
Fig. 18 schematically shows a block diagram of a robot according to an embodiment of the present disclosure. The robot shown in fig. 18 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 18, a robot 1800 according to an embodiment of the present disclosure includes a processor 1801, which may perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1802 or a program loaded from a storage portion 1808 into a Random Access Memory (RAM) 1803. The processor 1801 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special-purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1801 may also include onboard memory for caching purposes. The processor 1801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 1803, various programs and data necessary for the operation of the robot 1800 are stored. The processor 1801, the ROM 1802, and the RAM 1803 are connected to one another by a bus 1804. The processor 1801 performs various operations of the method flows according to embodiments of the present disclosure by executing programs in the ROM 1802 and/or the RAM 1803. Note that the programs may also be stored in one or more memories other than the ROM 1802 and the RAM 1803. The processor 1801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the robot 1800 may also include an input/output (I/O) interface 1805, which is also connected to the bus 1804. The robot 1800 may further include one or more of the following components connected to the I/O interface 1805: an input portion 1806 including a keyboard, a mouse, and the like; an output portion 1807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 1808 including a hard disk and the like; and a communication portion 1809 including a network interface card such as a LAN card or a modem. The communication portion 1809 performs communication processing via a network such as the Internet. A drive 1810 is also connected to the I/O interface 1805 as needed. A removable medium 1811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1810 as needed, so that a computer program read therefrom can be installed into the storage portion 1808 as necessary.
According to embodiments of the present disclosure, the method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable storage medium, the computer program containing program code for performing the method illustrated by the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1809, and/or installed from the removable medium 1811. The computer program, when executed by the processor 1801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The computer-readable storage medium of the disclosed embodiments stores executable instructions that, when executed by a processor, cause the processor to implement the above-described method.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1802 and/or the RAM 1803 and/or one or more memories other than the ROM 1802 and the RAM 1803 described above.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus, robots, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. Those skilled in the art will appreciate that various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or sub-combinations are not expressly recited in the present disclosure. In particular, various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in combination. The scope of the disclosure is defined by the appended claims and their equivalents. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to fall within the scope of the present disclosure.

Claims (11)

1. An article positioning method comprising:
acquiring a two-dimensional image related to the surface of an article and three-dimensional position information related to the surface of the article;
extracting an edge image from the two-dimensional image;
extracting corner data from the edge image;
processing and analyzing the extracted corner data to obtain a target polygon representing the edge of the surface of the article;
determining three-dimensional position information of the target polygon from the three-dimensional position information related to the surface of the article; and
determining the position of the surface of the article by using the three-dimensional position information of the target polygon.
2. The method of claim 1, wherein the article is one of a plurality of closely arranged articles forming a stack, and the surfaces of the plurality of articles facing at least one direction of the stack are polygonal surfaces having the same number of sides.
3. The method of claim 2, wherein the article is a cuboid article,
the processing and analyzing the extracted corner data to obtain a target polygon representing the edge of the surface of the article comprises:
clustering the corner data to reduce the number of corner points;
extracting a plurality of quadrangles formed by connecting corner points; and
screening the plurality of quadrangles to obtain a target rectangle.
4. The method of claim 3, wherein screening the plurality of quadrangles to obtain the target rectangle comprises:
screening, from the plurality of quadrangles, quadrangles whose size is within a target size range;
screening out one or more candidate quadrangles according to the degree of overlap between the quadrangles whose size is within the target size range;
for each candidate quadrangle of the one or more candidate quadrangles, generating a convolution kernel corresponding to the size of the candidate quadrangle, the convolution kernel having an annular band-shaped rectangular frame region; and
matching each candidate quadrangle using the annular band-shaped rectangular frame region of its corresponding convolution kernel to obtain the target rectangle.
5. The method of claim 4, wherein matching each candidate quadrangle using the annular band-shaped rectangular frame region of its corresponding convolution kernel to obtain the target rectangle comprises:
assigning 1 to the annular band-shaped rectangular frame region and assigning 0 to the region outside the annular band-shaped rectangular frame region;
convolving the corresponding candidate quadrangle with the convolution kernel to obtain a convolution value;
judging, by using the convolution value, whether the matching degree of the candidate quadrangle and the annular band-shaped rectangular frame region is greater than a predetermined value; and
if the matching degree is greater than the predetermined value, determining the candidate quadrangle as the target rectangle.
6. The method of claim 5, wherein,
if there are a plurality of candidate quadrangles whose matching degree with the corresponding annular band-shaped rectangular frame region is greater than the predetermined value, the candidate quadrangle with the highest matching degree is determined as the target rectangle.
7. The method of claim 3, wherein, in extracting the plurality of quadrangles formed by connecting corner points, the angles of the four corners of each extracted quadrangle are each in the range of 80° to 100°.
8. The method of claim 1, wherein, after determining the position of the surface of the article, the method further comprises:
calculating a target picking pose for picking the article according to the position of the surface of the article; and
sending the target picking pose to a robot so that the robot picks the article according to the target picking pose.
9. An article positioning device comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a two-dimensional image related to the surface of an article and three-dimensional position information related to the surface of the article;
the edge image extraction module is used for extracting an edge image from the two-dimensional image;
the corner data extraction module is used for extracting corner data from the edge image;
the processing and analyzing module is used for processing and analyzing the extracted corner data to obtain a target polygon representing the edge of the surface of the article;
a target polygon position determining module, configured to determine three-dimensional position information of the target polygon from the three-dimensional position information related to the surface of the object; and
and the object surface position determining module is used for determining the position of the object surface by utilizing the three-dimensional position information of the target polygon.
10. A robot, comprising:
one or more processors;
a memory to store one or more instructions that,
wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
11. A computer readable storage medium storing executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 8.
CN202010039632.4A 2020-01-14 2020-01-14 Article positioning method and device, robot and computer readable storage medium Active CN113191174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010039632.4A CN113191174B (en) 2020-01-14 2020-01-14 Article positioning method and device, robot and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010039632.4A CN113191174B (en) 2020-01-14 2020-01-14 Article positioning method and device, robot and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113191174A 2021-07-30
CN113191174B 2024-04-09

Family

ID=76972404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010039632.4A Active CN113191174B (en) 2020-01-14 2020-01-14 Article positioning method and device, robot and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113191174B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100004778A1 (en) * 2008-07-04 2010-01-07 Fanuc Ltd Object picking device
US20130114861A1 (en) * 2011-11-08 2013-05-09 Fanuc Corporation Device and method for recognizing three-dimensional position and orientation of article
CN103983193A (en) * 2014-06-11 2014-08-13 中国烟草总公司郑州烟草研究院 Three-dimensional detection method applied to size measurement of cigarette packet in cigarette carton
CN109033920A (en) * 2017-06-08 2018-12-18 株式会社理光 A kind of recognition methods grabbing target, device and computer readable storage medium
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN108416804A (en) * 2018-02-11 2018-08-17 深圳市优博讯科技股份有限公司 Obtain method, apparatus, terminal device and the storage medium of target object volume
CN108573184A (en) * 2018-03-12 2018-09-25 深圳元启智能技术有限公司 A kind of two-dimensional code identification method, module and computer readable storage medium
CN108629343A (en) * 2018-04-28 2018-10-09 湖北民族学院 A kind of license plate locating method and system based on edge detection and improvement Harris Corner Detections
CN110349213A (en) * 2019-06-28 2019-10-18 Oppo广东移动通信有限公司 Method, apparatus, medium and electronic equipment are determined based on the pose of depth information
CN110443853A (en) * 2019-07-19 2019-11-12 广东虚拟现实科技有限公司 Scaling method, device, terminal device and storage medium based on binocular camera
CN110472534A (en) * 2019-07-31 2019-11-19 厦门理工学院 3D object detection method, device, equipment and storage medium based on RGB-D data
CN110570471A (en) * 2019-10-17 2019-12-13 南京鑫和汇通电子科技有限公司 cubic object volume measurement method based on depth image

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113733100A (en) * 2021-09-29 2021-12-03 珠海优特电力科技股份有限公司 Target positioning method, device, equipment and storage medium of inspection operation robot
CN113733100B (en) * 2021-09-29 2022-10-28 珠海优特电力科技股份有限公司 Target positioning method, device, equipment and storage medium of inspection operation robot
WO2023082417A1 (en) * 2021-11-10 2023-05-19 梅卡曼德(北京)机器人科技有限公司 Grabbing point information obtaining method and apparatus, electronic device, and storage medium
CN115439331A (en) * 2022-09-02 2022-12-06 北京百度网讯科技有限公司 Corner point correction method and three-dimensional model generation method and device in meta universe
CN115582827A (en) * 2022-10-20 2023-01-10 大连理工大学 Unloading robot grabbing method based on 2D and 3D visual positioning
CN115781673A (en) * 2022-11-18 2023-03-14 节卡机器人股份有限公司 Part grabbing method, device, equipment and medium
CN117649450A (en) * 2024-01-26 2024-03-05 杭州灵西机器人智能科技有限公司 Tray grid positioning detection method, system, device and medium
CN117649450B (en) * 2024-01-26 2024-04-19 杭州灵西机器人智能科技有限公司 Tray grid positioning detection method, system, device and medium

Also Published As

Publication number Publication date
CN113191174B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN113191174B (en) Article positioning method and device, robot and computer readable storage medium
CN109870983B (en) Method and device for processing tray stack image and system for warehousing goods picking
WO2020020146A1 (en) Method and apparatus for processing laser radar sparse depth map, device, and medium
CN107705322A (en) Motion estimate tracking and system
CN109911481B (en) Cabin frame target visual identification and positioning method and system for metallurgical robot plugging
EP3076365B1 (en) Homography rectification
US20160292832A1 (en) Efficient image transformation
Elibol et al. Efficient image mosaicing for multi-robot visual underwater mapping
CN114331986A (en) Dam crack identification and measurement method based on unmanned aerial vehicle vision
US20160292831A1 (en) Homography rectification
US20210304411A1 (en) Map construction method, apparatus, storage medium and electronic device
Schaeferling et al. Object recognition and pose estimation on embedded hardware: SURF‐based system designs accelerated by FPGA logic
CN116258826A (en) Semantic map construction and boundary real-time extraction method for open-air mining area
CN115533902A (en) Visual guidance-based unstacking method and device, electronic equipment and system
CN114454168A (en) Dynamic vision mechanical arm grabbing method and system and electronic equipment
Zhao et al. Visual odometry-A review of approaches
CN113627478A (en) Target detection method, target detection device and robot
CN116228854B (en) Automatic parcel sorting method based on deep learning
Wang et al. GraspFusionNet: a two-stage multi-parameter grasp detection network based on RGB–XYZ fusion in dense clutter
CN116309882A (en) Tray detection and positioning method and system for unmanned forklift application
CN111860035A (en) Book cover detection method and device, storage medium and electronic equipment
CN115565072A (en) Road garbage recognition and positioning method and device, electronic equipment and medium
WO2021114775A1 (en) Object detection method, object detection device, terminal device, and medium
CN113345023A (en) Positioning method and device of box body, medium and electronic equipment
CN114972495A (en) Grabbing method and device for object with pure plane structure and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant