CN116228838A

CN116228838A - Object boxing reinforcement learning method and related device based on visual detection

Info

Publication number: CN116228838A
Application number: CN202310521000.5A
Authority: CN
Inventors: 胡瑞珍; 许聚展; 黄惠; 张皓; 龚明伦
Original assignee: Shenzhen University
Current assignee: Shenzhen University
Priority date: 2023-05-10
Filing date: 2023-05-10
Publication date: 2023-06-06
Anticipated expiration: 2043-05-10
Also published as: CN116228838B

Abstract

The application discloses an object boxing reinforcement learning method and a related device based on visual detection, wherein the method comprises the steps of determining a placement state sequence and a candidate placement position sequence based on an object image and a container image acquired by the visual detection; controlling the object transfer boxing network model to determine a target putting state and a target putting position based on the putting state sequence and the candidate putting position sequence; and placing the target object at the target placing position according to the target placing state. According to the method and the device, the placement state and the candidate placement position are determined through the images acquired through visual detection, the placement state and the candidate placement position are matched through the object transfer boxing network model, the visual detection and the object transfer boxing network model are combined, the objects and the containers in the real scene are identified through the visual detection, the target placement state and the target placement position are determined through the object transfer boxing network model, and the boxing problem in the real scene is solved.

Description

Object boxing reinforcement learning method and related device based on visual detection

Technical Field

The application relates to the technical field of computer graphics, in particular to an object boxing reinforcement learning method based on visual detection and a related device.

Background

One study showed that approximately 25% of the solid waste produced annually was produced by parcel boxing. As an indispensable link in the transportation and storage industries, in the face of the increasing number of packages, a more effective and stable planning method is needed to guide, and the improvement of the boxing efficiency brings about significant influence on economy and environment.

At present, when the object boxing problem is faced, boxing planning is mainly performed by adopting two types of position searching strategies, wherein the first type is to traverse all the placeable positions of the target container, score each placeable position, and select the placeable position with the highest score as the placement position of the box, for example, the deepest, bottommost and leftmost strategies, the minimum height map strategy and the like are selected preferentially. The second category is to choose from a number of candidate locations, such as corner points, extreme points, etc. However, the existing loading planning generally only starts from the placeable position, a fixed box number needs to be input, and the box number in the real three-dimensional scene is generally not fixed, so that the existing boxing planning cannot well solve the problem of transferring and boxing objects in the real three-dimensional scene.

There is thus a need for improvements and improvements in the art.

Disclosure of Invention

The technical problem to be solved by the application is to provide an object boxing reinforcement learning method and a related device based on visual detection aiming at the defects of the prior art.

In order to solve the above technical problems, a first aspect of an embodiment of the present application provides an object boxing reinforcement learning method based on visual detection, the method including:

acquiring a container image and an object image, wherein the container image comprises a target container, and the object image comprises a plurality of objects;

determining a placement state sequence based on the object image, and determining a candidate placement position sequence based on the container image, wherein each placement state in the placement state sequence comprises size information and priority information for reflecting a blocking relationship between objects;

controlling an object transfer boxing network model to determine a target placement state and a target placement position row based on the placement state sequence and the candidate placement position sequence, wherein the object transfer boxing network model is obtained based on reinforcement learning training;

and placing the target object corresponding to the target placing state at the target placing position according to the target placing state.

The object boxing reinforcement learning method based on visual detection, wherein the object image based determination of the placement state sequence specifically comprises the following steps:

detecting the object image to obtain the space state and size information of each object in the object image;

constructing a priority map based on the space state of each object, and determining priority information of each object based on the priority map, wherein the priority map takes the objects as nodes and the blocking relation among the objects as directed edges;

and generating a placement state based on the size information and the priority information to obtain a placement state sequence.

The object boxing reinforcement learning method based on visual detection, wherein the construction of the priority map based on the space state of each object specifically comprises the following steps:

for each object, acquiring a first target object which blocks the movement of the object, and establishing a directional movement blocking edge between the object and the first target object;

determining a grabbing direction of the object based on the space state, acquiring a second target object which forms shielding to the object along the grabbing direction, and establishing a directional approach blocking edge between the object and the second target object, wherein the directional approach blocking edge carries the grabbing direction;

And taking each object as a node, and generating a priority map based on the directional movement blocking edge and the directional approaching blocking edge corresponding to each object.

The object boxing reinforcement learning method based on visual detection, wherein the object transferring boxing network model is controlled to determine a target putting state and a target putting position sequence based on the putting state sequence and the candidate putting position sequence, specifically comprises the following steps:

inputting the placing state sequence into a source encoder in an object transferring and boxing network model, and determining the state characteristics of each placing state through the source encoder, wherein the source encoder comprises an object encoder, an attention network and a normalization unit;

inputting the candidate placement position sequence into a target encoder in an object transfer boxing network model, and determining the position characteristic corresponding to each candidate placement position through the target encoder, wherein the target encoder comprises a space encoder, an attention network and a normalization unit;

and determining the matching scores of the state features and the position features through a matching module in the input object transferring and boxing network model, and determining the target placement state and the target placement position based on the matching scores of the state features and the position features.

The object boxing reinforcement learning method based on visual detection, wherein the step of inputting the placing state sequence into a source encoder in an object transferring boxing network model, and the step of determining the state characteristics of each placing state through the source encoder specifically comprises the following steps:

for each put state in a put state sequence, inputting size information in the put state into a multi-layer perceptron in an object encoder, and determining size characteristics through the multi-layer perceptron;

inputting the priority information in the placement state into a priority encoder in an object encoder, and determining a priority characteristic through the priority encoder;

splicing the size feature and the priority feature to obtain an initial feature, inputting the initial feature into an attention network, and determining an intermediate feature through the attention network;

and inputting the intermediate features into a normalization unit, and determining the state features through the normalization unit.

The object boxing reinforcement learning method based on visual detection, wherein the placing the object corresponding to the object placing state at the object placing position according to the object placing state specifically comprises:

Grabbing a target object corresponding to the target placement state according to the grabbing direction corresponding to the target placement state, and obtaining the target object size of the target object;

when the size of the target object is matched with the size information of the target object, placing the target object at the target placing position according to the target placing state;

if the size of the target object is not matched with the size information of the target object, updating the size information by adopting the size of the target object, and re-executing the step of controlling the object transfer boxing network model to determine the target placing state and the target placing position sequence based on the placing state sequence and the candidate placing position sequence.

The object boxing reinforcement learning method based on visual detection, wherein the training process of the object transferring boxing network model specifically comprises the following steps:

determining a sequence of pose states based on the object image and a sequence of candidate pose positions based on the container image;

inputting the placement state sequence and the candidate placement position sequence into an initial object transfer boxing network model, and determining a target placement state, a target placement position and a reward score through the object transfer boxing network model;

Inputting the placement state sequence and the candidate placement position sequence into an evaluator network, and determining an evaluation reward score through the evaluator network;

and performing reinforcement learning on the initial object transfer boxing network model and the evaluator network based on the reward points and the evaluation reward points to obtain an object transfer boxing network model.

A second aspect of embodiments of the present application provides a vision-based object boxing apparatus, the apparatus comprising

The device comprises an acquisition module, a storage module and a display module, wherein the acquisition module is used for acquiring a container image and an object image, the container image comprises a target container, and the object image comprises a plurality of objects;

a determining module, configured to determine a sequence of placement states based on the object image, and determine a sequence of candidate placement positions based on the container image, where each placement state in the sequence of placement states includes size information and priority information for reflecting a blocking relationship between objects;

the control module is used for controlling an object transfer boxing network model to determine a target placement state and a target placement position column based on the placement state sequence and the candidate placement position sequence, wherein the object transfer boxing network model is obtained based on reinforcement learning training;

And the execution module is used for placing the target object corresponding to the target placing state at the target placing position according to the target placing state.

A third aspect of the embodiments provides a computer readable storage medium storing one or more programs executable by one or more processors to implement the steps in the vision-based object vanning reinforcement learning method as described in any one of the above.

A fourth aspect of the present embodiment provides a terminal device, including: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;

the communication bus realizes connection communication between the processor and the memory;

the processor, when executing the computer readable program, implements the steps in the vision detection based object boxing reinforcement learning method as described in any one of the above.

The beneficial effects are that: compared with the prior art, the application provides an object boxing reinforcement learning method based on visual detection and a related device, wherein the method comprises the steps of obtaining a container image and an object image; determining a sequence of pose states based on the object image and a sequence of candidate pose positions based on the container image; controlling an object transfer boxing network model to determine a target placing state and a target placing position column based on the placing state sequence and the candidate placing position sequence; and placing the target object corresponding to the target placing state at the target placing position according to the target placing state. According to the method and the device, the placement state and the candidate placement position are determined through the images acquired through visual detection, the placement state and the candidate placement position of the object are matched through the object transfer boxing network model, so that the visual detection and the object transfer boxing network model are combined, the object information in the real scene is identified through the visual detection, then the object placement state and the object placement position are determined through the object transfer boxing network model, and the boxing problem in the real scene is well solved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without creative effort for a person of ordinary skill in the art.

Fig. 1 is a flowchart of an object boxing reinforcement learning method based on visual detection.

Fig. 2 is a schematic diagram of a real scene boxing process.

Fig. 3 is a schematic diagram of a real scene.

Fig. 4 is an exemplary diagram of a priority diagram.

Fig. 5 is a schematic diagram of a priority map determination process.

Fig. 6 is a schematic structural diagram of an object transfer boxing network model.

Fig. 7 is a schematic diagram of the structure of a source encoder.

Fig. 8 is a schematic diagram comparing the method provided in the present embodiment with the prior art.

Fig. 9 is a schematic structural diagram of an object boxing reinforcement learning device based on visual detection.

Fig. 10 is a schematic structural diagram of a terminal device provided in the present application.

Detailed Description

The application provides an object boxing reinforcement learning method based on visual detection and a related device, and in order to make the purposes, technical schemes and effects of the application clearer and more definite, the application is further described in detail below by referring to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combination of one or more of the associated listed items.

It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

It should be understood that the sequence number and the size of each step in this embodiment do not mean the sequence of execution, and the execution sequence of each process is determined by the function and the internal logic of each process, and should not constitute any limitation on the implementation process of the embodiment of the present application.

It has been found that, according to one study, about 25% of the solid waste produced annually is due to parcel boxing. As an indispensable link in the transportation and storage industries, in the face of the increasing number of packages, a more effective and stable planning method is needed to guide, and the improvement of the boxing efficiency brings about significant influence on economy and environment.

In order to solve the above-described problems, in the embodiment of the present application, a container image and an object image are acquired; determining a sequence of pose states based on the object image and a sequence of candidate pose positions based on the container image; controlling an object transfer boxing network model to determine a target placing state and a target placing position column based on the placing state sequence and the candidate placing position sequence; and placing the target object corresponding to the target placing state at the target placing position according to the target placing state. According to the method and the device, the placement state and the candidate placement position are determined through the images acquired through visual detection, the placement state and the candidate placement position of the object are matched through the object transfer boxing network model, so that the visual detection and the object transfer boxing network model are combined, the object information in the real scene is identified through the visual detection, then the object placement state and the object placement position are determined through the object transfer boxing network model, and the boxing problem in the real scene is well solved.

The application will be further described by the description of embodiments with reference to the accompanying drawings.

The embodiment provides an object boxing reinforcement learning method based on visual detection, as shown in fig. 1 and 2, the method comprises the following steps:

S10, acquiring a container image and an object image.

Specifically, the container image and the object image can be obtained by detecting an object to be boxed in a real boxing scene and a target container for loading the object to be boxed through a visual detection device, the container image carries the target container, the object image carries a plurality of objects, and each of the objects in the object image is the object to be boxed. The container image and the object image each include depth information and image information, i.e., each pixel includes depth information and RGB information.

In this embodiment, the real boxing scene may be a working scene as shown in fig. 3, where the real boxing scene includes a plurality of stacked boxes stacked together, a target container for placement, and two cameras for observing the scenes, one of the two cameras for observing the target container, i.e. for capturing images of the container, and the other for observing the stacked boxes, i.e. for capturing images of the object.

S20, determining a placement state sequence based on the object image, and determining a candidate placement position sequence based on the container image.

Specifically, the placing state sequence includes a plurality of placing states, each of the plurality of placing states includes size information and priority information, wherein the size information includes an object size, an object posture of the object when the object is placed in the target container can be determined according to the object size, the priority information is used for reflecting a blocking relationship between the objects, and whether a blocking relationship exists between the object corresponding to the placing state and other objects, namely whether the object corresponding to the placing state is blocked by the other objects or not can be determined according to the priority information. The candidate placement position sequence comprises a plurality of candidate placement positions, and each candidate placement position is a residual maximum space of the target container, wherein the residual maximum space refers to a maximum cube space which cannot be continuously expanded along the directions of three coordinate axes.

In one implementation, the determining the pose state sequence based on the object image specifically includes:

s21, detecting the object image to obtain the space state and size information of each object in the object image;

s22, constructing a priority map based on the space state of each object, and determining priority information of each object based on the priority map;

s23, generating a placement state based on the size information and the priority information to obtain a placement state sequence.

Specifically, in the step S21, the spatial state is used to reflect the spatial pose and the spatial position of the object, the size information is used to reflect the object size of the object, for example, the object is a box, the size information includes the length, width and height of the box, that is, the x-axis length, the y-axis length and the z-axis length, and accordingly, the size information may be written as

Represents the size of the ith dimension, +.>

Representing the identified x-axis length, +.>

Representing the identified y-axis length, +.>

Representing the identified z-axis length. The determination of the spatial state and size informationThe method comprises the steps of firstly identifying object images through a trained identification network model to obtain object areas in the object images, and then identifying the object areas through a matching algorithm to obtain the space state and size information of each object, wherein the identification network model can adopt a Mask RCNN network, and the matching algorithm can adopt a square matching algorithm and the like. Of course, in practical applications, other manners of acquiring the spatial state and size information may be adopted, for example, through a trained multitasking model, the object image is input into the multitasking model, and the spatial state and size information of each object in the object image is directly input through the multitasking model.

Further, in the step S22, the priority map is configured to reflect the blocking relationship between the objects, and the priority information corresponding to each object may be determined based on the priority map, where the priority map is constructed by using the objects as nodes and the blocking relationship between the objects as directed edges, and the blocking relationship between the objects may be determined by edges between the nodes in the priority map. For example, in the priority graph shown in FIG. 4, there is a directed edge between node 1 and node 6 that node 6 points to node 1, illustrating that node 6 blocks node 1.

In one implementation manner, the constructing the priority map based on the spatial state of each object specifically includes:

s221, for each object, acquiring a first target object which blocks the movement of the object, and establishing a directional movement blocking edge between the object and the first target object;

s222, determining a grabbing direction of the object based on the space state, acquiring a second target object which forms shielding to the object along the grabbing direction, and establishing a directional approach blocking edge between the object and the second target object, wherein the directional approach blocking edge carries the grabbing direction;

s223, taking each object as a node, and generating a priority map based on the directional movement blocking edge and the directional approaching blocking edge corresponding to each object.

In particular, the directional movement blocking edge MB is used to reflect the inability to be blocked above an objectThe moving state, i.e. when there is a movement blocking edge between the object and the first target object, the first target object at least partly obstructs directly above the object. For example, when an object

Is a part of (1) which shields the object->

When just above (2), the object is->

Quilt object->

A state of blocking the movement, then the object +.>

For objects->

A corresponding first target object, in object +.>

And object->

A directional movement blocking edge MB is established between the two, and the direction of the directional movement blocking edge MB is defined as an object +.>

Pointing object->

. In one implementation, as shown in fig. 5, the determining process of the first target object may be to project each object from top to bottom in a vertical direction, so as to obtain a projected image; and detecting whether the projection areas of the two objects have overlapping areas or not based on the projection images, wherein when the overlapping areas exist, the object positioned above blocks the object positioned below from moving, so that the object positioned above is a first target object positioned below. />

The direction of the directional grabbing is a direction in which the object can be grabbed, and in this embodiment, the object is taken as an example of a box. As shown in fig. 5, when the boxes are gripped, gripping may be performed along any one of the 3 coordinate axis directions, that is, each box corresponds to three gripping directions; the three grabbing directions are respectively an x-axis direction, a y-axis direction and a z-axis direction. Thus, for each object, the x-axis direction, y-axis direction, and z-axis direction of the object may be determined based on the spatial state to obtain three gripping directions corresponding to the object. Of course, in practical application, the object may have only one grabbing direction, may be a cylindrical structure, and the grabbing direction is the central axis direction.

The directional approaching blocking edge AB is used for reflecting the state that the grabbing surface of the object along the grabbing direction cannot grab due to shielding of other objects, and the directional approaching blocking edge AB carries the grabbing direction. That is, when a directional approaching blocking edge exists between the object and the second target object, the second target object at least partially obscures the object in a gripping direction carried by the directional approaching blocking edge. For example, when an object

The gripping direction along the x-axis direction shields the object +.>

Object->

Quilt object->

Blocking in the x-axis direction, which makes it impossible to grasp in the x-axis direction, then the object +.>

For objects->

A corresponding second target object, at the object

And object->

A directional approach blocking edge is established between the two, and the direction of the directional approach blocking edge is the direction of the object +.>

Pointing object

. In addition, since the box corresponds to three grabbing directions of the X-axis direction, the Y-axis direction and the Z-axis direction, the directional approaching blocking edge AB between objects includes the directional approaching blocking edge of the X-axis direction, the directional approaching blocking edge of the Y-axis direction and the directional approaching blocking edge of the Z-axis direction, denoted XAB, YAB and ZAB, respectively, where X, Y and Z are used to reflect grabbing directions and AB is used to reflect the directional approaching blocking edge.

Further, after the priority map is obtained, as the two objects can have a blocking relationship between the two objects, namely a moving blocking relationship and a approaching blocking relationship, the approaching blocking relationship comprises approaching blocking relationships in a plurality of grabbing directions, therefore, a plurality of priority information exists between the two objects, the priority information corresponds to the grabbing directions one by one, and each priority information comprises approaching blocking relationships and moving blocking relationships in the grabbing directions corresponding to the priority information. Based on this, the priority information is used to reflect the blocking relationship between the object and each object identified in the object image, which can be expressed as

Wherein->

For a dimension equal to the number of objects identified by the object, a 01 vector is used to represent the object +.>

Movement blocking relation between all objects, +.>

For a vector 01 having a dimension equal to the number of objects recognized by the object, for representing the object +.>

Proximity blocking relationship with all objects. In addition, since the priority information reflects the blocking relationship between the object and each object identified in the object image, the priority information can be expressed by the blocking relationship between the object and each object identified in the object image, that is, the priority information can be expressed as +. >

，/>

=1, 2,. -%, n; n is the number of objects identified in the object image, < >>

And

。

for example, as shown in the priority graph of FIG. 4, the number of boxes is 8, node 1 is nearly blocked by node 6 along the x-axis, 6 points to 1 are shown in the priority graph, while there is no MB edge pointing to number 1, so that the MB edge can be represented by 0, AB edge can be represented by 6, so that the priority information of node 1 along the x-axis can be represented as {0,6}, which is converted into a 01 vector of

At the same time the priority information can also be expressed as +.>

。

Further, in step S223, after the size information and the priority information of each object are acquired, since the gripping directions corresponding to the priority information are different, when the object is gripped in each gripping direction and placed in the target container, the x-axis direction, the y-axis direction and the z-axis direction of the object are relatively changed, so that when the placement posture is determined, the coordinate axis direction of the object needs to be aligned with the coordinate axis direction of the container, and thus the coordinate axis of the containerThe directions are to the x-axis direction, the y-axis direction, and the z-axis direction of the object such that the x-axis length, the y-axis length, and the z-axis length match the x-axis direction, the y-axis direction, and the z-axis direction of the target container. In this embodiment, the object is a box, which can be grasped along the x-axis, y-axis and z-axis directions, and rotated 90 degrees about the grasping direction when the box is placed, whereby each grasping direction corresponds to two placing states, and 3 grasping directions corresponds to 6 placing states, wherein each placing state can be expressed as

Wherein->

Indicating the x-axis length, y-axis length and z-axis length in the put state +.>

Indicating priority information in the put state.

In one implementation, as shown in fig. 2, the candidate placement location sequence is determined based on the container image, and the determination of the candidate placement location sequence may be determining height information of the target container based on the container image, and discretizing the height information to obtain a height map of the container. However, a remaining maximum space (empty maximal spaces, hereinafter abbreviated as EMS) in the container space is calculated based on the height map, the EMS is a maximum square space in the container which cannot be continuously expanded along the three coordinate axis directions, and finally, the lower left corner of each EMS can be used as a candidate placement position to obtain a candidate placement position sequence.

And S30, controlling the object transfer boxing network model to determine a target putting state and a target putting position row based on the putting state sequence and the candidate putting position sequence.

Specifically, the object transferring and boxing network model is obtained in advance through reinforcement learning training and is used for determining a target placing state and a target placing position, wherein the target placing state is a final state that an object is placed in a target container, and the target placing position is used for placing the final position of the object. As shown in fig. 6, the object transferring and boxing network model may include a source encoder, a target encoder, and a matching module, where the source encoder is configured to encode the placement states, the target encoder is configured to encode the candidate placement states, and the matching module is configured to match an encoding feature obtained by encoding the source encoder with an encoding feature obtained by encoding the target encoder, so as to obtain a matching score of each placement state and each candidate placement position.

Based on the above, the control object transferring and boxing network model determines a target putting state and a target putting position sequence based on the putting state sequence and the candidate putting position sequence specifically includes:

s31, inputting the placing state sequence into a source encoder in an object transferring boxing network model, and determining the state characteristics corresponding to each placing state through the source encoder;

s32, inputting the candidate placement position sequence into a target encoder in an object transfer boxing network model, and determining the position characteristic corresponding to each candidate placement position through the target encoder, wherein the target encoder comprises a space encoder, an attention network and a normalization unit;

s33, determining the matching scores of the state features and the position features through a matching module in the input object transfer boxing network model, and determining the target placement state and the target placement position based on the matching scores of the state features and the position features.

Specifically, in step S31, the source encoder is configured to encode the placement state to obtain a state characteristic of the placement state. As shown in fig. 6, the source encoder includes an object encoder, an attention network and a normalization unit, where the object encoder is connected with the attention network and the normalization unit through an adder, the attention network is connected with the normalization unit, the object encoder is used to determine initial features corresponding to each placement state, after each initial feature is spliced by the adder, the attention network and the normalization unit are input respectively, the attention network determines intermediate features based on the initial features, and inputs the intermediate features into the normalization unit, and the normalization unit sums and normalizes the input initial features and intermediate features to obtain state features corresponding to each placement state.

In one implementation, inputting the sequence of pose states into a source encoder in an object transfer boxing network model, and determining, by the source encoder, state characteristics of each pose state specifically includes:

s311, for each placement state in the placement state sequence, inputting the size information in the placement state into a multi-layer perceptron in an object encoder, and determining the size characteristics through the multi-layer perceptron;

s312, inputting the priority information in the placement state into a priority encoder in an object encoder, and determining a priority characteristic through the priority encoder;

s313, splicing the size feature and the priority feature to obtain an initial feature, inputting the initial feature into an attention network, and determining an intermediate feature through the attention network;

s314, inputting the intermediate features into a normalization unit, and determining the state features through the normalization unit.

Specifically, as shown in fig. 7, the object encoder includes a multi-layer perceptron MLP for encoding the size information to obtain the size feature and a priority encoder for encoding the priority information to obtain the priority feature. The priority encoder comprises a first multi-layer perceptron MLP, an attention network, a first normalization unit, a second multi-layer perceptron MLP and a second normalization unit, wherein the first multi-layer perceptron is respectively connected with the attention network and the first normalization unit through adders, the attention network is connected with the first normalization unit, the first normalization unit is respectively connected with the second multi-layer perceptron and the second normalization unit, and the second multi-layer perceptron is connected with the second normalization unit. The first normalization unit and the second normalization unit are used for carrying out feature summation and normalization operation.

Further, the first multi-layer perceptron is configured to receive priority information

，

N is the number of objects identified in the object image>

Coding as high-dimensional features->

Then the high-dimensional feature ++>

After splicing, as key and value of the attention network, the high-dimensional feature +_>

As a query feature of the attention network, the attention network determines candidate features based on the key-value feature, the value feature, and the query feature, and then the candidate features obtain priority features through the first normalization unit, the second multi-layer perceptron, and the second normalization unit.

After the size feature and the priority feature are acquired, the size feature and the priority feature are spliced to obtain an initial feature, then the initial feature is input into an attention network, an intermediate feature is obtained through the attention network, and after the intermediate feature and the initial feature are summed, the summed feature is normalized to obtain a state feature. Based on this, the state feature sequence corresponding to the placement state sequence can be expressed as

Wherein->

D is the dimension of the feature vector and N represents the number of put states.

Further, in the step S32, as shown in fig. 6, the target encoder includes a spatial encoder and an attention network and a normalization unit, the spatial encoder and the attention network and the normalization unit, respectively The normalization unit is connected with the attention network, and the normalization unit is used for carrying out summation and normalization operation, wherein the space encoder can comprise two cascaded multi-layer perceptron MLPs. Each candidate placement position

Both including position information and size information, can be expressed as

Spatial encoder pair candidate placement position +.>

Performing feature coding to obtain initial position features, and inputting the initial position features into an attention network and a normalization unit to obtain position features so as to obtain a position feature sequence corresponding to the candidate placement position sequence>

Wherein->

D is the dimension of the feature vector and M represents the number of candidate pose positions in the sequence of candidate pose positions.

Further, in the step S33, the source encoder determines a state feature vector corresponding to the put state sequence

The target encoder determines the position feature vector corresponding to the candidate placement position sequence>

Vector elements in the state feature vector and the position feature vector are subjected to pairwise inner product to obtain a decision matrix +.>

Wherein->

, />

The symbol represents the vector inner product, which is the matching score of the placement state and the candidate placement position, for reflecting the placement state +. >

And candidate placement position->

The higher the matching score, the higher the matching degree between the placement state and the candidate placement position, and conversely, the lower the matching score, the lower the matching degree between the placement state and the candidate placement position.

In one implementation manner, after the matching score of each placement state and each candidate placement position is obtained by calculation, a problem that a part of objects cannot be placed in the candidate placement position in the placement state due to the fact that size information corresponding to the placement state is larger than the space size of the candidate placement position may exist, or a problem that the objects cannot be grabbed in the grabbing direction corresponding to the placement state due to blocking constraint in priority information; thus, it is also possible to base the determination of the mask of the sequence of put states and the sequence of candidate put positions

Where 0 indicates that no match can be made, and 1 indicates that match can be made to remove unpaired pose and candidate pose location through a mask. Thus, it is possible to base on the decision matrix +.>

And mask->

Determining the target decision matrix as +.>

Wherein->

Symbolically representing element-by-element multiplication and targetingDecision matrix->

The pair of placement states and candidate placement positions with the highest scores are used as the output of the object transfer and boxing network model TAP-Net++ so as to obtain the target placement states and target placement positions.

S40, placing the target object corresponding to the target placing state at the target placing position according to the target placing state.

Specifically, after the target placement state is obtained, a grabbing direction corresponding to the target placement state and a target object are obtained, the target object is grabbed according to the grabbing direction, and then the target object is placed at the target placement position according to the target placement state. In this embodiment, the grabbing work may be completed by a robot, and the robot sucks the target object through the suction cup, and then places the target object in the target placement state according to the target placement position, so as to complete the boxing operation of the target object.

In one implementation manner, the placing the target object corresponding to the target placement state at the target placement position according to the target placement state specifically includes:

s41, grabbing a target object corresponding to the target placement state according to the grabbing direction corresponding to the target placement state, and obtaining the target object size of the target object;

s42, when the size of the target object is matched with the size information of the target object, placing the target object at the target placing position according to the target placing state;

And S43, if the size of the target object is not matched with the size information of the target object, updating the size information by adopting the size of the target object, and re-executing the control object transfer boxing network model to determine a target putting state and a target putting position row based on the putting state sequence and the candidate putting position sequence.

Specifically, the target object size can be obtained by shooting the grabbed target object to obtain a target object image and identifying the target object image, so that the target object can be acquired by directly utilizing an image acquisition device for shooting the stacked object in the real scene. After the size of the target object is obtained, the size of the target object is matched with the size information of the target object, so that the accuracy of the size information of the target object is improved. This is because the image pickup apparatus for photographing the stacked object in the real scene is an image pickup apparatus (e.g., camera, etc.) of a single view angle, and then the object image is obtained by photographing the stacked object through the image pickup apparatus of the single view angle, so that there may be an error in the size information of the target object acquired based on the object image, and by photographing the target image alone and then recognizing the photographed target object image, the accuracy of the recognized target object size can be ensured, and thus the boxing operation can be ensured to be performed safely.

Based on the method, after the robot grabs the target object, the robot is hung in the air to independently shoot the grabbed target object to obtain a target object image, then the target object image is identified to obtain the target object size, and if the target object size is matched with the size information of the target object (namely, the size information identified based on the object image), the target object is placed at a target placing position according to a target placing state through the robot; otherwise, if the size of the target object is not matched with the size information of the target object, the size of the target object is used as the size information of the target object, and then the size information is transferred to the object transfer boxing network model TAP-Net++, so that the object transfer boxing network model TAP-Net++ determines a target placement state and a target placement position according to the updated size information of the target object, and then the robot is controlled to execute placement operation based on the updated target placement state and the target placement position. In addition, when the robot performs the placing operation, the steps of acquiring the object image and the container image are re-performed until all the stacked objects in the real scene are loaded into the container. Of course, it should be noted that when the matching scores of each placement state and each candidate placement position are determined by the object transfer and boxing network model TAP-net++, if the matching scores of the plurality of placement states and each candidate placement position are smaller than the preset score threshold, it is determined that the target container is already assembled, and then a new container is added as the target container.

In one implementation, the training process of the object transfer boxing network model specifically includes:

Specifically, the initial object transferring and boxing network model and the evaluation network model are both preset network models, wherein the mode structure of the initial object transferring and boxing network model is the same as the model structure of the object transferring and boxing network model, and the difference between the initial object transferring and boxing network model and the evaluation network model is that the initial object transferring and boxing network model adopts the initial network model, and the object transferring and boxing network model adopts the trained network model.

The object transfer boxing network model is trained by using an 'executor-evaluator' strategy in reinforcement learning, an initial object transfer boxing network model is taken as an executor in the training process of the object transfer boxing network model, an evaluation network model is taken as an evaluator, and the initial object transfer boxing network model outputs a target placement state, a target placement position and a reward score in each step in the training process, wherein the reward score is determined by a preset reward function, and the reward function is used for evaluatingAnd estimating the boxing quality of boxing the target object based on the target placement state and the target placement position. The input items of the evaluator network model are a placement state sequence and a candidate placement position sequence, and the input items are output as evaluation reward points which are used as estimated optimal reward points corresponding to the placement state sequence and the candidate placement position sequence. After the bonus points and the evaluation bonus points are obtained, the model parameters of the initial object transfer boxing network model and the model parameters of the evaluator network are synchronously adjusted according to the difference between the bonus points and the evaluation bonus points, and after training is completed, the initial object transfer boxing network model with the adjusted model parameters is used as an object transfer boxing network model. In one implementation, the compactibility of the case used is that of the case

Compactness is defined as total volume of boxed boxes as a reward function +.>

And the current highest height multiplied by the container bottom area>

The ratio between them.

In summary, the present embodiment provides an object boxing reinforcement learning method based on visual detection, which includes obtaining a container image and an object image; determining a sequence of pose states based on the object image and a sequence of candidate pose positions based on the container image; controlling an object transfer boxing network model to determine a target placing state and a target placing position column based on the placing state sequence and the candidate placing position sequence; and placing the target object corresponding to the target placing state at the target placing position according to the target placing state. According to the method and the device, the placement state and the candidate placement position are determined through the images acquired through visual detection, the placement state and the candidate placement position of the object are matched through the object transfer boxing network model, so that the visual detection and the object transfer boxing network model are combined, the object information in the real scene is identified through the visual detection, then the object placement state and the object placement position are determined through the object transfer boxing network model, and the boxing problem in the real scene is well solved.

In order to further verify the object boxing reinforcement learning method based on visual detection provided by the embodiment, experiments based on simulator simulation and experiments of a true machine show that the object transferring boxing system can be practically applied to a real robot scene. As shown in FIG. 8, the method provided in this example was compared with TAP-Net in an experiment with the bottom area of

The transfer boxing effect of 10 boxes with any size is tested, 1000 groups of data are counted and averaged, the average compactness of the TAP-Net algorithm is 0.536, and the method can reach 0.648, so that the transfer boxing effect is remarkably improved, and objects can be transferred and boxed by using fewer containers and higher compactness.

Based on the above-mentioned object boxing reinforcement learning method based on visual detection, the embodiment provides an object boxing device based on visual detection, as shown in fig. 9, the device comprises

An acquisition module 100 for acquiring a container image and an object image, wherein the container image comprises a target container and the object image comprises a plurality of objects;

a determining module 200, configured to determine a sequence of placement states based on the object image, and determine a sequence of candidate placement positions based on the container image, where each placement state in the sequence of placement states includes size information and priority information for reflecting a blocking relationship between objects;

The control module 300 is configured to control an object transferring and boxing network model to determine a target placement state and a target placement position column based on the placement state sequence and the candidate placement position sequence, where the object transferring and boxing network model is obtained based on reinforcement learning training;

and the execution module 400 is configured to place the target object corresponding to the target placement state at the target placement position according to the target placement state.

Based on the above-described visual inspection-based object boxing reinforcement learning method, the present embodiment provides a computer-readable storage medium storing one or more programs executable by one or more processors to implement the steps in the visual inspection-based object boxing reinforcement learning method as described in the above-described embodiments.

Based on the above-mentioned object boxing reinforcement learning method based on visual detection, the present application also provides a terminal device, as shown in fig. 10, which includes at least one processor (processor) 20; a display screen 21; and a memory (memory) 22, which may also include a communication interface (Communications Interface) 23 and a bus 24. Wherein the processor 20, the display 21, the memory 22 and the communication interface 23 may communicate with each other via a bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may invoke logic instructions in the memory 22 to perform the methods of the embodiments described above.

Further, the logic instructions in the memory 22 described above may be implemented in the form of software functional units and stored in a computer readable storage medium when sold or used as a stand alone product.

The memory 22, as a computer readable storage medium, may be configured to store a software program, a computer executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 performs functional applications and data processing, i.e. implements the methods of the embodiments described above, by running software programs, instructions or modules stored in the memory 22.

The memory 22 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the terminal device, etc. In addition, the memory 22 may include high-speed random access memory, and may also include nonvolatile memory. For example, a plurality of media capable of storing program codes such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or a transitory storage medium may be used.

In addition, the specific processes that the storage medium and the plurality of instruction processors in the terminal device load and execute are described in detail in the above method, and are not stated here.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. An object boxing reinforcement learning method based on visual detection, which is characterized by comprising the following steps:

2. The visual inspection-based object boxing reinforcement learning method in accordance with claim 1, wherein the determining a putting state sequence based on the object image specifically comprises:

3. The visual inspection-based object boxing reinforcement learning method in accordance with claim 2, wherein the constructing a priority map based on the spatial state of each object specifically comprises:

4. The vision-detection-based object boxing reinforcement learning method according to claim 1, wherein the controlling the object transfer boxing network model to determine the target putting state and the target putting position sequence based on the putting state sequence and the candidate putting position sequence specifically comprises:

5. The visual inspection-based object boxing reinforcement learning method in accordance with claim 4, wherein said inputting the set of put states into a source encoder in an object transfer boxing network model, determining the state characteristics of each put state by the source encoder specifically comprises:

6. The method for reinforcement learning of object boxing by visual inspection according to claim 1, wherein the placing the object corresponding to the object placing state at the object placing position according to the object placing state specifically comprises:

7. The visually inspected object boxing reinforcement learning method according to claim 1, wherein the training process of the object transfer boxing network model specifically comprises:

8. An object boxing device based on visual detection, which is characterized by comprising

9. A computer readable storage medium storing one or more programs executable by one or more processors to implement the steps in the visual inspection-based object boxing reinforcement learning method in accordance with any one of claims 1-7.

10. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;

the processor, when executing the computer readable program, implements the steps in the visual inspection-based object boxing reinforcement learning method as recited in any one of claims 1-7.