CN117226854A - Method and device for executing clamping task, storage medium and electronic equipment - Google Patents

Method and device for executing clamping task, storage medium and electronic equipment

Info

Publication number
CN117226854A
Authority
CN
China
Prior art keywords: clamping, information, component, determining, gesture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311505956.2A
Other languages
Chinese (zh)
Other versions
CN117226854B (en)
Inventor
王亚杰
宛敏红
张春龙
杨嵘
王文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311505956.2A
Publication of CN117226854A
Application granted
Publication of CN117226854B
Legal status: Active


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The specification discloses a method and a device for executing a clamping task, a storage medium and electronic equipment. The method comprises the following steps: acquiring image data containing a target object, and determining contour information corresponding to the target object and environment information of the environment in which the target object is located; determining, according to the contour information, pose information, shape information and semantic information corresponding to each component contained in the target object; determining a clamping loss value when each component in the target object is clamped with different clamping gestures, according to the environment information, the pose information, the shape information, the semantic information and preset clamping gesture information corresponding to each clamping gesture; and determining a target clamping gesture among the clamping gestures and a target component among the components of the target object according to the clamping loss values, and executing the clamping task on the target component with the target clamping gesture.

Description

Method and device for executing clamping task, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of robots, and in particular, to a method and apparatus for executing a clamping task, a storage medium, and an electronic device.
Background
As a basic capability of robots, robotic gripping is an important research direction in the robotics field. Compared with a common two-finger gripper, the flexibility of a multi-finger dexterous hand allows it to stably grip objects with more complex shapes, but a main problem faced in multi-finger dexterous hand gripping is the high dimensionality brought by its many degrees of freedom, which makes grip-pose planning for the multi-finger dexterous hand complex. The underactuated dexterous hand has advantages such as a highly integrated hardware system, a simple and efficient upper-level motion control system, simple design and low cost, and has therefore become the solution adopted by most manipulators.
However, the existing gripping technology for underactuated dexterous hands is still immature and can only be applied to scenes with simple objects. During gripping, the gripped object often falls, collides with surrounding objects, or is held at an unreasonable position, so it is difficult to meet the service requirements of scenes with higher precision requirements.
Therefore, how to improve the stability and safety of the multi-finger gripper in the process of executing the gripping task and fully meet the business requirements of different scenes is a problem to be solved.
Disclosure of Invention
The present disclosure provides a method and apparatus for executing a clamping task, a storage medium, and an electronic device, so as to partially solve the foregoing problems in the prior art.
The technical scheme adopted in the specification is as follows:
the specification provides a method for performing a gripping task, the method being applied to a multi-finger gripper, comprising:
acquiring image data containing a target object, and determining contour information corresponding to the target object and environment information of an environment where the target object is located;
according to the contour information, pose information, shape information and semantic information corresponding to each part contained in the target object are determined;
determining a clamping loss value when each component in the target object is clamped through different clamping postures according to the environment information, the pose information, the shape information, the semantic information and preset clamping posture information corresponding to each clamping posture, wherein the loss value is used for determining the matching degree between the clamping postures and the component, and the larger the loss value is, the lower the matching degree is;
and determining a target clamping gesture in all clamping gestures according to the clamping loss value, determining a target component in all components of the target object, and executing a clamping task aiming at the target component according to the target clamping gesture.
Optionally, determining pose information and semantic information corresponding to each component included in the target object according to the contour information specifically includes:
acquiring point cloud data corresponding to the image data;
according to the contour information, determining a point cloud image corresponding to the target object in the point cloud data;
dividing the point cloud image to determine each component contained in the target object and three-channel color and depth RGBD information corresponding to each component in the point cloud image;
and determining semantic information corresponding to each part segmented in the point cloud image, and determining pose information according to the RGBD information.
Optionally, determining the pose information according to the RGBD information specifically includes:
for each component, RGBD information corresponding to the component is input into a preset pose determination model, 6D pose data corresponding to the component is determined through the pose determination model, and the pose information is determined according to the 6D pose data.
Optionally, determining the shape information corresponding to each component included in the target object specifically includes:
for each part segmented in the point cloud image, determining a basic shape corresponding to the part according to a point cloud surrounding area of the part;
And determining the corresponding shape information of the component according to the basic shape.
Optionally, determining, according to the environmental information, the pose information, the shape information, the semantic information, and preset clamping pose information corresponding to each clamping pose, a clamping loss value when each component in the target object is clamped by different clamping poses, where the method specifically includes:
for each clamping gesture, determining a first loss value when each component is clamped by the clamping gesture according to at least one of the environment information, the shape information, the pose information and the clamping gesture information corresponding to the clamping gesture, and
for each component, determining a corresponding second loss value when the component is clamped according to the pose information and semantic information corresponding to the component;
and determining the clamping loss value according to the first loss value corresponding to each clamping gesture and the second loss value corresponding to each component.
Optionally, for each clamping gesture, determining a first loss value when the clamping gesture clamps each component according to at least one of the environment information, the shape information, the pose information and the clamping gesture information corresponding to the clamping gesture specifically includes:
For each component, determining the probability of collision when the component is clamped by the clamping gesture according to the environment information, the clamping gesture information corresponding to the clamping gesture and the pose information corresponding to the component, determining a collision loss value according to the probability, and
determining the stability degree when the component is clamped by the clamping gesture according to the clamping gesture information corresponding to the clamping gesture, the pose information corresponding to the component and the shape information, and determining a clamping stability loss value according to the stability degree;
the first loss value is determined based on the collision loss value when each component is gripped by the gripping posture and the gripping stability loss value.
Optionally, for each component, determining a second loss value corresponding to the component when the component is clamped according to the pose information and the semantic information corresponding to the component specifically includes:
determining the volume ratio of each component in the target object according to the pose information corresponding to the component in the target object;
and determining a corresponding second loss value when the component is clamped according to the volume ratio and the semantic information.
The specification provides a clamping task execution device, which comprises:
The acquisition module acquires image data containing a target object and determines contour information corresponding to the target object and environment information of an environment where the target object is located;
the segmentation module is used for determining pose information, shape information and semantic information corresponding to each part contained in the target object according to the contour information;
the determining module is used for determining a clamping loss value when each component in the target object is clamped through different clamping postures according to the environment information, the pose information, the shape information, the semantic information and preset clamping posture information corresponding to each clamping posture, wherein the loss value is used for determining the matching degree between the clamping postures and the component, and the larger the loss value is, the lower the matching degree is;
and the clamping module is used for determining a target clamping gesture in all clamping gestures according to the clamping loss value, determining a target component in all components of the target object, and executing a clamping task aiming at the target component according to the target clamping gesture.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of performing a gripping task as described above.
The present specification provides an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the above-described method of executing a clamping task when executing the program.
The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:
in the method for executing a clamping task provided by the specification, image data containing a target object is obtained, and contour information corresponding to the target object and environment information of the environment in which the target object is located are determined; according to the contour information, pose information, shape information and semantic information corresponding to each component contained in the target object are determined; a clamping loss value when each component in the target object is clamped with different clamping gestures is determined according to the environment information, the pose information, the shape information, the semantic information and preset clamping gesture information corresponding to each clamping gesture; and a target clamping gesture is determined among the clamping gestures, a target component is determined among the components of the target object according to the clamping loss values, and the clamping task is executed on the target component with the target clamping gesture.
According to the method, in the process of executing the clamping task aiming at the target object, the components are segmented, and the matching degree between different clamping postures and the components is determined by fully combining the information of the pose, the shape, the semantics and the like of each component and the environmental information. Therefore, the clamping gesture selected by the method can more stably clamp the selected target component, so that the collision problem in the clamping process is avoided, and the stability and safety of the clamping task are fully ensured.
Drawings
The accompanying drawings described herein are provided for a further understanding of the specification and constitute a part of it. The exemplary embodiments of the specification and their descriptions are intended to explain the specification and do not constitute an undue limitation of it. In the drawings:
fig. 1 is a schematic flow chart of a method for executing a clamping task provided in the present specification;
FIG. 2 is a schematic diagram of the flow for determining the gripping gesture and the gripped component provided in the present specification;
FIG. 3 is a schematic diagram of a force threshold adaptive grabbing flow provided in the present specification;
FIG. 4 is a schematic diagram of a device for performing a gripping task provided in the present disclosure;
fig. 5 is a schematic diagram of an electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
At present, grip planning methods for dexterous hands are mainly divided into analytical methods and empirical methods. Analytical methods convert the planning problem into a constrained optimization problem, while empirical methods mainly rely on data-driven approaches, such as artificial-intelligence-based computation. The high-dimensional nature of object shapes and of the dexterous hand's grip configuration places significant limitations on both approaches.
To address these difficulties in dexterous-hand gripping, and building on the underactuated dexterous hands widely used at present, an underactuated dexterous-hand gripping expert system is provided. The shape and pose information of the object to be gripped is obtained using research results of artificial intelligence techniques in the image-processing field, such as RGB image segmentation, point cloud component segmentation and 6D pose estimation; a suitable gripping gesture is then selected from multiple preset gripping gestures through the gripping loss function proposed by the expert system; finally, a force-threshold-adaptive gripping system can effectively accomplish gripping in complex scenes containing multiple objects.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for executing a clamping task provided in the present specification, which includes the following steps:
S101: and acquiring image data containing a target object, and determining contour information corresponding to the target object and environment information of an environment in which the target object is positioned.
In the present specification, the executing body of the method for executing a gripping task may be a server or an upper computer of a robot, and of course may also be the robot terminal itself.
The method for executing the gripping task in the specification can be applied to a multi-finger gripper, and the multi-finger gripper can be a multi-finger dexterous hand, a humanoid manipulator or the like with two or more mechanical fingers.
The server can collect image data containing the target object through an image sensor (such as a camera) and acquire point cloud data corresponding to the image data. The image data is RGB data. The server can collect the point cloud data through a point cloud sensor (such as a lidar); of course, the server can also directly convert the image data into a point cloud format to obtain the point cloud data.
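For ease of understanding, the following is a minimal sketch (not from the patent) of converting a depth image into point cloud data under an assumed pinhole camera model with known intrinsics; the function name and parameters are illustrative.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an N x 3 point cloud
    using a pinhole camera model with intrinsics (fx, fy, cx, cy)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth
```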
In this specification, the sensor for image acquisition may be disposed on the gripper, but may also be disposed at an image acquisition site of the robot (e.g., an eye of the robot).
After the image data is acquired, the server can determine contour information corresponding to the target object in the image data. In this process, the server may input the image data into a preset image recognition model (such as Mask R-CNN, SAM, etc.), and perform semantic segmentation on the image through the image recognition model to determine contour information of all objects contained therein.
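As an illustrative sketch only, contour extraction with an off-the-shelf instance-segmentation model (Mask R-CNN, one of the models named above) might look as follows; the torchvision weights string, score threshold and helper name are assumptions about the deployment environment rather than part of the patent.

```python
import cv2
import numpy as np
import torch
import torchvision

# Pretrained instance-segmentation model (Mask R-CNN as one possible choice).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def extract_contours(rgb_image, score_thresh=0.7):
    """Return one outer contour (pixel coordinates) per detected object instance."""
    tensor = torch.from_numpy(rgb_image).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]
    contours = []
    for mask, score in zip(pred["masks"], pred["scores"]):
        if score < score_thresh:
            continue
        binary = (mask[0].numpy() > 0.5).astype(np.uint8)
        found, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if found:
            contours.append(max(found, key=cv2.contourArea))
    return contours
```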
And then the server can identify the contour information of the object corresponding to the clamping instruction from the contour information based on the preset clamping instruction.
In addition, the server can also determine the environment information of the environment where the target object is located according to the relative position information between the contour information of the target object and the contour information of other objects.
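As a hedged illustration of one possible form of such environment information (the patent does not specify its representation), the sketch below records, for every other detected object, the minimum pixel distance between its contour and the target object's contour; the distance measure and cut-off are assumptions.

```python
import numpy as np

def nearby_obstacles(target_contour, other_contours, max_dist=50.0):
    """Minimum pixel distance between the target object's contour and each other
    object's contour; objects closer than max_dist are kept as environment info,
    since they matter most for the later collision loss."""
    target_pts = target_contour.reshape(-1, 2).astype(np.float32)
    distances = []
    for contour in other_contours:
        pts = contour.reshape(-1, 2).astype(np.float32)
        d = np.linalg.norm(target_pts[:, None, :] - pts[None, :, :], axis=-1).min()
        distances.append(float(d))
    return [d for d in distances if d < max_dist]
```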
S102: and determining pose information, shape information and semantic information corresponding to each part contained in the target object according to the contour information.
The server can cut the point cloud corresponding to the contour in the point cloud data according to the contour information corresponding to the target object, so as to cut out the point cloud image corresponding to the target object.
In practical applications, different components of an object often differ in color, and components can also be distinguished based on the structural information of the object. For example, for a water bottle with a cap, there is a certain difference between the color of the cap and the color of the body, and there are also concave or convex features at the junction of the body and the cap; the server can distinguish the cap from the body by combining these color and geometric differences.
The server may segment each component contained in the target object in the point cloud image in the above manner, and then crop the enclosing region corresponding to each component from the point cloud and the RGB image to obtain three-channel color and depth (RGBD) information corresponding to each component, where the RGBD information is used to characterize the color and three-dimensional shape of the component.
In addition, the server may fit a basic shape to each component according to the point cloud enclosing region of that component, so as to determine the basic shape corresponding to the component, and then determine the shape information corresponding to the component according to that basic shape, where the shape information may be used to characterize the basic shape type to which the component belongs and the shape fitting error of the component.
In the present specification, the basic shape may include a sphere, a cylinder, a cuboid, a cube, etc., and of course, other regular and irregular basic shapes may be included, which is not particularly limited in the present specification.
In practical application, the server may determine a fitting formula corresponding to the shape of the component, and further determine the basic shape corresponding to the component according to the fitting formula of the component and the fitting formulas corresponding to the basic shapes.
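A minimal sketch of such basic-shape fitting follows, assuming only two candidate primitives (a least-squares sphere, and an axis-aligned bounding box as a crude cuboid proxy); the patent does not give its fitting formulas, so the residual definitions below are illustrative.

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit; returns (center, radius, mean surface residual)."""
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    residual = np.abs(np.linalg.norm(points - center, axis=1) - radius).mean()
    return center, radius, residual

def box_residual(points):
    """Mean distance from each point to the surface of the axis-aligned bounding box."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    dist_to_faces = np.minimum(points - lo, hi - points)
    return dist_to_faces.min(axis=1).mean()

def fit_basic_shape(points):
    """Pick the primitive with the smallest fitting error (the shape information)."""
    errors = {"sphere": fit_sphere(points)[2], "cuboid": box_residual(points)}
    shape = min(errors, key=errors.get)
    return shape, errors[shape]
```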
The server may also determine semantic information corresponding to each of the divided parts, i.e. a specific meaning of the part name, such as "bottle cap", "bottle body" in the above example. In practical application, the semantic information can be identified through a preset image identification model.
Further, the server may input the RGBD information into a preset pose determination model (for example, DenseFusion, PVN3D, etc.), so as to determine 6D pose data corresponding to the component through the pose determination model, and determine the pose information according to the 6D pose data.
In practical applications, 6D refers to 6 degrees of freedom, representing a displacement (Translation) of 3 degrees of freedom, and a spatial Rotation (Rotation) of 3 degrees of freedom, collectively referred to as Pose (Pose). Pose is a relative concept that refers to displacement and rotation transformations between two coordinate systems.
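As a small worked example of this definition (not from the patent), the 3-DoF translation and 3-DoF rotation can be assembled into a 4x4 homogeneous transform that maps component coordinates into camera coordinates; the quaternion order below follows SciPy's (x, y, z, w) convention and is an assumption.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_matrix(translation_xyz, rotation_quat_xyzw):
    """Assemble a 6D pose (3-DoF translation + 3-DoF rotation) into a 4x4
    homogeneous transform from the component frame to the camera frame."""
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat(rotation_quat_xyzw).as_matrix()
    T[:3, 3] = translation_xyz
    return T
```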
S103: and determining a clamping loss value when each component in the target object is clamped through different clamping postures according to the environment information, the pose information, the shape information, the semantic information and preset clamping posture information corresponding to each clamping posture, wherein the loss value is used for determining the matching degree between the clamping postures and the component, and the larger the loss value is, the lower the matching degree is.
For each gripping gesture, the server may determine a first loss value when each component is gripped by that gesture, according to at least one of the environment information, the shape information, the pose information and the preset gripping gesture information corresponding to the gesture. For each component, it may determine a second loss value when that component is gripped, according to the pose information of the component and the other components and the semantic information corresponding to the component. It may then determine, from the first loss value corresponding to each gripping gesture and the second loss value corresponding to each component, the gripping loss value when each component of the target object is gripped by different gripping gestures.
The gripping loss value is used to determine the matching degree between a gripping gesture and a component: the larger the gripping loss value, the lower the matching degree, and vice versa. The gripping gesture information may include parameter information such as the positions and postures corresponding to the different mechanical fingers of the gripper.
In this specification, the gripping gesture may include pinching, enveloping, and gripping, wherein pinching includes pinching with two fingers, pinching with three fingers, pinching with four fingers, and the like, enveloping includes horizontal enveloping and vertical enveloping, and gripping includes horizontal gripping and vertical gripping. Of course, other gripping gestures, such as gripping, may be included, which are not specifically limited in this description.
Specifically, for each component and each clamping gesture, the server may determine, according to the environmental information, the clamping gesture information corresponding to the clamping gesture, and the pose information corresponding to the component, a probability of collision when the component is clamped by the clamping gesture, and further determine a collision loss value according to the probability, where the greater the probability of collision, the greater the collision loss value, and vice versa.
For example, for the gesture of pinching with two fingers, since the space occupied by the two fingers in the pinching process is smaller, the two fingers are not easy to collide with other objects, so that the collision loss value is smaller. And when other objects around the target object are closer to the current component, larger collision loss can be determined.
In addition, the server can determine the gripping posture loss when the component is gripped with the gripping gesture, according to the gripping gesture information corresponding to the gripping gesture, the pose information corresponding to the component, and the shape information. The gripping posture loss indicates the degree of fit between the gripping gesture and the component's form: the higher the degree of fit, the smaller the gripping posture loss, and conversely the larger the gripping posture loss.
At the same time, the server can also determine the component form fitting error, which is used to characterize the error value between the component and the basic shape to which the component was fitted.
The server can then determine a gripping stability loss value for the component being gripped with the gripping gesture, based on the gripping posture loss and the component form fitting error.
The gripping stability loss value is used to represent the degree of stability when the component is gripped with the gripping gesture: the higher the stability, the smaller the stability loss value, and the lower the stability, the larger it is. Both the form fitting error and the gripping posture loss are positively correlated with the gripping stability loss value.
For example, for a larger sized sphere part, the stability of the envelope gripping is significantly higher than for pinching, while for a vertically placed longer cuboid part, the stability of the horizontal gripping is significantly higher than for vertical gripping and other gripping.
For each gripping gesture, the server may determine a first loss value when the gripping gesture grips a different component based on the collision loss value and the gripping stability loss value when gripping each component by the gripping gesture.
Further, the server may determine the volume ratio of each component in the target object according to the pose information corresponding to that component, determine a volume-ratio loss according to the volume ratio, and determine a semantic-ratio loss according to the semantic information of the component. A rationality loss value for gripping the component is then determined from the volume-ratio loss and the semantic-ratio loss and taken as the second loss value; that is, the second loss value combines the volume-ratio loss and the semantic-ratio loss.
Taking the bottle cap and bottle body as examples, the bottle cap occupies a smaller proportion of the whole water bottle's volume, so its gripping difficulty is relatively higher and its volume-ratio loss is larger; the bottle body occupies a larger proportion, so its gripping difficulty is lower and its volume-ratio loss is smaller.
In addition, from the semantic point of view, the gripping difficulty of the bottle body is obviously lower than that of the bottle cap, so the semantic-ratio loss of the bottle body is lower than that of the bottle cap.
In determining the volume-ratio loss, the server may preset a volume threshold representing the volume at which the gripper maintains its optimal gripping state. For each object, if the volume of the object is within the threshold, the volume ratio of the object is inversely related to the volume-ratio loss (or the second loss value); if the volume of the object is outside the threshold, the volume ratio of the object is positively related to the volume-ratio loss (or the second loss value).
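A minimal sketch of this threshold behaviour follows, assuming a single optimal ratio and a piecewise-linear form; the text does not give the actual threshold or formula, so both are illustrative assumptions.

```python
def volume_ratio_loss(volume_ratio, optimal_ratio=0.3):
    """Illustrative volume-ratio loss: within the threshold a larger ratio
    lowers the loss (inverse relation); beyond it a larger ratio raises
    the loss (positive relation)."""
    if volume_ratio <= optimal_ratio:
        return 1.0 - volume_ratio / optimal_ratio
    return (volume_ratio - optimal_ratio) / (1.0 - optimal_ratio)
```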
According to the first loss value corresponding to each gripping gesture and the second loss value corresponding to each component, the server can determine the gripping loss value when each component of the target object is gripped by different gripping gestures; the gripping loss value is obtained by combining the first loss value of the gesture with the second loss value of the component.
thus, the server can obtain the clamping loss values corresponding to the various combinations of each clamping gesture and each component in the target object.
S104: and determining a target clamping gesture in all clamping gestures according to the clamping loss value, determining a target component in all components of the target object, and executing a clamping task aiming at the target component according to the target clamping gesture.
After determining the clamping loss values corresponding to the plurality of combinations formed by each clamping gesture and each component in the target object, the server can determine the optimal target clamping combination according to the loss values, and the target clamping combination can be the combination of the target clamping gesture with the minimum loss value and the target component.
In this way, the server can further determine the target clamping gesture and the target component, and then clamp the target component according to the target clamping gesture, thereby completing the clamping task for the target object.
For ease of understanding, the present specification provides a schematic diagram of the flow for determining the gripping gesture and the gripped component, as shown in FIG. 2.
FIG. 2 is a schematic diagram of the flow for determining the gripping gesture and the gripped component provided in the present specification.
The server can extract the contour of an instance (the target object) from the image data, crop the instance point cloud from the point cloud data, segment the instance point cloud into several components, perform shape fitting and pose estimation for each component, and then determine the optimal target gripping gesture based on the gripping loss values corresponding to the combinations of different gripping gestures and components, so as to grip the target component.
For the humanoid manipulator, the optimal gripping postures corresponding to the gripping components in different forms can be as follows:
the optimal clamping gesture corresponding to the small-size sphere part (such as a coin) can be double-finger pinching, the clamping gesture corresponding to the medium-size sphere part can be three-finger pinching (such as an egg), and the optimal clamping gesture corresponding to the large-size sphere part (such as an apple) can be vertical envelope.
The optimal gripping gesture corresponding to a small-sized cylindrical member (such as a pen) may be a double-finger pinching gesture, the optimal gripping gesture corresponding to a medium-sized cylindrical member may be a vertical envelope (such as a cup), and the optimal gripping gesture corresponding to a large-sized cylindrical member may be a horizontal envelope (such as a cylinder thermos cup with a longer size).
The optimal clamping gesture corresponding to a small-size cuboid component (such as an eraser) can be double-finger pinching, the optimal clamping gesture corresponding to a medium-size cuboid component can be vertical grabbing (such as a camera), and the optimal clamping gesture corresponding to a large-size cuboid component can be horizontal enveloping (such as a longer cuboid vacuum cup).
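Purely for illustration, the examples above can be summarized as a lookup from (basic shape, size class) to a preferred gesture; the keys, size classes and gesture names are assumptions, and the real system selects the gesture through the gripping loss rather than a fixed table.

```python
PREFERRED_GESTURE = {
    ("sphere",   "small"):  "two_finger_pinch",    # e.g. a coin
    ("sphere",   "medium"): "three_finger_pinch",  # e.g. an egg
    ("sphere",   "large"):  "vertical_envelope",   # e.g. an apple
    ("cylinder", "small"):  "two_finger_pinch",    # e.g. a pen
    ("cylinder", "medium"): "vertical_envelope",   # e.g. a cup
    ("cylinder", "large"):  "horizontal_envelope", # e.g. a long thermos cup
    ("cuboid",   "small"):  "two_finger_pinch",    # e.g. an eraser
    ("cuboid",   "medium"): "vertical_grip",       # e.g. a camera
    ("cuboid",   "large"):  "horizontal_envelope", # e.g. a long cuboid vacuum cup
}

def preferred_gesture(shape, size_class):
    return PREFERRED_GESTURE.get((shape, size_class))
```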
Further, in the process of gripping the target component, the server can also analyze the material attribute information of the target component and then determine preset gripping force information (such as the force-bearing stroke, torque and angle overrun limits) according to the material attribute information, and then execute the gripping task according to that gripping force information.
For example, for objects such as cups that are hard and have a relatively high mass, the gripping force can be relatively high, so that the gripping stability can be ensured, while for objects such as fruits that are soft, the gripping force can be relatively low, so that the fruits themselves are prevented from being damaged.
For ease of understanding, the present disclosure provides a force threshold adaptive grabbing flow schematic, as shown in fig. 3.
Fig. 3 is a schematic diagram of a force threshold adaptive grabbing flow provided in the present specification.
The server can grasp the target object based on preset information such as the initial angle, limit joint angle, joint movement speed and joint limit torque, monitor the joint torque and joint angle while the grasping action is executed, and reach the final grasp state under the constraints of force-bearing stroke, torque and angle overrun.
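Below is a hedged sketch of such a force-threshold-adaptive closing loop; the hand-driver interface (num_joints, read_joint_torques, read_joint_angles, set_joint_velocity) is hypothetical, and the control period and stopping rule are illustrative assumptions.

```python
import time

def force_threshold_grip(hand, torque_limit, angle_limit, speed, period=0.01):
    """Close the fingers at a fixed joint speed and stop each joint once its
    torque (force threshold) or angle (stroke limit) is reached."""
    angles = hand.read_joint_angles()
    done = [False] * hand.num_joints
    while not all(done):
        torques = hand.read_joint_torques()
        angles = hand.read_joint_angles()
        for j in range(hand.num_joints):
            if done[j]:
                continue
            if torques[j] >= torque_limit[j] or angles[j] >= angle_limit[j]:
                hand.set_joint_velocity(j, 0.0)   # hold: limit reached for this joint
                done[j] = True
            else:
                hand.set_joint_velocity(j, speed)  # keep closing
        time.sleep(period)
    return angles  # final grasp configuration
```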
The above method provides a complete solution for underactuated dexterous-hand grasping: it effectively addresses the problem of dexterous-hand grasping of common objects in complex scenes, acquires information such as the shape and pose of the object to the greatest extent, and provides sufficient information for the subsequent selection of the grasping gesture.
In addition, the scheme comprehensively considers the influence of various factors, so that the grabbing is more in line with the habit of human beings, and the grabbing stability can be ensured.
In addition, according to the scheme, multiple grabbing gestures can be preset aiming at the same grabbing type, and the optimal grabbing gesture can be selected according to the object gesture and the surrounding environment. The adaptability to complex environments is ensured.
The above is the method for executing a clamping task provided by one or more embodiments of the present specification. Based on the same idea, the present specification further provides a corresponding device for executing a clamping task, as shown in FIG. 4.
Fig. 4 is a schematic diagram of a device for performing a gripping task provided in the present specification, including:
an acquiring module 401, configured to acquire image data including a target object, and determine contour information corresponding to the target object and environmental information of an environment in which the target object is located;
a segmentation module 402, configured to determine pose information, shape information, and semantic information corresponding to each component included in the target object according to the contour information;
a determining module 403, configured to determine, according to the environmental information, the pose information, the shape information, the semantic information, and preset gripping pose information corresponding to each gripping pose, a gripping loss value when each component in the target object is gripped by different gripping poses, where the loss value is used to determine a matching degree between the gripping pose and the component, and the greater the loss value, the lower the matching degree;
and the clamping module 404 is configured to determine a target clamping gesture among the clamping gestures according to the clamping loss value, determine a target component among the components of the target object, and execute a clamping task for the target component according to the target clamping gesture.
Optionally, the segmentation module 402 is specifically configured to obtain point cloud data corresponding to the image data; according to the contour information, determining a point cloud image corresponding to the target object in the point cloud data; dividing the point cloud image to determine each component contained in the target object and three-channel color and depth RGBD information corresponding to each component in the point cloud image; and determining semantic information corresponding to each part segmented in the point cloud image, and determining pose information according to the RGBD information.
Optionally, the segmentation module 402 is specifically configured to input, for each component, RGBD information corresponding to the component into a preset pose determination model, so as to determine, by using the pose determination model, 6D pose data corresponding to the component, and determine the pose information according to the 6D pose data.
Optionally, the segmentation module 402 is specifically configured to determine, for each component segmented in the point cloud image, a basic shape corresponding to the component according to a point cloud surrounding area of the component; and determining the corresponding shape information of the component according to the basic shape.
Optionally, the determining module 403 is specifically configured to determine, for each gripping gesture, a first loss value when each component is gripped by the gripping gesture according to at least one of the environmental information, the shape information, the pose information, and the gripping gesture information corresponding to the gripping gesture, and determine, for each component, a second loss value corresponding to the component when the component is gripped according to the pose information and the semantic information corresponding to the component; and determining the clamping loss value according to the first loss value corresponding to each clamping gesture and the second loss value corresponding to each component.
Optionally, the determining module 403 is specifically configured to determine, for each component, a probability of collision when the component is clamped by the clamping gesture according to the environmental information, the clamping gesture information corresponding to the clamping gesture, and the pose information corresponding to the component, determine a collision loss value according to the probability, determine a stability degree when the component is clamped by the clamping gesture according to the clamping gesture information corresponding to the clamping gesture, the pose information corresponding to the component, and the shape information, and determine a clamping stability loss value according to the stability degree; the first loss value is determined based on the collision loss value when each component is gripped by the gripping posture and the gripping stability loss value.
Optionally, the determining module 403 is specifically configured to determine, according to pose information corresponding to each component in the target object, a volume ratio of the component in the target object; and determining a corresponding second loss value when the component is clamped according to the volume ratio and the semantic information.
The present specification also provides a computer readable storage medium storing a computer program operable to perform a method of performing a gripping task as provided in fig. 1 above.
The present specification also provides a schematic structural diagram of an electronic device corresponding to fig. 1 shown in fig. 5. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, as illustrated in fig. 5, although other hardware required by other services may be included. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to implement the method for executing the clamping task described in fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
Improvements to a technology can be clearly distinguished as hardware improvements (e.g., improvements to circuit structures such as diodes, transistors, switches, etc.) or software improvements (improvements to a method flow). However, with the development of technology, many improvements of current method flows can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development and writing, and the original code before compiling also has to be written in a specific programming language, which is called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using one of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in purely computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps such that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a kind of hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (10)

1. A method of performing a gripping task, the method being applied to a multi-finger gripper, comprising:
acquiring image data containing a target object, and determining contour information corresponding to the target object and environment information of an environment where the target object is located;
according to the contour information, pose information, shape information and semantic information corresponding to each part contained in the target object are determined;
determining a clamping loss value when each component in the target object is clamped through different clamping postures according to the environment information, the pose information, the shape information, the semantic information and preset clamping posture information corresponding to each clamping posture, wherein the loss value is used for determining the matching degree between the clamping postures and the component, and the larger the loss value is, the lower the matching degree is;
and determining a target clamping gesture in all clamping gestures according to the clamping loss value, determining a target component in all components of the target object, and executing a clamping task aiming at the target component according to the target clamping gesture.
2. The method of claim 1, wherein determining pose information and semantic information corresponding to each component included in the target object according to the contour information specifically includes:
Acquiring point cloud data corresponding to the image data;
according to the contour information, determining a point cloud image corresponding to the target object in the point cloud data;
dividing the point cloud image to determine each component contained in the target object and three-channel color and depth RGBD information corresponding to each component in the point cloud image;
and determining semantic information corresponding to each part segmented in the point cloud image, and determining pose information according to the RGBD information.
3. The method of claim 2, wherein determining the pose information from the RGBD information, in particular, comprises:
for each component, RGBD information corresponding to the component is input into a preset pose determination model, 6D pose data corresponding to the component is determined through the pose determination model, and the pose information is determined according to the 6D pose data.
4. The method according to claim 2, wherein determining the shape information corresponding to each component included in the target object specifically includes:
for each part segmented in the point cloud image, determining a basic shape corresponding to the part according to a point cloud surrounding area of the part;
And determining the corresponding shape information of the component according to the basic shape.
5. The method according to claim 1, wherein determining the clamping loss value when each component in the target object is clamped through different clamping postures according to the environment information, the pose information, the shape information, the semantic information, and preset clamping posture information corresponding to each clamping posture specifically comprises:
for each clamping gesture, determining a first loss value when each component is clamped by the clamping gesture according to at least one of the environment information, the shape information, the pose information and the clamping gesture information corresponding to the clamping gesture, and
for each component, determining a corresponding second loss value when the component is clamped according to the pose information and semantic information corresponding to the component;
and determining the clamping loss value according to the first loss value corresponding to each clamping gesture and the second loss value corresponding to each component.
6. The method of claim 5, wherein for each clamping gesture, determining a first loss value when each component is clamped by the clamping gesture according to at least one of the environment information, the shape information, the pose information and the clamping gesture information corresponding to the clamping gesture specifically comprises:
for each component, determining the probability of collision when the component is clamped by the clamping gesture according to the environment information, the clamping gesture information corresponding to the clamping gesture and the pose information corresponding to the component, determining a collision loss value according to the probability, and
determining the stability degree when the component is clamped by the clamping gesture according to the clamping gesture information corresponding to the clamping gesture, the pose information corresponding to the component and the shape information, and determining a clamping stability loss value according to the stability degree;
the first loss value is determined based on the collision loss value when each component is gripped by the gripping posture and the gripping stability loss value.
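A hedged sketch of how the first loss value could be assembled from the two terms named in claim 6 is shown below. The weights and the mapping from collision probability and stability degree to loss values are hypothetical; only the general direction (more collision risk or less stability yields a larger loss) follows from the claim.

```python
import numpy as np

def first_loss(collision_prob, stability, w_collision=1.0, w_stability=1.0):
    """collision_prob and stability are assumed to lie in [0, 1].
    A higher collision probability raises the loss; a higher stability
    degree lowers it. Weights are illustrative."""
    collision_loss = np.asarray(collision_prob)        # larger prob -> larger loss
    stability_loss = 1.0 - np.asarray(stability)       # less stable -> larger loss
    return w_collision * collision_loss + w_stability * stability_loss
```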
7. The method according to claim 5, wherein, for each component, determining the corresponding second loss value when the component is clamped according to the pose information and the semantic information corresponding to the component specifically comprises:
determining the volume proportion of each component in the target object according to the pose information corresponding to the component;
and determining the corresponding second loss value when the component is clamped according to the volume proportion and the semantic information.
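The following sketch illustrates one possible form of the second loss in claim 7. It assumes per-component volumes have already been estimated from the pose information, and that the semantic information is summarised as a weight near 1 for parts that should not be clamped; both assumptions, and the exact formula, are illustrative rather than the claimed rule.

```python
import numpy as np

def second_loss(component_volumes, semantic_weight):
    """Hypothetical formulation: penalise components with a small volume
    proportion and components whose semantics mark them as poor clamp
    targets (semantic_weight close to 1 means 'avoid clamping this part')."""
    volumes = np.asarray(component_volumes, dtype=float)
    volume_ratio = volumes / volumes.sum()
    return (1.0 - volume_ratio) + np.asarray(semantic_weight)
```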
8. An apparatus for executing a clamping task, comprising:
an acquisition module, used for acquiring image data containing a target object and determining contour information corresponding to the target object and environment information of the environment in which the target object is located;
a segmentation module, used for determining pose information, shape information and semantic information corresponding to each component contained in the target object according to the contour information;
a determination module, used for determining a clamping loss value when each component in the target object is clamped in different clamping postures according to the environment information, the pose information, the shape information, the semantic information and preset clamping posture information corresponding to each clamping posture, wherein the clamping loss value is used for determining the degree of matching between a clamping posture and a component, and a larger loss value indicates a lower degree of matching;
and a clamping module, used for determining a target clamping posture from among all clamping postures and a target component from among all components of the target object according to the clamping loss values, and executing a clamping task for the target component according to the target clamping posture.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the program.
CN202311505956.2A 2023-11-13 2023-11-13 Method and device for executing clamping task, storage medium and electronic equipment Active CN117226854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311505956.2A CN117226854B (en) 2023-11-13 2023-11-13 Method and device for executing clamping task, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117226854A 2023-12-15
CN117226854B 2024-02-02

Family

ID=89093282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311505956.2A Active CN117226854B (en) 2023-11-13 2023-11-13 Method and device for executing clamping task, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117226854B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170098309A1 (en) * 2015-10-02 2017-04-06 X Development Llc Localization of a robot in an environment using detected edges of a camera image from a camera of the robot and detected edges derived from a three-dimensional model of the environment
US20170123066A1 (en) * 2011-12-21 2017-05-04 Robotic paradigm Systems LLC Apparatus, Systems and Methods for Point Cloud Generation and Constantly Tracking Position
CN111558937A (en) * 2020-04-07 2020-08-21 向仲宇 Robot motion control method based on deep learning
CN111687839A (en) * 2020-06-03 2020-09-22 北京如影智能科技有限公司 Method and device for clamping articles
CN112598757A (en) * 2021-03-03 2021-04-02 之江实验室 Multi-sensor time-space calibration method and device
CN113869138A (en) * 2021-09-06 2021-12-31 深延科技(北京)有限公司 Multi-scale target detection method and device and computer readable storage medium
CN114067210A (en) * 2021-11-18 2022-02-18 南京工业职业技术大学 Mobile robot intelligent grabbing method based on monocular vision guidance
WO2022116423A1 (en) * 2020-12-01 2022-06-09 平安科技(深圳)有限公司 Object posture estimation method and apparatus, and electronic device and computer storage medium
CN114998578A (en) * 2022-04-14 2022-09-02 西安建筑科技大学 Dynamic multi-article positioning, grabbing and packaging method and system
US20220288783A1 (en) * 2021-03-10 2022-09-15 Nvidia Corporation Machine learning of grasp poses in a cluttered environment
WO2022195269A1 (en) * 2021-03-19 2022-09-22 Imperial College Innovations Limited Modelling an environment using image data
CN115213896A (en) * 2022-05-10 2022-10-21 浙江西图盟数字科技有限公司 Object grabbing method, system and equipment based on mechanical arm and storage medium
CN115657718A (en) * 2022-10-24 2023-01-31 华东理工大学 Aircraft dynamic target tracking navigation method and device and readable medium
CN115816460A (en) * 2022-12-21 2023-03-21 苏州科技大学 Manipulator grabbing method based on deep learning target detection and image segmentation
WO2023056670A1 (en) * 2021-10-09 2023-04-13 东南大学 Mechanical arm autonomous mobile grabbing method under complex illumination conditions based on visual-tactile fusion
CN116309823A (en) * 2023-01-17 2023-06-23 之江实验室 Pose determining method, pose determining device, pose determining equipment and storage medium
CN116330306A (en) * 2023-05-31 2023-06-27 之江实验室 Object grabbing method and device, storage medium and electronic equipment
CN116630394A (en) * 2023-07-25 2023-08-22 山东中科先进技术有限公司 Multi-mode target object attitude estimation method and system based on three-dimensional modeling constraint
CN116968024A (en) * 2023-07-31 2023-10-31 腾讯科技(深圳)有限公司 Method, computing device and medium for obtaining control strategy for generating shape closure grabbing pose


Also Published As

Publication number Publication date
CN117226854B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
Eppner et al. Grasping unknown objects by exploiting shape adaptability and environmental constraints
US11288883B2 (en) Autonomous task performance based on visual embeddings
Alonso et al. Current research trends in robot grasping and bin picking
WO2022116423A1 (en) Object posture estimation method and apparatus, and electronic device and computer storage medium
Duan et al. Robotics dexterous grasping: The methods based on point cloud and deep learning
Sanz et al. Vision-guided grasping of unknown objects for service robots
Akinola et al. Learning precise 3d manipulation from multiple uncalibrated cameras
WO2020069379A1 (en) Training a deep neural network model to generate rich object-centric embeddings of robotic vision data
CN116330306B (en) Object grabbing method and device, storage medium and electronic equipment
CN116309823A (en) Pose determining method, pose determining device, pose determining equipment and storage medium
CN117226854B (en) Method and device for executing clamping task, storage medium and electronic equipment
Ogawara et al. Recognition of human task by attention point analysis
Cao et al. Two-stage grasping: A new bin picking framework for small objects
CN113172636A (en) Automatic hand-eye calibration method and device and storage medium
Ogawara et al. Acquiring hand-action models in task and behavior levels by a learning robot through observing human demonstrations
Yu et al. A grasping CNN with image segmentation for mobile manipulating robot
CN114700949B (en) Mechanical arm smart grabbing planning method based on voxel grabbing network
CN108284075B (en) Method and device for sorting articles by robot and robot
Schaub et al. 6-DOF grasp detection for unknown objects using surface reconstruction
CN111687829B (en) Anti-collision control method, device, medium and terminal based on depth vision
Ali et al. Development of vision-based sensor of smart gripper for industrial applications
WO2024032040A1 (en) Object pickup method and related device
Saudabayev et al. An intelligent object manipulation framework for industrial tasks
CN115862668B (en) Method and system for judging interactive object based on sound source positioning by robot
CN112686077B (en) Self-driven robot and obstacle recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant