CN116197887B - Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image - Google Patents

Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image

Info

Publication number
CN116197887B
CN116197887B (application CN202111426988.4A)
Authority
CN
China
Prior art keywords
image
user
capture
grabbing
auxiliary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111426988.4A
Other languages
Chinese (zh)
Other versions
CN116197887A (en)
Inventor
崔致豪
丁有爽
邵天兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mech Mind Robotics Technologies Co Ltd
Original Assignee
Mech Mind Robotics Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mech Mind Robotics Technologies Co Ltd filed Critical Mech Mind Robotics Technologies Co Ltd
Priority to CN202111426988.4A priority Critical patent/CN116197887B/en
Publication of CN116197887A publication Critical patent/CN116197887A/en
Application granted granted Critical
Publication of CN116197887B publication Critical patent/CN116197887B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1679 Programme controls characterised by the tasks executed
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume

Abstract

The application discloses a method, a device, electronic equipment and a storage medium for generating a grabbing auxiliary image. The method for generating the grabbing auxiliary image comprises the following steps: acquiring image data comprising one or more items to be grabbed; outputting the image data and an operable control to form an interactive interface, wherein the control is operable by a user to select a grabbing auxiliary image so that the selected grabbing auxiliary image is displayed to the user; in response to the user's operation of the control, acquiring grabbing auxiliary data corresponding to the grabbing auxiliary image selected by the user; generating a grabbing auxiliary layer based on the acquired grabbing auxiliary data; and combining the grabbing auxiliary layer with the image data comprising the one or more items to be grabbed to generate the grabbing auxiliary image selected by the user. The invention enables a user to intuitively see the various parameters involved while the robot performs grabbing, and thus to determine how to adjust the robot's parameters so that the robot operates as the user requires.

Description

Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image
Technical Field
The present invention relates to the field of automatic control of robot arms and grippers (program-controlled manipulators, class B25J), and more particularly to an image data processing method, apparatus, electronic device, and storage medium for generating a grabbing auxiliary image.
Background
Robots have the basic capabilities of perception, decision making and execution; they can assist or even replace human beings in dangerous, heavy and complex work, improve working efficiency and quality, serve human life, and expand or extend the range of human activity and capability. With the development of industrial automation and computer technology, robots have begun to enter the stage of mass production and practical application. In industrial settings, industrial robots are widely used and can perform repetitive or dangerous work in place of humans. Traditional industrial robot design focuses on the design and manufacture of robot hardware, which is not in itself "intelligent". When such a robot is used on an industrial site, technicians need to plan the hardware, the production line, the material positions, the robot's task paths and so on for the whole site in advance. For example, if articles are to be sorted and carried, field workers first need to sort the different types of articles and place them neatly into material frames of uniform specification; before the robot can operate, the production line, the material frames, the carrying positions and the like must be determined, and a fixed motion path, a fixed grabbing position, a fixed rotation angle and a fixed clamp are set for the robot according to the determined information.
As an improvement over conventional robot technology, intelligent program-controlled robots based on robot vision have been developed. However, the current "intelligence" is still rather simple: image data related to a task is acquired through a vision acquisition device such as a camera, 3D point cloud information is obtained from the image data, and the robot's operation, including its movement speed and trajectory, is then planned based on the point cloud so as to control the robot to execute the task. Existing robot control schemes of this kind do not work well on complex tasks. For example, in supermarket and logistics scenes, large numbers of stacked articles have to be handled; the robot arm must rely on vision equipment to locate and identify the articles one after another in a scattered, unordered scene, pick them up with suction cups, clamps or other bionic instruments, and place the picked articles at the corresponding positions according to certain rules through arm movement, trajectory planning and similar operations. In such an industrial scene, grabbing with a robot faces several difficulties: the number of objects to be grabbed is large and the lighting is uneven, so the point cloud quality of some objects is poor, which affects the grabbing result; the objects are of many kinds, are not placed in order and face in all directions, so the grabbing point differs for every object and the grabbing position of the clamp is difficult to determine; and with stacked articles, grabbing one article easily carries other articles along with it. Under these conditions, many factors make grabbing difficult, and traditional grabbing and sorting methods do not perform well enough. In addition, the more complex the grabbing algorithm becomes, the greater the barrier for field staff: when a problem occurs, they have difficulty finding out why it occurred and how to adjust the system to solve it, and often the robot provider has to send an expert to assist.
Disclosure of Invention
The present invention has been made in view of the above problems and aims to overcome, or at least partially solve, them. Specifically, the invention provides a method for visually presenting the parameters and image data associated with the grabbing control method to a user, so that, even without knowing the robot's operating principles, the user can intuitively determine the various parameters involved in the robot's grabbing process, understand why the robot performs a task in a certain way, and thus determine how to adjust the robot's parameters so that the robot operates as the user requires.
All of the solutions disclosed in the claims and the description of the present application have one or more of the innovations described above, and accordingly, one or more of the technical problems described above can be solved. Specifically, the application provides a method, a device, electronic equipment and a storage medium for generating a grabbing auxiliary image.
The method for generating the grabbing auxiliary image comprises the following steps of:
acquiring image data comprising one or more items to be grabbed;
outputting the image data and an operable control to form an interactive interface, wherein the control is operable by a user to select a grabbing auxiliary image and display the selected grabbing auxiliary image to the user;
responding to the operation of the control by the user, acquiring grabbing auxiliary data corresponding to the grabbing auxiliary image selected by the user;
generating a grabbing auxiliary layer based on the acquired grabbing auxiliary data;
the capture assistance layer is combined with image data comprising one or more items to be captured to generate a user selected capture assistance image.
In some embodiments, the image data is within the same interactive interface as the operable controls.
In some embodiments, the image data is within a different interactive interface than the operable controls.
In some embodiments, the different interactive interfaces switch in response to user operation.
In some embodiments, the capture assistance data comprises: a value associated with the user selected capture assistance image and a mask of the grippable region of the item to be captured.
In some embodiments, the combining the grip facilitation layer with image data comprising one or more items to be gripped comprises: after adjusting the color, transparency and/or contrast of the grabbing auxiliary layer, the adjusted grabbing auxiliary layer is combined with image data comprising one or more objects to be grabbed.
The device for generating a grabbing auxiliary image in an embodiment of the application comprises:
the image data acquisition module is used for acquiring image data comprising one or more objects to be grabbed;
the interactive interface display module is used for outputting the image data and an operable control to form an interactive interface, wherein the control is operable by a user to select a grabbing auxiliary image and display the selected grabbing auxiliary image to the user;
the auxiliary data acquisition module is used for responding to the operation of the control by the user and acquiring grabbing auxiliary data corresponding to the grabbing auxiliary image selected by the user;
the auxiliary layer generation module is used for generating a grabbing auxiliary layer based on the acquired grabbing auxiliary data;
an auxiliary image generation module for combining the capture auxiliary image layer with image data comprising one or more items to be captured to generate a user selected capture auxiliary image.
In some embodiments, the image data is within the same interactive interface as the operable controls.
In some embodiments, the image data is within a different interactive interface than the operable controls.
In some embodiments, the different interactive interfaces switch in response to user operation.
In some embodiments, the capture assistance data comprises: a value associated with the user selected capture assistance image and a mask of the grippable region of the item to be captured.
In some embodiments, the auxiliary image generation module is further configured to: after adjusting the color, transparency and/or contrast of the grabbing auxiliary layer, the adjusted grabbing auxiliary layer is combined with image data comprising one or more objects to be grabbed.
The electronic device of the embodiment of the application comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the method for generating the grabbing auxiliary image of any embodiment when executing the computer program.
The computer-readable storage medium of an embodiment of the present application has stored thereon a computer program which, when executed by a processor, implements the method of generating a capture auxiliary image of any of the above embodiments.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic illustration of mask pretreatment according to certain embodiments of the present application;
FIG. 2 is a flow diagram of a method of visualizing parameters for a grip in accordance with certain embodiments of the present application;
FIGS. 3a and 3b are schematic illustrations of a visualization menu and of the visual image presented to the user after selecting height and suction cup size, in accordance with certain embodiments of the present application;
FIG. 4 is a schematic structural view of a grasping parameter visualization device according to certain embodiments of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to some embodiments of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the description of the specific embodiments, it should be understood that the terms "center," "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the invention.
Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", etc. may explicitly or implicitly include one or more such feature. In the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more.
The invention can be used in industrial robot control scenarios based on visual identification. A typical vision-based industrial robot control scenario includes devices for capturing images, control devices such as production line hardware and the PLC for the production line, robot components for performing tasks, and the operating system or software that controls these devices. The means for capturing images may include 2D or 3D smart/non-smart industrial cameras, which, depending on function and application scenario, may include area-scan cameras, line-scan cameras, black-and-white cameras, color cameras, CCD cameras, CMOS cameras, analog cameras, digital cameras, visible light cameras, infrared cameras, ultraviolet cameras, and so on; the production line can be a packaging line, a sorting line, a logistics line, a processing line or any other line that needs robots; the robot parts used for performing tasks in the industrial scene may be biomimetic robots, such as humanoid or dog-type robots, or conventional industrial robots such as mechanical arms; the industrial robot may be an operation type robot, a program-controlled robot, a teaching-playback robot, a numerically controlled robot, a sensory-controlled robot, an adaptive-control robot, a learning-control robot, an intelligent robot, or the like; according to its working principle, the mechanical arm can be a ball-and-socket type mechanical arm, a multi-joint mechanical arm, a rectangular coordinate mechanical arm, a cylindrical coordinate mechanical arm, a polar coordinate mechanical arm and the like, and according to its function it can be a grabbing mechanical arm, a stacking mechanical arm, a welding mechanical arm or a general industrial mechanical arm; the end of the mechanical arm can be provided with an end effector, which, depending on the requirements of the task, can use a robot clamp, a robot gripper, a robot tool quick-change device, a robot collision sensor, a robot rotary connector, a robot pressure tool, a compliance device, a robot spray gun, a robot deburring tool, a robot arc welding gun, a robot electric welding gun, and so on; the robot clamp can be any of various universal clamps, where a universal clamp is a clamp with a standardized structure and a wide range of applications, such as the three-jaw and four-jaw chucks used on lathes, or the flat tongs and index heads used on milling machines. As another example, clamps may be classified, according to the clamping power source used, into manual clamps, pneumatic clamps, hydraulic clamps, gas-liquid linked clamps, electromagnetic clamps, vacuum clamps, etc., or other bionic devices capable of picking up an article. The devices for collecting images, the control devices such as production line hardware and the PLC for the production line, the robot parts for executing tasks, and the operating system or software controlling these devices can communicate over TCP (Transmission Control Protocol), HTTP (Hypertext Transfer Protocol) and gRPC (Google Remote Procedure Call) protocols in order to transmit various control instructions or commands.
The operating system or software may be disposed in any electronic device; typically such electronic devices include industrial computers, personal computers, notebook computers, tablet computers, cell phones and the like, which can communicate with other devices or systems by wired or wireless means. Furthermore, "grabbing" in the present invention refers in a broad sense to any action capable of controlling an article so as to change its position, and is not limited to gripping the article in the narrow sense of "clamping"; in other words, picking up an article by suction, lifting, tightening or similar means also falls within the scope of grabbing in the present invention. The articles to be grabbed in the present invention may be cartons, plastic soft packs (including but not limited to snack packages, pillow-shaped Tetra milk packages, plastic milk packages, etc.), cosmeceutical bottles, cosmeceuticals, and/or irregular toys, etc., which may be placed on a floor, a tray or a conveyor belt, and/or in a material basket.
In an actual industrial scenario, field staff are generally allowed to set the various parameters of the robot for a specific grabbing task. However, field staff are usually not familiar with the grabbing principle, so when a problem appears they are not clear where it lies, nor how to modify the settings to solve it. For example, when grabbing a plurality of stacked articles, other articles may be carried out of the frame along with the one being grabbed; a field worker may judge that the article on the upper layer should simply be grabbed first, but he cannot determine why the robot considered the priority value of an article on the lower layer to be higher, nor does he know how to set the weights to change the robot's grabbing order. To solve this problem, the inventors developed a set of methods for visually presenting the graphics and parameters involved in the grabbing process to field staff, so that they can operate according to their needs; this is also one of the key points of the present invention.
FIG. 2 shows a flow diagram of a method of visualizing graphics and parameters in a grabbing process according to an embodiment of the invention. As shown in fig. 2, the method includes:
Step S400, obtaining image data comprising one or more objects to be grabbed;
Step S410, outputting the image data and an operable control to form an interactive interface, wherein the control is operable by a user to select a grabbing auxiliary image and display the selected grabbing auxiliary image to the user;
Step S420, in response to the operation of the control by the user, acquiring grabbing auxiliary data corresponding to the grabbing auxiliary image selected by the user;
Step S430, generating a grabbing auxiliary layer based on the acquired grabbing auxiliary data;
Step S440, combining the grabbing auxiliary layer with the image data comprising the one or more objects to be grabbed to generate the grabbing auxiliary image selected by the user.
For step S400, the present invention may be applied to an industrial scene that contains one or more objects to be grabbed, where all the objects are grabbed in turn with a jig and the grabbed objects are arranged at specific positions. The type of image data and the acquisition method are not limited in this embodiment. As an example, the acquired image data may include a point cloud or an RGB color map. The point cloud information may be acquired with a 3D industrial camera, which is generally equipped with two lenses that photograph the group of objects to be grabbed from different angles; after processing, a three-dimensional image of the objects can be obtained. The group of objects to be grabbed is placed below the vision sensor and photographed by the two lenses at the same time; using a general binocular stereoscopic vision algorithm and the relative pose parameters of the two obtained images, the X, Y and Z coordinate values of each point of the object group to be grabbed, together with the coordinate direction of each point, are calculated and converted into point cloud data of the object group. In specific implementations, the point cloud can also be generated with elements such as a laser detector, a visible light detector such as an LED, an infrared detector or a radar detector, and the specific implementation of the invention is not limited.
The point cloud data acquired in this way is three-dimensional. In order to filter out the data of the dimension that has little influence on grabbing, reduce the amount of data to be processed, and thereby speed up data processing and improve efficiency, the acquired three-dimensional point cloud data of the object group to be grabbed can be orthographically projected onto a two-dimensional plane.
As an example, a depth map corresponding to the orthographic projection may also be generated. A two-dimensional color map corresponding to the three-dimensional object region, and a depth map corresponding to that color map, may be acquired along the direction perpendicular to the depth of the objects. The two-dimensional color map corresponds to an image of the planar area perpendicular to the preset depth direction; each pixel in the depth map corresponds one-to-one with a pixel in the two-dimensional color map, and its value is the depth value of that pixel.
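As a purely illustrative, non-limiting sketch of the orthographic projection described above (the grid resolution, the assumption that the camera looks along the Z axis, and all function and variable names are chosen here only for illustration and are not part of the disclosed embodiments), the conversion of a point cloud into a two-dimensional depth map can be written in Python as follows:

```python
import numpy as np

def project_to_depth_map(points, resolution=0.002):
    """Orthographically project an (N, 3) point cloud onto the XY plane.

    Returns a 2D depth map in which each cell stores the Z value of the
    point closest to the camera that falls into that cell; cells with no
    points are left as NaN. Assumes the depth direction is the Z axis.
    """
    xy = points[:, :2]
    z = points[:, 2]
    # Quantize the X/Y coordinates into pixel indices of the projection plane.
    mins = xy.min(axis=0)
    idx = np.floor((xy - mins) / resolution).astype(int)
    h, w = idx[:, 1].max() + 1, idx[:, 0].max() + 1
    depth = np.full((h, w), np.nan, dtype=np.float32)
    for (ix, iy), zv in zip(idx, z):
        # Keep the point closest to the camera (largest Z) for each pixel.
        if np.isnan(depth[iy, ix]) or zv > depth[iy, ix]:
            depth[iy, ix] = zv
    return depth

# Usage: cloud = np.asarray(..., dtype=np.float32)   # (N, 3) points from the 3D camera
# depth_map = project_to_depth_map(cloud)
```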
For step S410, the captured picture and the control may be output to a display for presentation to the user. The interaction between the user and the robot may be performed by touch operation, voice operation, or conventional operation of an input device such as a mouse or keyboard, which is not limited by the present invention. An interactive interface is a channel for exchanging information between a person and the computer system: the user inputs information and operates the computer system through the interactive interface, and the computer provides information to the user through the interactive interface for reading, analysis and judgment; each interactive interface comprises the information display area it provides and the controls that the user can operate. The controls that govern the visualization can be displayed together with the image on one interactive interface, or they can be separated from the image into two interfaces, in which case the image interface provides an entry for switching to the control interface and the control interface provides an entry for switching back to the image interface; when the user operates the corresponding entry, the display switches to the control interface or to the image interface. As shown in fig. 3a, operations related to visualization can be selected on the control interface, including: enabling the visualization, displaying the outlines of overlapped objects, and choosing the visualized attribute. The visualized attributes may comprise any of the parameters output in any of the previous embodiments; the selectable attributes in fig. 3a include: ALL, display by pose height, display by suction cup size, display by degree of pressing and overlap, display by transparency, and display by pose orientation.
For step S420, the user may select the value of interest according to his own needs. For example, when the user finds that the robot does not grasp in the order expected by the user, the ALL control may be selected to display the grasp priority value of each object to be grasped to determine the difference between the actual grasp order and the grasp order expected by the user, and then select the specific visual attribute separately to determine which attribute affects the grasp order. When the user selects a certain visual option, the system searches and invokes the corresponding data. As a preferred embodiment, the system will obtain the parameters selected by the user and the mask of the grippable area in response to the user's selection to be used together as auxiliary data, for example, when the user selects "display by suction cup size", the system will call the mask of the grippable area and the value of suction cup size at the same time; similarly, when the user selects "display by pose height", the mask of the grippable region and the mask height feature value are called.
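For illustration only, the retrieval triggered by the user's selection can be thought of as a simple lookup from the chosen visualization attribute to the per-mask value that must be called together with the grippable-region masks; the option keys and data fields below are hypothetical placeholders, not interfaces defined by this application:

```python
# Hypothetical lookup: visualization option -> per-mask value to fetch.
AUX_DATA_SOURCES = {
    "display_by_pose_height": lambda item: item["mask_height"],
    "display_by_suction_cup_size": lambda item: item["suction_cup_size"],
}

def collect_grab_assist_data(items, selected_option):
    """Return (mask, value) pairs for the attribute the user selected."""
    fetch_value = AUX_DATA_SOURCES[selected_option]
    return [(item["mask"], fetch_value(item)) for item in items]
```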
One possible way of determining the grippable region and generating the mask is as follows. First, after acquiring image data comprising one or more objects to be grabbed, the image data is processed so that every pixel in the image is identified; for example, for a 256 x 256 image, 65,536 pixels are identified. All the pixels in the image are then classified based on the features of each pixel, where the features mainly refer to the RGB values of the pixel; in a practical application scene, the RGB color image can be converted into a gray image to make feature classification easier, and the gray values can then be used for classification. For the classification of the pixels, the classes to be distinguished can be determined in advance. For example, if the RGB image obtained by photographing contains a large pile of beverage cans, food boxes and material frames, and the aim is to generate masks for the beverage cans, food boxes and frames, the predetermined classes may be beverage can, food box and material frame. Each class can be given a label: the label may be a number, for example 1 for beverage cans, 2 for food boxes and 3 for material frames, or a color, for example red for beverage cans, blue for food boxes and green for material frames, so that in the image finally obtained after classification and processing, beverage cans are marked with 1 or red, food boxes with 2 or blue, and material frames with 3 or green. In this embodiment, the mask of the grippable region of the object is to be generated, so only the grippable region needs to be classified, for example as blue; the blue region in the processed image is then the mask of the grippable region of the object to be grabbed. Next, a channel of image output is created for each class; the role of the channel is to extract from the input image all the features related to that class as its output. For example, after a channel of image output is created for the class "grippable region", the acquired RGB color image is fed into the channel, and an image in which the features of the grippable region have been extracted can be obtained from the channel's output. Finally, the feature image of the grippable region obtained in this way is combined with the original RGB image to generate composite image data in which the grippable region mask is identified.
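A minimal sketch of this per-pixel classification idea is given below; it approximates the "grippable region" class with a simple gray-level band, whereas an actual embodiment could equally use a trained segmentation model, so the thresholds, colors and function names are assumptions made only for illustration:

```python
import numpy as np

def grippable_region_mask(rgb, low=80, high=200):
    """Toy per-pixel classification: label pixels by their gray value.

    The description classifies every pixel by its RGB/gray features and
    extracts one output channel per class; here the "grippable region"
    class is approximated with a gray-level band for illustration only.
    """
    gray = rgb.mean(axis=2)                    # simple RGB -> gray conversion
    mask = (gray >= low) & (gray <= high)      # pixels assigned to the class
    return mask.astype(np.uint8)

def overlay_mask(rgb, mask, color=(0, 0, 255), alpha=0.4):
    """Combine the extracted class channel with the original RGB image."""
    out = rgb.astype(np.float32).copy()
    sel = mask.astype(bool)
    out[sel] = (1 - alpha) * out[sel] + alpha * np.array(color, dtype=np.float32)
    return out.astype(np.uint8)
```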
Masks generated in this way are sometimes unsuitable; for example, some masks have a size or shape that is inconvenient for subsequent processing, and for other regions a mask may be generated even though the clamp cannot actually perform a grab at that location. An unsuitable mask can have a significant impact on subsequent processing, so the resulting masks need to be preprocessed before the further steps. As shown in fig. 1, the preprocessing of the masks may include the following steps (a simplified code sketch of these steps is given after this list):
1. Dilating the mask to fill in defects such as missing or irregular parts of the mask image. For example, for each pixel on the mask, a certain number of surrounding points, e.g. 8-25 points, may be set to the same color as that pixel. This amounts to filling in the area around each pixel, so any defect in the object mask is filled and the mask becomes complete; the mask also becomes slightly "fatter" because of the dilation, and a proper amount of dilation helps the subsequent image processing operations.
2. Judging whether the area of the mask meets a preset condition, and removing the mask if it does not. First, smaller mask regions are likely to be erroneous: because of the continuity of image data, a grippable region will normally include a large number of pixels with similar features, and a mask region formed by a few scattered pixels may not be a real grippable region. Second, the robot end effector, i.e. the clamp, needs a foothold of a certain area when executing the grabbing task; if the area of the grippable region is too small, the clamp cannot be placed in it at all and the object cannot be grabbed, so a mask that is too small is meaningless. The preset condition can be set according to the size of the jig and the size of the noise, and its value can be an absolute size, a number of pixels, or a ratio; for example, the condition may be set to 0.1%, i.e. when the ratio of the mask area to the whole image area is less than 0.1%, the mask is considered unusable and is removed from the image.
3. Judging whether the number of point cloud points within the mask is less than a preset minimum. The number of points reflects the acquisition quality of the camera; if the number of points in a grippable region is too small, that region was not photographed accurately enough. The point cloud may be used to control the clamp when performing the grab, and too few points may affect the clamp's control process. A minimum number of points that a mask region should contain can therefore be set, for example: when the number of points covered by a grippable region is less than 10, the mask is removed from the image data, or points are randomly added to the region until the number reaches 10.
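The three preprocessing steps above can be sketched as follows. This is an assumption-laden illustration using OpenCV: the kernel size is illustrative, the 0.1% area ratio and the minimum of 10 points are simply the example values mentioned in the text, and the masks are assumed to be binary 0/1 arrays:

```python
import cv2
import numpy as np

def preprocess_masks(masks, image_area, points_in_mask,
                     min_area_ratio=0.001, min_points=10):
    """Apply the three preprocessing steps to a list of binary masks:
    (1) dilate to fill small defects, (2) drop masks whose area ratio is
    below the threshold, (3) drop masks covering too few point-cloud points.
    """
    kernel = np.ones((3, 3), np.uint8)   # dilates each pixel into its 8-neighbourhood
    kept = []
    for mask, n_points in zip(masks, points_in_mask):
        mask = cv2.dilate(mask, kernel, iterations=1)           # step 1
        if mask.sum() / float(image_area) < min_area_ratio:     # step 2
            continue
        if n_points < min_points:                               # step 3
            continue
        kept.append(mask)
    return kept
```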
The mask height refers to the height of the mask over the grippable region of the object to be grabbed, and can also be expressed as a Z coordinate value. The height of the mask reflects the height of the grabbing surface of the object. Since many objects to be grabbed are stacked together, the object on the upper layer is grabbed first: this prevents the objects on top from being scattered when an object underneath them is pulled out, and also prevents the upper objects from being knocked over, which would hinder the grabbing of the lower objects; the upper object is therefore clearly the better one to grab. The height of the mask can be obtained from a depth map or from the point cloud at the mask's position. In one embodiment, the point cloud including one or more objects to be grabbed is acquired first; a point cloud is a data set of points under a preset coordinate system, and to simplify the height calculation, the camera can photograph from directly above the objects to be grabbed. The point cloud included in the mask region is then obtained on the basis of that region, and the pose key point of the grippable region represented by the mask, together with the depth value of the pose key point, is calculated; the three-dimensional pose information of the object is used to describe the pose of the object to be grabbed in the three-dimensional world. The pose key point is a point whose pose can reflect the three-dimensional position features of the grippable region. The calculation can be performed as follows:
First, the three-dimensional position coordinates of each data point of the mask region are obtained, and the position of the pose key point of the corresponding grippable region is determined from a preset operation on these coordinates. For example, assuming that the point cloud of the mask region includes 100 data points, the three-dimensional position coordinates of the 100 points are obtained, their average value is calculated, and the data point corresponding to that average value is used as the pose key point of the grippable region corresponding to the mask region. Besides averaging, the preset operation may also be a center-of-gravity, maximum-value or minimum-value calculation, and the like; the present invention is not limited in this respect. Then, the direction of smallest variation and the direction of largest variation among the 100 data points are found. The direction of smallest variation is taken as the Z-axis direction (i.e. the depth direction, consistent with the shooting direction of the camera), the direction of largest variation is taken as the X-axis direction, and the Y-axis direction is determined by the right-hand coordinate system, thereby determining the orientation of the pose key point and reflecting its directional features in three-dimensional space.
Finally, the pose key point of the grippable region corresponding to each mask region and its depth value are calculated. The depth value of the pose key point is its coordinate on a depth coordinate axis, where the depth coordinate axis is set according to the shooting direction of the camera, the direction of gravity, or the normal of the plane in which the grippable region lies. Accordingly, the depth value reflects the position of the grippable region on the depth coordinate axis. In specific implementations, the origin and direction of the depth coordinate axis can be set flexibly by those skilled in the art, and the invention does not limit how the origin of the depth coordinate axis is chosen. For example, when the depth coordinate axis is set according to the shooting direction of the camera, the origin may be the position of the camera and the axis may point from the camera towards the objects; the depth value of each grippable region's mask is then the negative of the distance from that region to the camera, i.e. the farther from the camera, the lower the depth value. This depth value is taken as the mask height feature value.
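A possible reading of this calculation, sketched with NumPy, is shown below. Taking the mean point as the pose key point and the eigenvectors of the covariance matrix as the directions of smallest and largest variation is one straightforward interpretation; the camera-axis convention and all names are assumptions made for illustration:

```python
import numpy as np

def pose_key_point(points):
    """Compute the pose key point and an orientation frame for one mask.

    points: (N, 3) point cloud covered by the mask, in the camera frame.
    The key point is the mean of the points; the axis of smallest variation
    is taken as Z, the axis of largest variation as X, and Y completes a
    right-handed frame. The depth value is the negative distance along the
    camera's viewing axis, so regions farther from the camera get lower values.
    """
    key_point = points.mean(axis=0)
    centered = points - key_point
    cov = np.cov(centered.T)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    z_axis = eigvecs[:, 0]                   # direction of smallest variation
    x_axis = eigvecs[:, -1]                  # direction of largest variation
    y_axis = np.cross(z_axis, x_axis)        # completes a right-handed frame
    rotation = np.stack([x_axis, y_axis, z_axis], axis=1)
    depth_value = -key_point[2]              # camera assumed to look along +Z
    return key_point, rotation, depth_value
```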
The jig size refers to the size of the jig configured for a given article to be grabbed. Since the grippable region of the article lies on the surface of the object and grabbing the article essentially means controlling the jig to perform the grabbing operation within the grippable region, the jig size can also be counted as a feature of the mask of the grippable region. The influence of the jig size on grabbing is mainly whether the jig may accidentally bump into articles other than the one it is grabbing. For example, when many objects are stacked together, a large suction cup is more likely than a small one to collide with other objects during grabbing, causing the suction cup to shake or the objects to shift, which may make the grab fail. In an actual industrial scenario, the kind of jig used by each system can be determined in advance, that is, the size of the jig can be known before the actual grabbing, so the jig size in this embodiment can be obtained from the configured jig and a mapping between jigs and their sizes that is established and stored in advance.
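Since the jig for each system is configured in advance, the stored mapping between jigs and their sizes can be as simple as a lookup table; the jig names and size values below are hypothetical examples, not values from the disclosure:

```python
# Hypothetical pre-stored mapping between configured jigs and their sizes (mm).
JIG_SIZES = {
    "suction_cup_small": 30.0,
    "suction_cup_large": 60.0,
    "two_finger_gripper": 85.0,
}

def jig_size_for(item):
    """Return the size of the jig configured for this item to be grabbed."""
    return JIG_SIZES[item["configured_jig"]]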
For step S430, the data called in step S420 are combined to generate a visual layer viewable by the user. Take as an example the case where the user has selected "display by pose height" or "display by suction cup size", and the grabbing auxiliary data includes the mask of the grippable region: when the user selects "display by pose height", the masks of all objects to be grabbed in the original image and the mask height feature value of each object are called, and a layer is generated in which each mask height feature value is placed beside the corresponding mask; when the user selects "display by suction cup size", the masks of all objects to be grabbed in the original image and the suction cup size feature value of each object are called, and a layer is generated in which each suction cup size feature value is placed beside the corresponding mask.
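An illustrative sketch of generating such a layer is given below; it paints each mask onto an initially empty layer and writes the selected value beside the mask, with colors, offsets and font settings chosen arbitrarily for the illustration:

```python
import cv2
import numpy as np

def build_assist_layer(image_shape, masks_and_values, color=(0, 255, 0)):
    """Create the grabbing auxiliary layer: each grippable-region mask is
    painted onto an empty layer and the user-selected value (e.g. pose
    height or suction cup size) is written next to it.
    """
    h, w = image_shape[:2]
    layer = np.zeros((h, w, 3), dtype=np.uint8)
    for mask, value in masks_and_values:
        layer[mask.astype(bool)] = color
        ys, xs = np.nonzero(mask)
        anchor = (int(xs.max()) + 5, int(ys.min()) + 15)   # just beside the mask
        cv2.putText(layer, f"{value}", anchor,
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
    return layer
```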
For step S440, the grabbing auxiliary layer generated in step S430 is composited with the originally captured image data and visually presented to the user. The layer generated in step S430 may first be processed, adjusting attributes such as its color, transparency and contrast; then all pixels in the auxiliary layer and all pixels in the original image data are combined in order, from left to right and from top to bottom, to generate the composited image data. As shown in fig. 3b, the composited image shows each item to be grabbed, the mask covering the grippable region over each item, and the user-selected "pose height" value or "suction cup size" value displayed next to the mask.
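One way to realize this combination is an ordinary alpha blend, sketched below; the blending rule and the convention that empty layer pixels keep the original image are assumptions made for illustration rather than the required implementation:

```python
import numpy as np

def compose_assist_image(original_rgb, assist_layer, alpha=0.5):
    """Combine the grabbing auxiliary layer with the original image data.

    Pixels where the layer is empty keep the original image; elsewhere the
    layer is blended in with the chosen transparency.
    """
    layer_mask = assist_layer.any(axis=2)
    out = original_rgb.astype(np.float32).copy()
    out[layer_mask] = ((1 - alpha) * out[layer_mask]
                       + alpha * assist_layer[layer_mask].astype(np.float32))
    return out.astype(np.uint8)
```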
In addition, it should be noted that although each embodiment of the present invention has a specific combination of features, further combinations and cross combinations of these features between embodiments are also possible.
Fig. 4 shows an image data processing apparatus according to still another embodiment of the present invention, the apparatus including:
an image data obtaining module 800, configured to obtain image data including one or more objects to be grabbed, i.e. to implement step S400;
the interactive interface display module 810 is configured to output the image data and an operable control to form an interactive interface, where the control is operable by a user to select to capture an auxiliary image and display the selected capture auxiliary image to the user, that is, to implement step S410;
an auxiliary data obtaining module 820, configured to obtain, in response to the operation of the control by the user, capturing auxiliary data corresponding to the capturing auxiliary image selected by the user, that is, to implement step S420;
an auxiliary layer generating module 830, configured to generate a grabbing auxiliary layer based on the acquired grabbing auxiliary data, that is, to implement step S430;
an auxiliary image generation module 840 for combining the capture auxiliary image layer with image data comprising one or more items to be captured to generate a user selected capture auxiliary image, i.e. for implementing step S440.
It should be understood that in the embodiment of the apparatus shown in fig. 4, only the main functions of the modules are described, and all the functions of each module correspond to the corresponding steps in the method embodiment, and the working principle of each module may refer to the description of the corresponding steps in the method embodiment. For example, the auxiliary image generation module 840 is used to implement the method of step S440 in the above-described embodiment, indicating that the content for describing and explaining step S440 is also the content for describing and explaining the function of the auxiliary image generation module 840. In addition, although the correspondence between functions of the functional modules and the method is defined in the above embodiments, those skilled in the art will understand that the functions of the functional modules are not limited to the correspondence, that is, a specific functional module may also implement other method steps or a part of the method steps. For example, the above embodiment describes the method for implementing step S440 by the auxiliary image generation module 840, however, the auxiliary image generation module 840 may be used to implement the method or a part of the method of steps S400, S410, S420 or S430 as the actual situation requires.
The present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of any of the above embodiments. It should be noted that, the computer program stored in the computer readable storage medium according to the embodiment of the present application may be executed by the processor of the electronic device, and in addition, the computer readable storage medium may be a storage medium built in the electronic device or may be a storage medium capable of being plugged into the electronic device in a pluggable manner, so that the computer readable storage medium according to the embodiment of the present application has higher flexibility and reliability.
Fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention, which may be a control system/electronic system configured in an automobile, a mobile terminal (e.g., a smart mobile phone, etc.), a personal computer (PC, e.g., a desktop computer or a notebook computer, etc.), a tablet computer, a server, etc., and the specific embodiment of the present invention is not limited to the specific implementation of the electronic device.
As shown in fig. 5, the electronic device may include: a processor 1202, a communication interface (Communications Interface) 1204, a memory 1206, and a communication bus 1208.
Wherein:
the processor 1202, the communication interface 1204, and the memory 1206 communicate with each other via a communication bus 1208.
A communication interface 1204 for communicating with network elements of other devices, such as clients or other servers, etc.
The processor 1202 is configured to execute the program 1210, and may specifically perform relevant steps in the method embodiments described above.
In particular, program 1210 may include program code including computer operating instructions.
The processor 1202 may be a central processing unit CPU, or a specific integrated circuit ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the electronic device may be the same type of processor, such as one or more CPUs; but may also be different types of processors such as one or more CPUs and one or more ASICs.
Memory 1206 for storing program 1210. The memory 1206 may comprise high-speed RAM memory or may further comprise non-volatile memory (non-volatile memory), such as at least one disk memory.
Program 1210 may be downloaded and installed from a network and/or from a removable medium via communications interface 1204. The program, when executed by the processor 1202, may cause the processor 1202 to perform the operations of the method embodiments described above.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a system that includes a processing module, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It is to be understood that portions of embodiments of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
Furthermore, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
Although the embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives and variations may be made to the embodiments described above by those of ordinary skill in the art within the scope of the application.

Claims (12)

1. A method of generating a capture assist image, comprising:
acquiring image data comprising one or more items to be grabbed;
outputting the image data and an operable control to form an interactive interface, wherein the control is operable by a user to select a grabbing auxiliary image and display the selected grabbing auxiliary image to the user;
responding to the operation of the control by the user, acquiring grabbing auxiliary data corresponding to the grabbing auxiliary image selected by the user;
generating a grabbing auxiliary layer based on the acquired grabbing auxiliary data;
combining the capture assistance layer with image data comprising one or more items to be captured to generate a user selected capture assistance image;
the capture assistance data includes: a value associated with the user selected capture assistance image and a mask of the grippable region of the item to be captured.
2. The method of generating a capture-assist image of claim 1 wherein the image data is within the same interactive interface as the operable controls.
3. The method of generating a capture-assist image of claim 1 wherein the image data is within a different interactive interface than the operable controls.
4. A method of generating a capture assistance image as claimed in claim 3 in which said different interactive interfaces are switched in response to user operation.
5. The method of generating a capture-assist image of any of claims 1-4 wherein said combining the capture-assist layer with image data comprising one or more items to be captured comprises: after adjusting the color, transparency and/or contrast of the grabbing auxiliary layer, the adjusted grabbing auxiliary layer is combined with image data comprising one or more objects to be grabbed.
6. An apparatus for generating a capture assist image, comprising:
the image data acquisition module is used for acquiring image data comprising one or more objects to be grabbed;
the interactive interface display module is used for outputting the image data and an operable control to form an interactive interface, wherein the control can be operated by a user to select a capture auxiliary image and display the selected capture auxiliary image to the user;
the auxiliary data acquisition module is used for responding to the operation of the control by the user and acquiring grabbing auxiliary data corresponding to the grabbing auxiliary image selected by the user;
The auxiliary layer generation module is used for generating a grabbing auxiliary layer based on the acquired grabbing auxiliary data;
an auxiliary image generation module for combining the capture auxiliary image layer with image data comprising one or more items to be captured to generate a user-selected capture auxiliary image;
the capture assistance data includes: a value associated with the user selected capture assistance image and a mask of the grippable region of the item to be captured.
7. The apparatus for generating a capture-assist image of claim 6, wherein the image data is within the same interactive interface as the operable control.
8. The apparatus for generating a capture-assist image of claim 6, wherein the image data is within a different interactive interface from the operable control.
9. The apparatus for generating a capture-assist image of claim 8, wherein the different interactive interfaces are switched in response to a user operation.
10. The apparatus for generating a capture-assist image of any one of claims 6-9, wherein the assist image generation module is further configured to: adjust the color, transparency and/or contrast of the capture-assist layer, and then combine the adjusted capture-assist layer with the image data comprising the one or more items to be captured.
11. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of generating a capture-assist image according to any one of claims 1 to 5.
12. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of generating a capture-assist image according to any one of claims 1 to 5.
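
The sketch below is a minimal, non-limiting illustration of the flow recited in claims 1 and 5: a grippable-region mask and its associated value are turned into a capture-assist layer, the layer's transparency is adjusted, and the layer is blended with the image of the items to be captured. It assumes NumPy arrays and an RGBA layer representation; the function names (generate_assist_layer, adjust_layer, combine_with_scene) are hypothetical and do not appear in the patent.

import numpy as np

def generate_assist_layer(mask, value, color=(0, 255, 0)):
    # Build an RGBA capture-assist layer from the grippable-region mask.
    # mask  : HxW array, non-zero where the item can be gripped.
    # value : scalar associated with the selected capture-assist image
    #         (assumed here to lie in [0, 1]); it scales the overlay opacity.
    h, w = mask.shape
    layer = np.zeros((h, w, 4), dtype=np.uint8)
    layer[..., :3] = color                          # overlay colour
    layer[..., 3] = (mask > 0) * int(255 * value)   # alpha from mask and value
    return layer

def adjust_layer(layer, transparency=0.3):
    # Adjust the layer before compositing (cf. claim 5); only transparency
    # is shown here, colour/contrast adjustments would act on the RGB channels.
    out = layer.copy()
    out[..., 3] = (out[..., 3] * (1.0 - transparency)).astype(np.uint8)
    return out

def combine_with_scene(image, layer):
    # Alpha-blend the capture-assist layer onto the image of the items
    # to be captured, producing the user-selected capture-assist image.
    alpha = layer[..., 3:4].astype(np.float32) / 255.0
    blended = image.astype(np.float32) * (1.0 - alpha) + layer[..., :3].astype(np.float32) * alpha
    return blended.astype(np.uint8)

# Hypothetical usage: the mask and value stand in for the capture-assist data
# fetched in response to the user's operation of the control.
scene = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder scene image
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:200, 150:300] = 1                        # placeholder grippable region
assist_image = combine_with_scene(scene, adjust_layer(generate_assist_layer(mask, value=0.8)))

Keeping the mask in the alpha channel reduces compositing to a single weighted sum per pixel, which is convenient when the user repeatedly switches the selected capture-assist image in the interactive interface.
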
CN202111426988.4A 2021-11-28 2021-11-28 Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image Active CN116197887B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111426988.4A CN116197887B (en) 2021-11-28 2021-11-28 Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111426988.4A CN116197887B (en) 2021-11-28 2021-11-28 Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image

Publications (2)

Publication Number Publication Date
CN116197887A CN116197887A (en) 2023-06-02
CN116197887B true CN116197887B (en) 2024-01-30

Family

ID=86511589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111426988.4A Active CN116197887B (en) 2021-11-28 2021-11-28 Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image

Country Status (1)

Country Link
CN (1) CN116197887B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106863299A (en) * 2017-02-13 2017-06-20 华北电力大学(保定) Multi-layer configurable legged climbing robot motion control system
CN107404617A (en) * 2017-07-21 2017-11-28 努比亚技术有限公司 Image capturing method, terminal, and computer-readable storage medium
JP2019042853A (en) * 2017-08-31 2019-03-22 Thk株式会社 Image information processing apparatus, gripping system, and image information processing method
CN109648568A (en) * 2019-01-30 2019-04-19 北京镁伽机器人科技有限公司 Robot control method, system and storage medium
CN110238840A (en) * 2019-04-24 2019-09-17 中山大学 Vision-based autonomous grasping method for a robotic arm
JP2020021212A (en) * 2018-07-31 2020-02-06 キヤノン株式会社 Information processing device, information processing method, and program
CN111080670A (en) * 2019-12-17 2020-04-28 广州视源电子科技股份有限公司 Image extraction method, device, equipment and storage medium
CN111508066A (en) * 2020-04-16 2020-08-07 北京迁移科技有限公司 3D vision-based unordered stacked workpiece grabbing system and interaction method
CN112114929A (en) * 2020-09-29 2020-12-22 青岛海信移动通信技术股份有限公司 Display apparatus and image display method thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014195183A (en) * 2013-03-29 2014-10-09 Brother Ind Ltd Program and communication apparatus
AU2015224397A1 (en) * 2015-09-08 2017-03-23 Canon Kabushiki Kaisha Methods for adjusting control parameters on an image capture device
TWI584859B (en) * 2016-04-27 2017-06-01 寶凱電子企業股份有限公司 Interactive type grabbing machine and control method thereof
US10885622B2 (en) * 2018-06-29 2021-01-05 Photogauge, Inc. System and method for using images from a commodity camera for object scanning, reverse engineering, metrology, assembly, and analysis
EP3989793A4 (en) * 2019-06-28 2023-07-19 Auris Health, Inc. Console overlay and methods of using same

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106863299A (en) * 2017-02-13 2017-06-20 华北电力大学(保定) Multi-layer configurable legged climbing robot motion control system
CN107404617A (en) * 2017-07-21 2017-11-28 努比亚技术有限公司 Image capturing method, terminal, and computer-readable storage medium
JP2019042853A (en) * 2017-08-31 2019-03-22 Thk株式会社 Image information processing apparatus, gripping system, and image information processing method
JP2020021212A (en) * 2018-07-31 2020-02-06 キヤノン株式会社 Information processing device, information processing method, and program
CN109648568A (en) * 2019-01-30 2019-04-19 北京镁伽机器人科技有限公司 Robot control method, system and storage medium
CN110238840A (en) * 2019-04-24 2019-09-17 中山大学 Vision-based autonomous grasping method for a robotic arm
CN111080670A (en) * 2019-12-17 2020-04-28 广州视源电子科技股份有限公司 Image extraction method, device, equipment and storage medium
CN111508066A (en) * 2020-04-16 2020-08-07 北京迁移科技有限公司 3D vision-based unordered stacked workpiece grabbing system and interaction method
CN112114929A (en) * 2020-09-29 2020-12-22 青岛海信移动通信技术股份有限公司 Display apparatus and image display method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A method for producing 3D stereoscopic images from a three-dimensional model; Li Yongcheng; Li Mengyu; Computer Era (09); pp. 62-65, 68 *

Also Published As

Publication number Publication date
CN116197887A (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN111168686B (en) Object grabbing method, device, equipment and storage medium
US11338435B2 (en) Gripping system with machine learning
JP5835926B2 (en) Information processing apparatus, information processing apparatus control method, and program
JP5778311B1 (en) Picking apparatus and picking method
CN111151463B (en) Mechanical arm sorting and grabbing system and method based on 3D vision
US20180290307A1 (en) Information processing apparatus, measuring apparatus, system, interference determination method, and article manufacturing method
CN110580725A (en) Box sorting method and system based on RGB-D camera
WO2012066819A1 (en) Work pick-up apparatus
JP2014205209A (en) Robot system and control method of the same
JP2022521003A (en) Multi-camera image processing
WO2021039775A1 (en) Image processing device, image capturing device, robot, and robot system
US20220080581A1 (en) Dual arm robot teaching from dual hand human demonstration
JP2015112654A (en) Control apparatus, robot, teaching data generation method, and robot system
CN116175542B (en) Method, device, electronic equipment and storage medium for determining clamp grabbing sequence
CN116197887B (en) Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image
JP7454132B2 (en) Robot system control device, robot system control method, computer control program, and robot system
JP2018146347A (en) Image processing device, image processing method, and computer program
CN116197888B (en) Method and device for determining position of article, electronic equipment and storage medium
CN116175541B (en) Grabbing control method, grabbing control device, electronic equipment and storage medium
CN116197885B (en) Image data filtering method, device, equipment and medium based on press-fit detection
CN116188559A (en) Image data processing method, device, electronic equipment and storage medium
CN116175540B (en) Grabbing control method, device, equipment and medium based on position and orientation
JP7066671B2 (en) Interference determination device, interference determination method, program and system
CN116197886A (en) Image data processing method, device, electronic equipment and storage medium
CN116214494A (en) Grabbing control method, grabbing control device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant