WO2023145599A1 - Image processing device, image processing method, image processing program, and robot control system

Image processing device, image processing method, image processing program, and robot control system

Info

Publication number
WO2023145599A1
WO2023145599A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
image processing
dimensional information
mask
processing device
Application number
PCT/JP2023/001517
Other languages
French (fr)
Japanese (ja)
Inventor
天毅 手嶋
Original Assignee
株式会社Preferred Networks
Application filed by 株式会社Preferred Networks
Publication of WO2023145599A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis

Definitions

  • The present disclosure relates to an image processing device, an image processing method, an image processing program, and a robot control system.
  • In such a robot control system, for example, an RGB image captured by an RGB camera is used to perform object recognition processing for identifying parcels to be depalletized.
  • The present disclosure improves recognition accuracy in object recognition processing.
  • An image processing apparatus according to one aspect of the present disclosure has, for example, the following configuration: one or more memories and one or more processors. For a predetermined space containing one or more objects, the one or more processors acquire a gradation image of the predetermined space and three-dimensional information of the predetermined space, mask a portion of the gradation image based on the three-dimensional information, and perform a predetermined process using the masked gradation image.
  • FIG. 1 is a diagram showing an example of a usage scene of a robot control system.
  • FIG. 2 is a diagram showing an example of the system configuration of each phase of the robot control system.
  • FIG. 3 is a diagram illustrating an example of a hardware configuration of an image processing apparatus.
  • FIG. 4 is a first diagram showing an example of the functional configuration of the image processing device in the training phase.
  • FIG. 5 is a first diagram showing an example of the functional configuration of the training section.
  • FIG. 6 is a first diagram showing an example of the functional configuration of the image processing apparatus in the depalletizing phase.
  • FIG. 7 is a diagram showing a specific example of object recognition processing.
  • FIG. 8 is a first flow chart showing the flow of processing of the entire robot control system.
  • FIG. 9 is a flowchart showing the flow of training processing.
  • FIG. 10 is a first flowchart showing the flow of depalletizing processing.
  • FIG. 11 is a diagram showing a specific example of the depalletizing process.
  • FIG. 12 is a second diagram showing an example of the functional configuration of the image processing device in the training phase.
  • FIG. 13 is a second diagram illustrating an example of the functional configuration of the training unit.
  • FIG. 14 is a second diagram showing an example of the functional configuration of the image processing device in the depalletizing phase.
  • FIG. 15 is a second flowchart showing the flow of training processing.
  • FIG. 16 is a second flowchart showing the flow of depalletizing processing.
  • FIG. 17 is a diagram illustrating an example of the functional configuration of the image processing apparatus in the imaging condition adjustment phase.
  • FIG. 18 is a second flowchart showing the flow of processing of the entire robot control system.
  • FIG. 19 is a flowchart showing the flow of imaging condition adjustment processing.
  • FIG. 1 is a diagram showing an example of a usage scene of a robot control system.
  • the example of FIG. 1 shows a scene in which the robot control system 100 is used for automating depalletizing.
  • FIG. 1 shows a state in which one or more cardboard boxes having a rectangular parallelepiped shape are stacked on a pallet 140 as a package, which is an example of an object.
  • the example of FIG. 1 also shows how the robot 110 picks up cardboard boxes one by one from the top and lowers them onto a conveying unit 160 such as a conveyor (depalletizing).
  • However, the usage scene shown in FIG. 1 is an example, and the work automated by the robot control system 100 is not limited to depalletizing, and may be other work. Further, the type of packages to be depalletized by the robot control system 100 is not limited to cardboard, and other types of packages may be used. Also, the shape of the packages is not limited to a rectangular parallelepiped, and any other shape is permitted.
  • the order of picking up the cardboard does not have to be from the top, and for example, cardboard at a predetermined height may be picked up preferentially.
  • a specific area on the pallet 140 may be prioritized and the cardboard may be picked up in order from the top in the prioritized area.
  • the number of cardboards to be picked up in one operation is not limited to one, and multiple cardboards may be picked up in one operation.
  • These work rules are defined in advance as work rule information, and the robot control system 100 performs work according to the work rule information.
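  • For illustration only, such work rule information might be held as a small configuration structure like the sketch below; every field name and value here is an assumption for this example and is not taken from the disclosure.

```python
# Hypothetical representation of the work rule information described above.
# All keys and values are illustrative assumptions.
work_rule_info = {
    "pick_order": "top_down",       # pick up cardboard boxes in order from the top
    "priority_region": None,        # optionally, a region on the pallet to empty first
    "boxes_per_operation": 1,       # number of boxes picked up in one operation
}

def satisfies_pick_rule(box_top_height: float, max_top_height: float,
                        rules: dict, tol: float = 0.01) -> bool:
    """Return True if a box may be picked under the "top_down" rule, i.e. its
    top surface is (approximately) the highest one. tol is an assumed tolerance."""
    if rules["pick_order"] == "top_down":
        return abs(box_top_height - max_top_height) <= tol
    return True
```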
  • As shown in FIG. 1, the robot control system 100 has a robot 110, an RGB camera 121, a depth camera 122, and an image processing device 130.
  • FIG. 1(a) shows the robot control system 100 viewed in the positive direction of the y-axis (the direction from the front to the back of the page) when the directions indicated by reference numeral 171 are the x-axis direction and the z-axis direction, respectively.
  • FIG. 1(b) shows the robot control system 100 viewed in the negative direction of the x-axis (the direction from the front to the back of the page) when the directions indicated by reference numeral 172 are the y-axis direction and the z-axis direction, respectively.
  • the RGB camera 121 shown in FIGS. 1A and 1B is an example of an image generation device that generates an image.
  • The RGB camera 121 is, for example, fixedly attached to the ceiling of the space where the robot 110 is arranged, and notifies the image processing device 130 of an RGB image obtained by photographing the cardboard 150 stacked on the pallet 140 from directly above.
  • In the following, an example of handling an RGB image having color information for each pixel will be described, but a grayscale image having brightness information for each pixel may be handled instead of the RGB image.
  • As an example of a gradation image having color or brightness gradation information (gradation values) for each pixel, a color image having color information will be described, specifically an RGB image in which the color information is represented by the three primary color values R (Red), G (Green), and B (Blue).
  • Other examples of color images include Lab images in which color information is represented by values in the Lab color space, and HSL images in which color information is represented by values of hue, saturation, and luminance.
  • the depth camera 122 shown in FIGS. 1(a) and 1(b) is an example of a three-dimensional information generating device that generates three-dimensional information.
  • The depth camera 122 is, for example, fixedly attached to the ceiling of the space where the robot 110 is arranged, and notifies the image processing device 130 of three-dimensional information obtained by capturing the cardboard 150 stacked on the pallet 140 from directly above.
  • Each pixel (two-dimensional coordinates (X coordinate, Y coordinate)) of the RGB image has color information (R value, G value, B value), and three-dimensional information (x coordinate, y coordinate, z coordinate) is associated with each pixel.
  • In the following, "pixels" refer to pixels in the RGB image.
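  • As a minimal illustration of this per-pixel association, assuming the RGB image and the depth camera output are already aligned to the same pixel grid (the array names and shapes below are assumptions, not part of the disclosure):

```python
import numpy as np

H, W = 720, 1280                                 # image size (illustrative)
rgb = np.zeros((H, W, 3), dtype=np.uint8)        # color information (R, G, B) per pixel
xyz = np.zeros((H, W, 3), dtype=np.float32)      # three-dimensional information (x, y, z) per pixel

# For a pixel at two-dimensional coordinates (X, Y) of the RGB image,
# rgb[Y, X] holds its color information and xyz[Y, X] holds the associated
# three-dimensional information.
r, g, b = rgb[100, 200]
x, y, z = xyz[100, 200]
```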
  • the image processing device 130 shown in FIGS. 1(a) and 1(b) outputs a control command for controlling the robot 110 based on the notified RGB image and three-dimensional information.
  • the image processing device 130 identifies the area of the top surface of the cardboard with the highest height based on the three-dimensional information in the RGB image.
  • the image processing device 130 performs mask processing on color information associated with pixels other than the pixels included in the specified region in the RGB image.
  • a masked RGB image generated by performing mask processing on color information associated with pixels other than the pixels included in the specified region in the RGB image is referred to as a “mask image”.
  • the image processing device 130 also uses the mask image to perform object recognition processing. Furthermore, the image processing device 130 generates a control command for the robot 110 to depalletize the corrugated cardboard identified as a depalletizing target based on the result of the object recognition processing, and transmits the control command to the robot 110 .
  • FIG. 1(a) shows how the robot 110 rotates in the direction of the thick arrow 180 after picking up one of the tallest cardboard boxes based on the control command sent from the image processing device 130.
  • FIG. 1(b) shows how the robot 110 completes rotation in the direction of the thick arrow 180 and lowers the picked up cardboard onto the transport unit 160 .
  • the robot control system 100 performs mask processing based on three-dimensional information in the RGB image, and performs object recognition processing using the mask image.
  • erroneous recognition can be suppressed and recognition accuracy can be improved, compared to the case where object recognition processing is performed using an RGB image that has not undergone mask processing.
  • FIG. 2 is a diagram illustrating an example of a system configuration of a robot control system.
  • the image processing device 130 performs object recognition processing using mask images.
  • the object recognition unit used when the image processing device 130 performs object recognition processing is configured by, for example, a DNN (Deep Neural Network).
  • the object recognition unit is then trained using the mask image as training data.
  • As shown in FIG. 2, the system configuration of the robot control system 100 can be divided into a training phase and a depalletizing phase.
  • In the case of the training phase, the robot control system 100 includes, for example, an RGB camera 121, a depth camera 122, a CG (Computer Graphics) simulator 210, an acquired data storage unit 220, and an image processing device 230.
  • the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 may be stored in the acquired data storage unit 220 .
  • The CG simulator 210 reproduces the environment in which the robot 110 is placed, thereby generating a virtual RGB image and virtual three-dimensional information, which may be stored in the acquired data storage unit 220.
  • In the training phase, the RGB image and the three-dimensional information stored in the acquired data storage unit 220 are read by the image processing device 230, which performs mask processing and generates training data. Further, in the training phase, the image processing device 230 trains an object recognition unit (details will be described later) using the generated training data to generate a trained object recognition unit.
  • In the case of the depalletizing phase, the robot control system 100 includes the RGB camera 121, the depth camera 122, an image processing device 130 (including a trained object recognition unit), and the robot 110.
  • In the depalletizing phase, the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 are each notified to the image processing device 130, which performs mask processing and then object recognition processing. Furthermore, in the depalletizing phase, a control command is generated based on the result of the object recognition processing, and the robot 110 is controlled. As a result, the robot 110 depalletizes the cardboard to be depalletized.
  • the timing of photographing by the RGB camera 121 and the depth camera 122 is arbitrary, as long as the photographing is performed before the cardboard boxes to be depalletized are picked up.
  • The RGB camera 121 and the depth camera 122 may capture images at any frequency; for example, they may capture images only once.
  • FIG. 2 shows the case where different image processing devices are used in the training phase and the depalletizing phase.
  • the image processing device 230 used in the training phase and the image processing device 130 used in the depalletizing phase may be the same image processing device.
  • FIG. 3 is a diagram showing an example of the hardware configuration of the image processing device.
  • the image processing apparatus 130 has a processor 301, a main storage device (memory) 302, an auxiliary storage device 303, a network interface 304, and a device interface 305 as components.
  • The image processing apparatus 130 is implemented as a computer in which these components are connected via a bus 306.
  • In FIG. 3, one of each component is shown, but the image processing device 130 may include a plurality of the same components.
  • The image processing program may be installed in a plurality of image processing apparatuses, and each of the plurality of image processing apparatuses may execute the same or a different part of the processing of the image processing program. In this case, a form of distributed computing may be adopted in which the image processing apparatuses communicate via the network interface 304 or the like to execute the processing as a whole.
  • The image processing apparatus 130 may be configured as a system in which functions are realized by one or more computers executing instructions stored in one or more storage devices. Alternatively, various data transmitted from the RGB camera 121 and the depth camera 122 may be processed by one or more image processing devices provided on a cloud, and the processing results may be transmitted to an image processing device on the user side.
  • Various operations of the image processing device 130 may be executed in parallel using one or more processors or using a plurality of image processing devices connected via the communication network 310. Also, various operations may be distributed to a plurality of operation cores in the processor 301 and executed in parallel. In addition, part or all of the processing, means, etc. of the present disclosure may be executed by an external device 320 (at least one of a processor and a storage device) provided on a cloud that can communicate with the image processing device 130 via the communication network 310. As such, the image processing device 130 may take the form of parallel computing by one or more computers.
  • the processor 301 may be an electronic circuit (processing circuit, processing circuitry, CPU, GPU, FPGA, ASIC, etc.) that performs at least computer control or computation.
  • Processor 301 may also be a general-purpose processor, a dedicated processing circuit designed to perform a particular operation, or a semiconductor device containing both a general-purpose processor and dedicated processing circuitry.
  • the processor 301 may include an optical circuit, or may include an arithmetic function based on quantum computing.
  • the processor 301 may perform various calculations based on various data and instructions input from each device of the internal configuration of the image processing device 130, and may output calculation results and control signals to each device.
  • the processor 301 may control each component included in the image processing apparatus 130 by executing an OS (Operating System), an application, or the like.
  • The processor 301 may also refer to one or more electronic circuits located on one chip, or one or more electronic circuits located on two or more chips or two or more devices. When a plurality of electronic circuits are used, each electronic circuit may communicate by wire or wirelessly.
  • the main storage device 302 is a storage device that stores commands executed by the processor 301 and various data.
  • Auxiliary storage device 303 is a storage device other than main storage device 302 . Note that these storage devices mean arbitrary electronic components capable of storing various data, and may be semiconductor memories. The semiconductor memory may be either volatile memory or non-volatile memory.
  • a storage device for storing various data in the image processing apparatus 130 may be implemented by the main storage device 302 or the auxiliary storage device 303 , or may be implemented by an internal memory built into the processor 301 .
  • a plurality of processors 301 may be connected (coupled) to one main storage device 302, or a single processor 301 may be connected.
  • a plurality of main storage devices 302 may be connected (coupled) to one processor 301 .
  • the processor may include a configuration that is connected (coupled) to at least one main memory device 302 . Also, this configuration may be realized by the main storage device 302 and the processor 301 included in the plurality of image processing apparatuses 130 .
  • A configuration in which the main storage device 302 is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may also be included.
  • the network interface 304 is an interface for connecting to the communication network 310 wirelessly or by wire. Any suitable interface, such as one conforming to existing communication standards, may be used for network interface 304 .
  • the network interface 304 may exchange various data with the robot 110 and other external devices 320 connected via the communication network 310 .
  • The communication network 310 may be any one or a combination of a WAN (Wide Area Network), a LAN (Local Area Network), a PAN (Personal Area Network), and the like, as long as information is exchanged between the computer and the robot 110 or other external devices 320. Examples of WANs include the Internet, examples of LANs include IEEE 802.11 and Ethernet (registered trademark), and examples of PANs include Bluetooth (registered trademark) and NFC (Near Field Communication).
  • the device interface 305 is an interface such as USB that directly connects to the external device 330 .
  • the external device 330 is a device connected to the computer.
  • the external device 330 may be an input device, as an example.
  • the input device is, for example, a device such as a camera (including the RGB camera 121 and depth camera 122 of this embodiment), microphone, motion capture, various sensors, keyboard, mouse, touch panel, etc., and provides acquired information to the computer.
  • Alternatively, the input device may be a device that includes an input unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
  • the external device 330 may be an output device, for example.
  • the output device may be, for example, a display device such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) panel, or a speaker or the like for outputting sound.
  • a device such as a personal computer, a tablet terminal, a smartphone, or the like, which includes an output unit, a memory, and a processor may be used.
  • the external device 330 may be a storage device (memory).
  • the external device 330 may be a network storage or the like, and the external device 330 may be a storage such as an HDD.
  • the external device 330 may be a device having the functions of some of the components of the image processing device 130 . That is, the computer may transmit or receive part or all of the processing results of the external device 330 .
  • FIG. 4 is a first diagram showing an example of the functional configuration of the image processing device in the training phase.
  • An image processing program is installed in the image processing device 230. In the training phase, by executing the program, the image processing device 230 functions as a three-dimensional information acquisition unit 410, a mask region identification unit 420, an RGB image acquisition unit 430, a mask unit 440, a training data generation unit 450, and a training unit 460, as shown in FIG. 4.
  • the three-dimensional information acquisition unit 410 acquires the three-dimensional information read from the acquired data storage unit 220 and notifies the mask area identification unit 420 of it.
  • the mask area specifying unit 420 accepts "processing range information" and "work rule information" in advance.
  • The processing range information is a range specified in the RGB image, and indicates, for example, a range on the pallet 140 corresponding to a predetermined space in which cardboard boxes can be placed.
  • the work rule information received by the mask region specifying unit 420 is information used when specifying pixels to be included in the mask region. In the case of this embodiment, the mask area specifying unit 420 receives "pick up cardboard boxes one by one in order from the top" as the work rule information.
  • the mask region specifying unit 420 specifies a range in the RGB image based on the processing range information.
  • Next, the mask region identification unit 420 extracts, from the three-dimensional information notified by the three-dimensional information acquisition unit 410, the three-dimensional information corresponding to the work rule information (in this embodiment, the three-dimensional information of the top surface of the highest cardboard).
  • The mask region identification unit 420 then identifies the pixels associated with the extracted three-dimensional information as pixels included in a region that satisfies a predetermined condition.
  • In other words, among the pixels included in the RGB image within the range designated by the processing range information, the mask region identification unit 420 identifies the pixels associated with three-dimensional information that satisfies the condition determined based on the work rule information as pixels to be included in the "non-mask region".
  • the masked area identifying unit 420 identifies the non-masked area in the RGB image based on the three-dimensional information corresponding to the processing range information and the work rule information.
  • the masked area identifying unit 420 identifies areas other than the non-masked areas as "masked areas" and notifies the masking unit 440 of them.
  • the RGB image acquisition unit 430 acquires the RGB image read from the acquired data storage unit 220 and notifies the mask area identification unit 420 and the mask unit 440 of it.
  • The mask unit 440 performs mask processing on the color information associated with the pixels included in the "mask region" notified by the mask region identification unit 420 in the RGB image notified by the RGB image acquisition unit 430, thereby generating a "mask image".
  • The mask processing performed by the mask unit 440 on the color information associated with the pixels included in the mask region includes, for example: converting the color information (R value, G value, B value) associated with the pixels other than the pixels included in the non-mask region (that is, the pixels included in the mask region) into a single piece of predetermined color information (for example, R, G, B values indicating black or gray); processing the image of the mask region (the part of the RGB image other than the non-mask region) into an image containing a plurality of pieces of predetermined color information (for example, a gradation image); and applying filtering such as blurring to the image of the mask region so that it becomes more blurred than the image of the non-mask region.
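  • A minimal sketch of the first variant above (converting the color information of the mask region into black), assuming aligned rgb and xyz arrays as in the earlier sketch and the work rule "pick up from the top"; the tolerance value and function name are assumptions:

```python
import numpy as np

def make_mask_image(rgb: np.ndarray, xyz: np.ndarray, range_mask: np.ndarray,
                    height_tol: float = 0.01) -> np.ndarray:
    """Black out every pixel outside the non-mask region.

    rgb        : (H, W, 3) color image
    xyz        : (H, W, 3) per-pixel 3D information; xyz[..., 2] is the height (z)
    range_mask : (H, W) bool array, True inside the range given by the
                 processing range information (the area on the pallet)
    """
    z = xyz[..., 2]
    z_valid = np.where(range_mask, z, -np.inf)
    z_max = z_valid.max()                        # height of the tallest top surface

    # Non-mask region: in-range pixels whose height satisfies the work rule
    # ("top surface of the tallest cardboard"), within a tolerance.
    non_mask = range_mask & (z >= z_max - height_tol)

    mask_image = rgb.copy()
    mask_image[~non_mask] = (0, 0, 0)            # single predetermined color (black)
    return mask_image
```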
  • the mask unit 440 notifies the training data generation unit 450 of the mask image.
  • the training data generation unit 450 stores the mask image notified from the mask unit 440 in the training data storage unit 470 as training data in association with the correct data.
  • the correct data here refers to a pixel-by-pixel recognition result that is appropriately recognized when object recognition processing is performed using a mask image, for example.
  • the pixel-by-pixel recognition result may be, for example, annotation data in which a different label is assigned to each pixel of an object (instance) to be recognized.
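  • As one possible illustration, such pixel-by-pixel correct data could be held as an integer label map in which 0 means the background (or masked area) and each object instance gets its own label; this representation and the region coordinates below are assumptions, not prescribed by the disclosure.

```python
import numpy as np

H, W = 720, 1280
# Correct data: one integer label per pixel; 0 = background / masked area,
# 1, 2, ... = a different label for each object (instance) to be recognized.
correct_data = np.zeros((H, W), dtype=np.int64)
correct_data[100:300, 200:500] = 1    # pixels of the first cardboard box (illustrative)
correct_data[100:300, 520:800] = 2    # pixels of the second cardboard box (illustrative)

# A training sample then pairs a mask image with its correct data.
training_sample = {"mask_image": np.zeros((H, W, 3), np.uint8),
                   "correct_data": correct_data}
```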
  • the training unit 460 uses the training data stored in the training data storage unit 470 to train the object recognition unit and generate a trained object recognition unit. Further, the training unit 460 sets the generated trained object recognition unit to the image processing device 130 that operates in the depalletizing phase.
  • FIG. 5 is a first diagram showing an example of the functional configuration of the training section.
  • the training unit 460 has an object recognition unit 510 and a comparison/modification unit 520, and uses the training data 500 to train the object recognition unit 510.
  • the training data 500 includes "mask image” and "correct data” as information items.
  • a mask image is stored in “mask image”.
  • the "correct data” stores the pixel-by-pixel recognition result that should be appropriately recognized based on the corresponding "mask image”.
  • the object recognition unit 510 is configured by a DNN, and outputs output data by inputting the "mask image" of the training data 500.
  • This output data may be, for example, segmentation data in which the probability of each label, which is different for each object, is assigned to each pixel.
  • the comparison/change unit 520 compares the output data output from the object recognition unit 510 with the "correct data" of the training data 500, and updates the model parameters of the object recognition unit 510 based on the comparison result. As a result, the training unit 460 generates a trained object recognition unit 510 .
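  • A minimal PyTorch-style sketch of one such comparison/update step, assuming the object recognition unit is a segmentation DNN that outputs per-pixel label scores; the toy network, loss function, and hyperparameters are assumptions and do not represent the specific model of the disclosure.

```python
import torch
import torch.nn as nn

num_labels = 8                             # background + up to 7 instances (assumed)
object_recognizer = nn.Sequential(         # stand-in for the DNN object recognition unit 510
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, num_labels, kernel_size=1),   # per-pixel logits for each label
)
optimizer = torch.optim.Adam(object_recognizer.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def training_step(mask_image: torch.Tensor, correct_data: torch.Tensor) -> float:
    """mask_image: (N, 3, H, W) float tensor; correct_data: (N, H, W) long tensor."""
    output = object_recognizer(mask_image)     # output data (per-pixel label scores)
    loss = loss_fn(output, correct_data)       # comparison with the correct data
    optimizer.zero_grad()
    loss.backward()                            # update the model parameters
    optimizer.step()
    return loss.item()
```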
  • FIG. 6 is a first diagram showing an example of the functional configuration of the image processing apparatus in the depalletizing phase.
  • An image processing program is installed in the image processing device 130. In the depalletizing phase, by executing the program, the image processing device 130 functions as the three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, the mask unit 440, a trained object recognition unit 650, and a robot control unit 660, as shown in FIG. 6.
  • The three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, and the mask unit 440 have already been explained using FIG. 4, so their explanation is omitted here.
  • The trained object recognition unit 650 is an object recognition unit that has been trained by the training unit 460 using the training data 500.
  • the trained object recognition unit 650 receives the mask image notified from the mask unit 440, performs object recognition processing, and outputs a recognition result.
  • the robot control unit 660 generates a control command for controlling the motion of the robot 110 based on the recognition result output by the trained object recognition unit 650 and transmits it to the robot 110 .
  • the robot control unit 660 refers to the work rule information and generates the control command according to the work rule information.
  • FIG. 7 is a diagram showing a specific example of object recognition processing.
  • reference numeral 710 indicates the RGB image acquired by the RGB image acquisition section 430.
  • reference numeral 711 indicates processing range information received in advance by the mask region specifying unit 420 on an RGB image indicated by reference numeral 710 .
  • the processing range information indicates the area on the pallet 140 where the cardboard is located.
  • reference numeral 720 denotes a mask image generated by performing mask processing on the RGB image indicated by reference numeral 710 .
  • Reference numeral 721 denotes the non-mask region identified by the mask region identification unit 420, and reference numeral 722 denotes the mask region identified by the mask region identification unit 420.
  • The non-mask region indicated by reference numeral 721 is the top surface region of the cardboard with the highest height.
  • In the example indicated by reference numeral 721, since the heights of two of the cardboard boxes are approximately the same, the top surface regions of the two cardboard boxes are identified as the non-mask region.
  • The mask image indicated by reference numeral 720 shows that the color information associated with the pixels other than the pixels included in the non-mask region has been converted into color information indicating black.
  • Reference numeral 730 shows how, by performing object recognition processing using the mask image indicated by reference numeral 720, the cardboard indicated by reference numeral 731 and the cardboard indicated by reference numeral 732 are each recognized as different objects (different instances).
  • FIG. 8 is a first flow chart showing the flow of processing of the entire robot control system.
  • In step S801, the robot control system 100 shifts to the training phase and executes training processing. Details of the flowchart of the training process will be described later.
  • In step S802, the robot control system 100 shifts to the depalletizing phase and executes the depalletizing process. Details of the flowchart of the depalletizing process will be described later.
  • FIG. 9 is a first flowchart showing the flow of training processing.
  • In step S901, the image processing device 230 acquires, from the acquired data storage unit 220, the RGB image captured by the RGB camera 121 or the virtual RGB image generated by the CG simulator 210.
  • The image processing device 230 also acquires, from the acquired data storage unit 220, the three-dimensional information captured by the depth camera 122 or the virtual three-dimensional information generated by the CG simulator 210.
  • In step S902, the image processing device 230 identifies the non-mask region in the RGB image based on the three-dimensional information, the processing range information, and the work rule information.
  • In step S903, the image processing device 230 performs mask processing on the color information associated with the pixels other than the pixels included in the non-mask region in the RGB image to generate a mask image.
  • In step S904, the image processing device 230 acquires correct data.
  • In step S905, the image processing device 230 generates training data by associating the mask image with the correct data.
  • In step S906, the image processing device 230 uses the training data to train the object recognition unit.
  • In step S907, the image processing device 230 determines whether or not the training end condition is satisfied. If it is determined in step S907 that the training end condition is not satisfied (NO in step S907), the process returns to step S901.
  • If it is determined in step S907 that the training end condition is satisfied (YES in step S907), the process proceeds to step S908.
  • In step S908, the image processing device 230 sets the trained object recognition unit and ends the training process.
  • FIG. 10 is a first flowchart showing the flow of depalletizing processing.
  • In step S1001, the image processing device 130 acquires the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122.
  • In step S1002, the image processing device 130 identifies the non-mask region in the RGB image within the range specified by the processing range information, based on the three-dimensional information corresponding to the work rule information.
  • In step S1003, the image processing device 130 performs mask processing on the color information associated with the pixels other than the pixels included in the non-mask region in the RGB image to generate a mask image.
  • In step S1004, the image processing device 130 performs object recognition processing by inputting the generated mask image to the trained object recognition unit.
  • In step S1005, the image processing device 130 identifies the cardboard boxes to be depalletized based on the results of the object recognition processing. For example, when a plurality of cardboard boxes are recognized as a result of the object recognition processing, the image processing device 130 identifies the plurality of cardboard boxes as objects to be depalletized.
  • In step S1006, the image processing device 130 generates a control command for depalletizing the identified cardboard in the mask image according to the three-dimensional information and the work rule information about the identified cardboard, and outputs the control command to the robot 110.
  • Specifically, the image processing device 130 determines the depalletizing order according to the work rule information, and generates a control command according to the three-dimensional information of the cardboard boxes to be depalletized.
  • In step S1007, the image processing device 130 determines whether or not there is another cardboard box to be depalletized. Specifically, the image processing device 130 determines whether or not the next cardboard to be depalletized is on the pallet 140 after all the cardboard to be depalletized identified in step S1005 has been depalletized.
  • If it is determined in step S1007 that there is another cardboard to be depalletized (YES in step S1007), the process proceeds to step S1008.
  • In step S1008, the image processing device 130 determines whether or not it is time to perform object recognition processing. If it is determined in step S1008 that it is not yet time to perform the object recognition processing (NO in step S1008), the process waits until the timing to perform the object recognition processing arrives.
  • If it is determined in step S1008 that it is time to perform object recognition processing (YES in step S1008), the process returns to step S1001.
  • If it is determined in step S1007 that there is no cardboard to be depalletized (NO in step S1007), the depalletizing process ends.
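  • The control flow of steps S1001 through S1008 can be summarized in the pseudocode-style sketch below; every helper function is a placeholder for the processing described above and is not an actual API of the system.

```python
def depalletize_loop():
    while True:
        rgb, xyz = capture_rgb_and_depth()                          # S1001
        non_mask = identify_non_mask_region(rgb, xyz,
                                            processing_range_info,
                                            work_rule_info)         # S1002
        mask_image = apply_mask(rgb, non_mask)                      # S1003
        recognition = trained_object_recognizer(mask_image)         # S1004
        targets = select_depalletize_targets(recognition)           # S1005
        for box in targets:                                         # S1006
            command = make_control_command(box, xyz, work_rule_info)
            send_to_robot(command)                                  # order follows the work rules
        if not boxes_remaining_on_pallet():                         # S1007
            break
        wait_until_recognition_timing()                             # S1008
```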
  • FIG. 11 is a diagram showing a specific example of the depalletizing process. As shown in FIG. 11A, when four cardboards are stacked on the pallet 140, the robot control system 100 recognizes the cardboard 1101 as the tallest cardboard. FIG. 11B shows how the recognized cardboard 1101 is picked up.
  • FIG. 11C shows how the recognized cardboard 1102 is picked up.
  • In FIG. 11(e), two cardboard boxes are placed on the pallet 140 after the cardboard box 1102 has been picked up.
  • FIG. 11(f) shows how the recognized cardboard 1103 is picked up.
  • One cardboard box is placed on the pallet 140 after the cardboard 1103 has been picked up, and the robot control system 100 recognizes the cardboard 1104 as the cardboard with the highest height.
  • FIG. 11(h) shows how the recognized cardboard 1104 is picked up.
  • As described above, the robot control system 100 according to the first embodiment: acquires, for a predetermined space in which one or more cardboard boxes are stacked on a pallet, an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes; performs mask processing on part of the color information of the RGB image based on the acquired three-dimensional information; and performs object recognition processing using the mask image after the mask processing.
  • In the first embodiment, the case where the mask unit 440 performs mask processing on the color information associated with the pixels included in the mask region has been described.
  • However, the target of the mask processing performed by the mask unit 440 is not limited to the color information, and the mask processing may be performed on both the color information and the three-dimensional information.
  • the second embodiment will be described below, focusing on differences from the first embodiment.
  • FIG. 12 is a second diagram showing an example of the functional configuration of the image processing device in the training phase.
  • An image processing program is installed in the image processing device 230, and by executing the program, the same functions as in the first embodiment are realized in the training phase.
  • The difference from the first embodiment is that the mask unit 440 performs mask processing on the color information and the three-dimensional information associated with the pixels included in the mask region, thereby generating a mask image and mask three-dimensional information.
  • the mask processing performed on the three-dimensional information includes deleting the three-dimensional information (x-coordinate, y-coordinate, z-coordinate) associated with the pixels included in the mask area.
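  • A minimal sketch of this variant, extending the earlier make_mask_image idea so that the three-dimensional information of the masked pixels is also removed (setting it to NaN is used here as one possible way of "deleting" it; the exact representation is an assumption):

```python
import numpy as np

def make_mask_image_and_mask_xyz(rgb: np.ndarray, xyz: np.ndarray,
                                 non_mask: np.ndarray):
    """Apply mask processing to both the color and the three-dimensional information.

    rgb      : (H, W, 3) color image
    xyz      : (H, W, 3) per-pixel 3D information
    non_mask : (H, W) bool array of the non-mask region
    """
    mask_image = rgb.copy()
    mask_image[~non_mask] = (0, 0, 0)         # mask the color information (black)

    mask_xyz = xyz.astype(np.float32)
    mask_xyz[~non_mask] = np.nan              # "delete" the 3D info of masked pixels
    return mask_image, mask_xyz
```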
  • Another difference is that the training data generation unit 450 generates training data by associating the mask image and the mask three-dimensional information notified from the mask unit 440 with the correct data, and stores the training data in the training data storage unit 470.
  • FIG. 13 is a second diagram illustrating an example of the functional configuration of the training unit.
  • The difference from FIG. 5 described in the first embodiment is that the training data 1300 includes "mask three-dimensional information" as an information item, and mask three-dimensional information is stored in it.
  • Another difference from FIG. 5 described in the first embodiment is that the "mask image" and "mask three-dimensional information" of the training data 1300 are input to the object recognition unit 510.
  • FIG. 14 is a second diagram showing an example of the functional configuration of the image processing device in the depalletizing phase.
  • An image processing program is installed in the image processing device 130, and by executing the program, the same functions as in the first embodiment are realized in the depalletizing phase.
  • The difference from the first embodiment is that the mask unit 440 performs mask processing on the color information and the three-dimensional information associated with the pixels included in the mask region, thereby generating a mask image and mask three-dimensional information.
  • Another difference is that the trained object recognition unit 650 performs object recognition processing based on the mask image and the mask three-dimensional information notified by the mask unit 440 and outputs the recognition result.
  • FIG. 15 is a second flowchart showing the flow of training processing.
  • the difference from the first flowchart described using FIG. 9 is steps S1501 and S1502.
  • In step S1501, the image processing device 230 performs mask processing on the color information and the three-dimensional information associated with the pixels other than the pixels included in the non-mask region to generate a mask image and mask three-dimensional information.
  • In step S1502, the image processing device 230 generates training data by associating the mask image and the mask three-dimensional information with the correct data.
  • FIG. 16 is a second flowchart showing the flow of depalletizing processing. The difference from the first flowchart described using FIG. 10 is steps S1601 and S1602.
  • In step S1601, the image processing device 130 performs mask processing on the color information and the three-dimensional information associated with the pixels other than the pixels included in the non-mask region to generate a mask image and mask three-dimensional information.
  • In step S1602, the image processing device 130 performs object recognition processing by inputting the generated mask image and mask three-dimensional information to the trained object recognition unit.
  • As described above, the robot control system 100 according to the second embodiment: acquires, for a predetermined space in which one or more cardboard boxes are stacked on a pallet, an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes; performs mask processing on the color information and the three-dimensional information of a part of the RGB image based on the acquired three-dimensional information; and performs object recognition processing using the mask image and the mask three-dimensional information after the mask processing.
  • In the first embodiment, the shooting conditions (for example, white balance, exposure, focus, etc.) of the RGB camera 121 when performing the training process (step S801 in FIG. 8) were not mentioned, and were described as having been appropriately adjusted.
  • In the third embodiment, a case will be described in which a shooting condition adjustment phase is provided before the training process and the shooting conditions of the RGB camera 121 are adjusted. Note that the description will focus on the differences from the first embodiment.
  • FIG. 17 is a diagram illustrating an example of the functional configuration of the image processing apparatus in the imaging condition adjustment phase.
  • When the image processing device 130 executes the image processing program in the imaging condition adjustment phase, the image processing device 130 functions as the three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, the mask unit 440, and an imaging condition adjustment unit 1750, as shown in FIG. 17.
  • the mask unit 440 performs mask processing on color information associated with pixels included in the mask region of the RGB image, and notifies the shooting condition adjustment unit 1750 of the generated mask image.
  • the imaging condition adjustment unit 1750 adjusts the imaging conditions based on the mask image notified from the mask unit 440.
  • the shooting conditions adjusted by the shooting condition adjusting unit 1750 include the white balance, exposure, focus, etc. of the RGB camera 121 .
  • the shooting condition adjustment unit 1750 transmits and sets the adjusted shooting conditions to the RGB camera 121 .
  • the robot control system 100 adjusts the imaging conditions of the RGB camera 121 based on the mask image. Accordingly, it is possible to set shooting conditions suitable for object recognition processing. As a result, according to the robot control system 100 according to the third embodiment, it is possible to improve recognition accuracy in object recognition processing.
  • FIG. 18 is a second flowchart showing the flow of processing of the entire robot control system. The difference from the first flowchart shown in FIG. 8 is that shooting condition adjustment processing (step S1801) is included before training processing (step S801).
  • In step S1801, the robot control system 100 shifts to the imaging condition adjustment phase and executes imaging condition adjustment processing. Details of the flowchart of the imaging condition adjustment process will be described with reference to FIG. 19.
  • FIG. 19 is a flowchart showing the flow of imaging condition adjustment processing.
  • In step S1901, the image processing device 130 acquires the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122.
  • In step S1902, the image processing device 130 identifies the non-mask region in the RGB image based on the three-dimensional information, the processing range information, and the work rule information.
  • In step S1903, the image processing device 130 performs mask processing on the color information associated with the pixels other than the pixels included in the non-mask region in the RGB image to generate a mask image.
  • In step S1904, the image processing device 130 adjusts the shooting conditions of the RGB camera 121 based on the mask image.
  • In step S1905, the image processing device 130 acquires an RGB image captured by the RGB camera 121 under the shooting conditions adjusted in step S1904.
  • In step S1906, the image processing device 130 performs mask processing on the color information associated with the pixels other than the pixels included in the non-mask region specified in step S1902, in the RGB image acquired in step S1905. Thereby, the image processing device 130 generates a mask image.
  • In step S1907, the image processing device 130 evaluates the mask image generated in step S1906.
  • In step S1908, the image processing device 130 determines whether or not the shooting conditions have been optimized based on the evaluation result of the mask image. If it is determined that the shooting conditions have not been optimized (NO in step S1908), the process returns to step S1904. At this time, if the arrangement of the cardboard boxes stacked on the pallet 140 is to be changed, the process returns to step S1901.
  • If it is determined in step S1908 that the shooting conditions have been optimized (YES in step S1908), the process proceeds to step S1909.
  • In step S1909, the image processing device 130 transmits and sets the optimized shooting conditions to the RGB camera 121, and ends the shooting condition adjustment processing.
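  • The adjustment loop of steps S1901 through S1909 could look like the sketch below; the evaluation metric and all camera-setting helpers are assumptions introduced only for illustration.

```python
def adjust_shooting_conditions(max_iterations: int = 20):
    rgb, xyz = capture_rgb_and_depth()                               # S1901
    non_mask = identify_non_mask_region(rgb, xyz,
                                        processing_range_info,
                                        work_rule_info)              # S1902
    mask_image = apply_mask(rgb, non_mask)                           # S1903
    best_score, best_conditions = None, current_camera_conditions()  # white balance, exposure, focus, ...
    for _ in range(max_iterations):
        conditions = propose_next_conditions(best_conditions, mask_image)  # S1904
        rgb = capture_rgb_with(conditions)                           # S1905
        mask_image = apply_mask(rgb, non_mask)                       # S1906
        score = evaluate_mask_image(mask_image, non_mask)            # S1907 (e.g. contrast in the non-mask region)
        if best_score is None or score > best_score:
            best_score, best_conditions = score, conditions
        if is_optimized(score):                                      # S1908
            break
    set_camera_conditions(best_conditions)                           # S1909
```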
  • As described above, the robot control system 100 according to the third embodiment: acquires, for a predetermined space in which one or more cardboard boxes are stacked on a pallet, an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes; performs mask processing on part of the color information of the RGB image based on the acquired three-dimensional information; and adjusts the shooting conditions of the RGB camera using the mask image after the mask processing.
  • According to the robot control system 100 of the third embodiment, it is therefore possible to set shooting conditions suitable for object recognition processing and to improve recognition accuracy in object recognition processing.
  • In each of the above embodiments, the case where the RGB camera 121 and the depth camera 122 are fixedly attached to the ceiling of the space where the robot 110 is arranged has been described, but the attachment destination is not limited to the ceiling. Also, the mounting positions of the RGB camera 121 and the depth camera 122 are not limited to directly above the pallet 140. Furthermore, the attachment destinations of the RGB camera 121 and the depth camera 122 are not limited to non-movable objects, and may be movable objects (for example, the robot 110).
  • the three-dimensional information acquisition unit 410 acquires three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate) captured by the depth camera 122 as three-dimensional information.
  • the three-dimensional information acquisition section 410 may acquire three-dimensional information other than the three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate).
  • Three-dimensional information other than three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate) includes, for example, point cloud information, mesh information, and the like.
  • In each of the above embodiments, the case where the image processing device 130 has the robot control unit 660 in the depalletizing phase has been described. However, the robot control unit 660 may be implemented within the robot 110.
  • Likewise, the case where the image processing device 130 has the trained object recognition unit 650 in the depalletizing phase has been described, but the trained object recognition unit 650 may be implemented in a device other than the image processing device 130.
  • Similarly, the case where the image processing device 130 has the imaging condition adjustment unit 1750 in the imaging condition adjustment phase has been described, but the imaging condition adjustment unit 1750 may be implemented in a device other than the image processing device 130.
  • In each of the above embodiments, the case has been described in which the object recognition unit is trained using masked data (a mask image and mask three-dimensional information) as training data, and the masked data is input to the trained object recognition unit to perform object recognition processing.
  • However, the object recognition unit may instead be trained using data that has not undergone mask processing (an RGB image and three-dimensional information), and the masked data (a mask image and mask three-dimensional information) may be input to the trained object recognition unit to perform object recognition processing.
  • The expression "at least one of a, b and c" or "at least one of a, b or c" includes any of a, b, c, a-b, a-c, b-c, and a-b-c. It may also include multiple instances of any element, such as a-a, a-b-b, or a-a-b-b-c-c. It also includes the addition of elements other than the listed elements (a, b, and c), such as having d as in a-b-c-d.
  • The terms "connected" and "coupled" are intended as generic terms including, but not limited to, direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, and physical connection/coupling.
  • The terms should be interpreted appropriately according to the context in which they are used, and should not be interpreted restrictively so as to exclude any form of connection/coupling that is not intentionally or naturally excluded.
  • The expression "element A configured to perform operation B" may include that the physical structure of element A has a configuration capable of performing operation B and that a permanent or temporary setting/configuration of element A is configured/set to actually perform operation B.
  • For example, when element A is a general-purpose processor,
  • it is sufficient that the processor has a hardware configuration capable of executing operation B and is configured to actually execute operation B by setting a permanent or temporary program (instructions).
  • When element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it is sufficient that the circuit structure of the processor is implemented so as to actually execute operation B, regardless of whether or not control instructions and data are actually attached.
  • When a plurality of pieces of hardware perform predetermined processing, each piece of hardware may cooperate to perform the predetermined processing, or some of the hardware may perform all of the predetermined processing. Also, some hardware may perform a part of the predetermined processing, and other hardware may perform the rest of the predetermined processing.
  • the hardware that performs the first process and the hardware that performs the second process may be the same or different. In other words, the hardware that performs the first process and the hardware that performs the second process may be included in the one or more pieces of hardware.
  • the hardware may include electronic circuits, devices including electronic circuits, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention increases the recognition accuracy in object recognition processing. This image processing device comprises one or a plurality of memories and one or a plurality of processors. The one or a plurality of processors execute, regarding a predetermined space including at least one object: acquisition of a multiple-tone image and three-dimensional information of the predetermined space; masking of a part of the multiple-tone image on the basis of the three-dimensional information; and predetermined processing using the masked multiple-tone image.

Description

Image processing device, image processing method, image processing program, and robot control system
The present disclosure relates to an image processing device, an image processing method, an image processing program, and a robot control system.
Development of robot control systems that automate the work of unloading packages stacked on a pallet (depalletizing) is underway. In such a robot control system, for example, an RGB image captured by an RGB camera is used to perform object recognition processing for identifying parcels to be depalletized.
JP 2020-075340 A; Japanese Patent No. 6211734
The present disclosure improves recognition accuracy in object recognition processing.
An image processing apparatus according to one aspect of the present disclosure has, for example, the following configuration. That is, it includes:
one or more memories; and
one or more processors,
wherein the one or more processors execute:
acquiring, for a predetermined space containing one or more objects, a gradation image of the predetermined space and three-dimensional information of the predetermined space;
masking a portion of the gradation image based on the three-dimensional information; and
performing a predetermined process using the masked gradation image.
FIG. 1 is a diagram showing an example of a usage scene of a robot control system.
FIG. 2 is a diagram showing an example of the system configuration of each phase of the robot control system.
FIG. 3 is a diagram showing an example of the hardware configuration of an image processing apparatus.
FIG. 4 is a first diagram showing an example of the functional configuration of the image processing device in the training phase.
FIG. 5 is a first diagram showing an example of the functional configuration of the training unit.
FIG. 6 is a first diagram showing an example of the functional configuration of the image processing apparatus in the depalletizing phase.
FIG. 7 is a diagram showing a specific example of object recognition processing.
FIG. 8 is a first flowchart showing the flow of processing of the entire robot control system.
FIG. 9 is a flowchart showing the flow of training processing.
FIG. 10 is a first flowchart showing the flow of depalletizing processing.
FIG. 11 is a diagram showing a specific example of the depalletizing process.
FIG. 12 is a second diagram showing an example of the functional configuration of the image processing device in the training phase.
FIG. 13 is a second diagram showing an example of the functional configuration of the training unit.
FIG. 14 is a second diagram showing an example of the functional configuration of the image processing device in the depalletizing phase.
FIG. 15 is a second flowchart showing the flow of training processing.
FIG. 16 is a second flowchart showing the flow of depalletizing processing.
FIG. 17 is a diagram showing an example of the functional configuration of the image processing apparatus in the imaging condition adjustment phase.
FIG. 18 is a second flowchart showing the flow of processing of the entire robot control system.
FIG. 19 is a flowchart showing the flow of imaging condition adjustment processing.
Each embodiment will be described below with reference to the attached drawings. In the present specification and drawings, constituent elements having substantially the same functional configuration are denoted by the same reference numerals, and redundant description thereof is omitted.
[First embodiment]
<Usage scene of the robot control system>
First, a usage scene of the robot control system according to the first embodiment will be described. FIG. 1 is a diagram showing an example of a usage scene of a robot control system. The example of FIG. 1 shows a scene in which the robot control system 100 is used to automate depalletizing.
The example of FIG. 1 shows a state in which one or more cardboard boxes, each having a rectangular parallelepiped shape, are stacked on a pallet 140 as packages, which are an example of objects. The example of FIG. 1 also shows the robot 110 performing depalletizing work, that is, picking up the cardboard boxes one by one, from the top down, and lowering them onto a conveying unit 160 such as a conveyor.
However, the usage scene shown in FIG. 1 is merely an example. The work automated by the robot control system 100 is not limited to depalletizing and may be other work. The type of packages depalletized by the robot control system 100 is not limited to cardboard boxes, and other types of packages may be handled. Furthermore, the shape of the packages is not limited to a rectangular parallelepiped, and any other shape is permitted.
Also, the cardboard boxes do not have to be picked up in order from the top. For example, cardboard boxes at a predetermined height may be picked up preferentially, or a specific area on the pallet 140 may be prioritized and the cardboard boxes in that area picked up in order from the top.
Furthermore, the number of cardboard boxes picked up in one operation is not limited to one, and a plurality of cardboard boxes may be picked up in one operation. These work rules are defined in advance as work rule information, and the robot control system 100 performs the work in accordance with the work rule information.
In the following description of the present embodiment, it is assumed that the work rule "pick up the cardboard boxes one by one in order from the top" is defined as an example of the work rule information.
Returning to the description of FIG. 1: as shown in FIG. 1, the robot control system 100 includes a robot 110, an RGB camera 121, a depth camera 122, and an image processing device 130. FIG. 1(a) shows the robot control system 100 viewed in the positive direction of the y-axis (the direction from the front to the back of the page), where the directions indicated by reference numeral 171 are the x-axis direction and the z-axis direction. FIG. 1(b) shows the robot control system 100 viewed in the negative direction of the x-axis (the direction from the front to the back of the page), where the directions indicated by reference numeral 172 are the y-axis direction and the z-axis direction.
The RGB camera 121 shown in FIGS. 1(a) and 1(b) is an example of an image generation device that generates an image. The RGB camera 121 is, for example, fixed to the ceiling of the space in which the robot 110 is arranged, and photographs the cardboard boxes 150 stacked on the pallet 140 from directly above, thereby notifying the image processing device 130 of an RGB image. The present disclosure, including the present embodiment, describes an example in which an RGB image having color information for each pixel is handled; however, a grayscale image having brightness information for each pixel may be handled instead of the RGB image. That is, in the present disclosure, as an example of a gradation image having gradation information (gradation values) of color or brightness for each pixel, a color image having color information is described, specifically an RGB image in which the color information is expressed by the values of the three primary colors:
・R (Red),
・G (Green), and
・B (Blue).
Other examples of color images include a Lab image in which the color information is expressed by values in the Lab color space, and an HSL image in which the color information is expressed by values of hue, saturation, and luminance.
The depth camera 122 shown in FIGS. 1(a) and 1(b) is an example of a three-dimensional information generation device that generates three-dimensional information. The depth camera 122 is, for example, fixed to the ceiling of the space in which the robot 110 is arranged, and photographs the cardboard boxes 150 stacked on the pallet 140 from directly above, thereby notifying the image processing device 130 of three-dimensional information.
Note that the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 may have the same resolution or different resolutions. However, when the image processing device 130 processes the RGB image and the three-dimensional information, each pixel of the RGB image (two-dimensional coordinates (X coordinate, Y coordinate)) is associated with three-dimensional information (x coordinate, y coordinate, z coordinate) in addition to color information (R value, G value, B value). In the following description, unless otherwise specified, a pixel refers to a pixel in the RGB image.
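For illustration only, the following is a minimal sketch of one way to hold, for each RGB pixel, both its color information (R, G, B) and its associated 3D point (x, y, z). It assumes the depth output has already been reprojected and resampled so that it is pixel-aligned with the RGB image; the array and function names are illustrative and not taken from the disclosure.

```python
import numpy as np

H, W = 720, 1280                      # image resolution (example values)
rgb = np.zeros((H, W, 3), np.uint8)   # color information per pixel: R, G, B
xyz = np.full((H, W, 3), np.nan)      # 3D information per pixel: x, y, z [m]

def pixel_record(X: int, Y: int):
    """Return the color and 3D information associated with pixel (X, Y)."""
    r, g, b = rgb[Y, X]
    x, y, z = xyz[Y, X]
    return {"color": (int(r), int(g), int(b)), "point": (x, y, z)}
```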
The image processing device 130 shown in FIGS. 1(a) and 1(b) outputs a control command for controlling the robot 110 based on the notified RGB image and three-dimensional information.
Specifically, based on the three-dimensional information, the image processing device 130 identifies, in the RGB image, the region corresponding to the top surface of the cardboard box with the greatest height.
The image processing device 130 then performs mask processing on the color information associated with the pixels of the RGB image other than the pixels included in the identified region. Hereinafter, the masked RGB image generated by performing mask processing on the color information associated with the pixels other than the pixels included in the identified region is referred to as a "mask image".
The image processing device 130 also performs object recognition processing using the mask image. Furthermore, based on the result of the object recognition processing, the image processing device 130 generates a control command for the robot 110 to depalletize the cardboard box identified as the depalletizing target, and transmits the control command to the robot 110.
FIG. 1(a) shows the robot 110 picking up one of the tallest cardboard boxes based on the control command transmitted from the image processing device 130 and then rotating in the direction of the thick arrow 180.
FIG. 1(b) shows the state in which the robot 110 has completed its rotation in the direction of the thick arrow 180 and is lowering the picked-up cardboard box onto the conveying unit 160.
In this way, the robot control system 100 according to the first embodiment performs mask processing on the RGB image based on the three-dimensional information, and performs object recognition processing using the mask image. As a result, erroneous recognition can be suppressed and recognition accuracy can be improved compared to, for example, the case where object recognition processing is performed on an RGB image that has not undergone mask processing.
<System configuration of the robot control system>
Next, the system configuration of the robot control system 100 will be described. FIG. 2 is a diagram showing an example of the system configuration of the robot control system.
As described above, the image processing device 130 performs object recognition processing using the mask image. The object recognition unit used when the image processing device 130 performs the object recognition processing (instance segmentation) is configured by, for example, a DNN (Deep Neural Network). The object recognition unit is trained using mask images as training data.
That is, as shown in FIG. 2, the system configuration of the robot control system 100 can be divided into the system configuration in the "training phase" (FIG. 2(a)) and the system configuration in the "depalletizing phase" (FIG. 2(b)).
As shown in FIG. 2(a), in the training phase, the robot control system 100 includes, for example:
・the RGB camera 121,
・the depth camera 122,
・a CG (Computer Graphics) simulator 210,
・an acquired data storage unit 220, and
・an image processing device 230.
In the training phase, the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 may be stored in the acquired data storage unit 220. Also in the training phase, the CG simulator 210 may reproduce the environment in which the robot 110 is arranged, thereby generating a virtual RGB image and virtual three-dimensional information, which may be stored in the acquired data storage unit 220.
Also in the training phase, the RGB image and the three-dimensional information stored in the acquired data storage unit 220 are read by the image processing device 230, which performs mask processing on them to generate training data. Furthermore, in the training phase, the image processing device 230 trains an object recognition unit (described in detail later) using the generated training data, thereby generating a trained object recognition unit.
On the other hand, as shown in FIG. 2(b), in the depalletizing phase, the robot control system 100 includes:
・the RGB camera 121,
・the depth camera 122,
・the image processing device 130 (including the trained object recognition unit), and
・the robot 110.
In the depalletizing phase, the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 are notified to the image processing device 130, which performs mask processing and then object recognition processing. Furthermore, in the depalletizing phase, a control command is generated based on the result of the object recognition processing, and the robot 110 is controlled accordingly. The robot 110 thereby depalletizes the cardboard boxes to be depalletized.
In the depalletizing phase, the timing of photographing by the RGB camera 121 and the depth camera 122 is arbitrary, as long as the photographing is performed before the cardboard box to be depalletized is picked up.
Also, in the depalletizing phase, the frequency of photographing by the RGB camera 121 and the depth camera 122 is arbitrary. For example, photographing may be performed every time one cardboard box is depalletized, or once every time a plurality of cardboard boxes are depalletized.
FIG. 2 shows the case where different image processing devices are used in the training phase and the depalletizing phase. However, the image processing device 230 used in the training phase and the image processing device 130 used in the depalletizing phase may be the same image processing device.
<Hardware configuration of the image processing device>
Next, the hardware configurations of the image processing devices 130 and 230 will be described. Since the image processing device 130 and the image processing device 230 have the same hardware configuration, the hardware configuration of the image processing device 130 will be described here as a representative.
FIG. 3 is a diagram showing an example of the hardware configuration of the image processing device. As shown in FIG. 3, the image processing device 130 includes, as components, a processor 301, a main storage device (memory) 302, an auxiliary storage device 303, a network interface 304, and a device interface 305. The image processing device 130 is implemented as a computer in which these components are connected via a bus 306.
In the example of FIG. 3, the image processing device 130 is shown as including one of each component, but the image processing device 130 may include a plurality of the same components. Also, although one image processing device 130 is shown in the example of FIG. 3, the image processing program may be installed in a plurality of image processing devices, and each of the plurality of image processing devices may execute the same or a different part of the processing of the image processing program. In this case, a form of distributed computing may be adopted in which the image processing devices communicate with each other via the network interface 304 or the like to execute the processing as a whole. That is, the image processing device 130 may be configured as a system in which functions are realized by one or more computers executing instructions stored in one or more storage devices. A configuration may also be adopted in which the various data transmitted from the RGB camera 121 and the depth camera 122 are processed by one or more image processing devices provided on a cloud, and the processing results are transmitted to the customer's image processing device.
The various operations of the image processing device 130 may be executed in parallel using one or more processors, or using a plurality of image processing devices connected via the communication network 310. The various operations may also be distributed to a plurality of arithmetic cores in the processor 301 and executed in parallel. Part or all of the processing, means, and the like of the present disclosure may be executed by an external device 320 (at least one of a processor and a storage device) provided on a cloud capable of communicating with the image processing device 130 via the communication network 310. In this way, the image processing device 130 may take the form of parallel computing by one or more computers.
The processor 301 may be an electronic circuit (a processing circuit or processing circuitry, such as a CPU, GPU, FPGA, or ASIC) that performs at least computer control or computation. The processor 301 may be a general-purpose processor, a dedicated processing circuit designed to execute specific operations, or a semiconductor device including both a general-purpose processor and a dedicated processing circuit. The processor 301 may also include an optical circuit, or may include an arithmetic function based on quantum computing.
The processor 301 may perform various operations based on various data and instructions input from the devices of the internal configuration of the image processing device 130, and may output operation results and control signals to those devices. The processor 301 may control the components of the image processing device 130 by executing an OS (Operating System), an application, or the like.
The processor 301 may refer to one or more electronic circuits arranged on one chip, or to one or more electronic circuits arranged on two or more chips or two or more devices. When a plurality of electronic circuits are used, the electronic circuits may communicate with each other by wire or wirelessly.
The main storage device 302 is a storage device that stores instructions executed by the processor 301, various data, and the like, and the various data stored in the main storage device 302 may be read by the processor 301. The auxiliary storage device 303 is a storage device other than the main storage device 302. These storage devices mean arbitrary electronic components capable of storing various data, and may be semiconductor memories. The semiconductor memory may be either a volatile memory or a non-volatile memory. The storage device for storing various data in the image processing device 130 may be realized by the main storage device 302 or the auxiliary storage device 303, or by an internal memory built into the processor 301.
A plurality of processors 301 may be connected (coupled) to one main storage device 302, or a single processor 301 may be connected. Alternatively, a plurality of main storage devices 302 may be connected (coupled) to one processor 301. When the image processing device 130 is composed of at least one main storage device 302 and a plurality of processors 301 connected (coupled) to the at least one main storage device 302, a configuration may be included in which at least one of the plurality of processors 301 is connected (coupled) to the at least one main storage device 302. This configuration may also be realized by main storage devices 302 and processors 301 included in a plurality of image processing devices 130. Furthermore, a configuration in which the main storage device 302 is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.
The network interface 304 is an interface for connecting to the communication network 310 wirelessly or by wire. Any appropriate interface, such as one conforming to existing communication standards, may be used as the network interface 304. Various data may be exchanged via the network interface 304 with the robot 110 and other external devices 320 connected via the communication network 310. The communication network 310 may be any one of a WAN (Wide Area Network), a LAN (Local Area Network), a PAN (Personal Area Network), or the like, or a combination thereof, as long as information is exchanged between the computer and the robot 110 or other external devices 320. Examples of a WAN include the Internet, examples of a LAN include IEEE 802.11 and Ethernet (registered trademark), and examples of a PAN include Bluetooth (registered trademark) and NFC (Near Field Communication).
The device interface 305 is an interface, such as USB, that directly connects to an external device 330.
The external device 330 is a device connected to the computer. The external device 330 may be, for example, an input device. The input device is, for example, a device such as a camera (including the RGB camera 121 and the depth camera 122 of the present embodiment), a microphone, a motion capture device, various sensors, a keyboard, a mouse, or a touch panel, and provides acquired information to the computer. The external device 330 may also be a device including an input unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
The external device 330 may also be, for example, an output device. The output device may be, for example, a display device such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) panel, or a speaker or the like that outputs sound. The external device 330 may also be a device including an output unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
The external device 330 may also be a storage device (memory). For example, the external device 330 may be a network storage or the like, or a storage such as an HDD.
The external device 330 may also be a device having some of the functions of the components of the image processing device 130. That is, the computer may transmit or receive part or all of the processing results of the external device 330.
<Details of the functional configuration of the image processing device in the training phase>
Next, the functional configuration of the image processing device 230 in the training phase will be described in detail. FIG. 4 is a first diagram showing an example of the functional configuration of the image processing device in the training phase. As described above, the image processing program is installed in the image processing device 230, and in the training phase, by executing this program, the image processing device 230 functions as the following units, as shown in FIG. 4:
・a three-dimensional information acquisition unit 410,
・a mask region identification unit 420,
・an RGB image acquisition unit 430,
・a mask unit 440,
・a training data generation unit 450, and
・a training unit 460.
The three-dimensional information acquisition unit 410 acquires the three-dimensional information read from the acquired data storage unit 220 and notifies the mask region identification unit 420 of the acquired information.
The mask region identification unit 420 receives "processing range information" and "work rule information" in advance. The processing range information is a range specified in the RGB image, for example, a range on the pallet 140 corresponding to the predetermined space in which cardboard boxes can be placed. The work rule information received by the mask region identification unit 420 is information used to identify the pixels to be included in the mask region. In the present embodiment, the mask region identification unit 420 receives "pick up the cardboard boxes one by one in order from the top" as the work rule information.
The mask region identification unit 420 specifies a range in the RGB image based on the processing range information. The mask region identification unit 420 also extracts, from the three-dimensional information notified by the three-dimensional information acquisition unit 410, the three-dimensional information corresponding to the work rule information (in the present embodiment, the three-dimensional information of the top surface of the tallest cardboard box). The mask region identification unit 420 then identifies, in the RGB image within the range specified based on the processing range information, the pixels associated with the extracted three-dimensional information as pixels to be included in the "non-mask region" (pixels included in a region satisfying a predetermined condition).
In another example of the process of identifying the non-mask region, the mask region identification unit 420 identifies, among the pixels included in the RGB image within the range specified based on the processing range information, the pixels associated with three-dimensional information satisfying a condition determined based on the work rule information, as pixels to be included in the "non-mask region". For example, in the present embodiment, the mask region identification unit 420 identifies, among the pixels included in the RGB image within the range specified based on the processing range information, the pixels associated with three-dimensional information whose height (z coordinate) is at or above the top X-th percentile (for example, X = 99).
That is, the mask region identification unit 420 identifies the non-mask region in the RGB image based on the three-dimensional information corresponding to the processing range information and the work rule information.
Furthermore, the mask region identification unit 420 identifies the region other than the non-mask region as the "mask region" and notifies the mask unit 440 of the mask region.
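As a rough illustration of the percentile-based identification described above, the following minimal sketch is given; it is not taken from the disclosure. It assumes a per-pixel height map `z` aligned with the RGB image and a boolean `in_range` map derived from the processing range information, with X = 99 by default; all names are illustrative.

```python
import numpy as np

def identify_non_mask_region(z: np.ndarray, in_range: np.ndarray, x_percentile: float = 99.0) -> np.ndarray:
    """Return a boolean map that is True for pixels in the non-mask region.

    z        : per-pixel height (z coordinate) aligned with the RGB image
    in_range : True for pixels inside the range given by the processing range information
    """
    valid = in_range & np.isfinite(z)            # ignore pixels with no 3D information
    threshold = np.percentile(z[valid], x_percentile)
    non_mask = valid & (z >= threshold)          # e.g. the top surface of the tallest boxes
    return non_mask

# The mask region is simply the complement: mask_region = ~non_mask
```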
The RGB image acquisition unit 430 acquires the RGB image read from the acquired data storage unit 220 and notifies the mask region identification unit 420 and the mask unit 440 of the acquired image.
The mask unit 440 performs mask processing on the color information associated with the pixels included in the "mask region" notified by the mask region identification unit 420, out of the RGB image notified by the RGB image acquisition unit 430, thereby generating a "mask image".
The mask processing performed by the mask unit 440 on the color information associated with the pixels included in the mask region includes, for example:
・converting the color information (R value, G value, B value) associated with the pixels of the RGB image other than the pixels included in the non-mask region (that is, the pixels included in the mask region) into predetermined single color information (for example, R, G, and B values indicating black, gray, or the like),
・processing the portion of the RGB image other than the non-mask region (the image of the mask region) into an image containing a plurality of predetermined pieces of color information (for example, a gradation image), or
・applying filtering processing such as blurring to the portion of the RGB image other than the non-mask region (the image of the mask region), so that the image of the mask region is more blurred than the image of the non-mask region.
The mask unit 440 then notifies the training data generation unit 450 of the mask image.
The training data generation unit 450 associates the mask image notified by the mask unit 440 with correct answer data and stores the pair in the training data storage unit 470 as training data. The correct answer data here refers to, for example, the pixel-by-pixel recognition result that would be obtained if object recognition processing were performed appropriately on the mask image. The pixel-by-pixel recognition result may be, for example, annotation data in which a different label is assigned, pixel by pixel, to each object (instance) to be recognized.
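For illustration only, one simple way to represent such a training pair is sketched below; the record layout, field names, and file format are assumptions, not part of the disclosure.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class TrainingSample:
    """One entry of the training data: a mask image paired with correct answer data."""
    mask_image: np.ndarray   # (H, W, 3) uint8, masked RGB image
    annotation: np.ndarray   # (H, W) int, a different label per object (instance), 0 for background

def save_sample(path: str, sample: TrainingSample) -> None:
    # Store the pair together so the training unit can read them as one record.
    np.savez_compressed(path, mask_image=sample.mask_image, annotation=sample.annotation)
```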
The training unit 460 trains the object recognition unit using the training data stored in the training data storage unit 470, thereby generating a trained object recognition unit. The training unit 460 then sets the generated trained object recognition unit in the image processing device 130 that operates in the depalletizing phase.
<Functional configuration of the training unit>
Next, the functional configuration of the training unit 460 will be described in detail. FIG. 5 is a first diagram showing an example of the functional configuration of the training unit.
As shown in FIG. 5, the training unit 460 includes an object recognition unit 510 and a comparison/change unit 520, and trains the object recognition unit 510 using training data 500.
The training data 500 includes "mask image" and "correct answer data" as information items. A mask image is stored in "mask image". In "correct answer data", the pixel-by-pixel recognition result that should be obtained by appropriately recognizing the corresponding "mask image" is stored.
The object recognition unit 510 is configured by a DNN, and outputs output data when the "mask image" of the training data 500 is input. This output data may be, for example, segmentation data in which, for each pixel, a probability is assigned to each of the labels that differ from object to object.
The comparison/change unit 520 compares the output data output by the object recognition unit 510 with the "correct answer data" of the training data 500, and updates the model parameters of the object recognition unit 510 based on the comparison result. In this way, the training unit 460 generates the trained object recognition unit 510.
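As a rough illustration only, the following sketch shows one conventional way such a comparison-and-update cycle could be written with PyTorch. The disclosure does not specify the framework, the network architecture, or the loss function, so the use of a cross-entropy loss, the Adam optimizer, and the assumed `(mask image, per-pixel labels)` loader are all assumptions.

```python
import torch
import torch.nn as nn

def train_object_recognizer(model: nn.Module, loader, num_epochs: int = 10, device: str = "cpu"):
    """Train a per-pixel segmentation model on (mask image, correct answer data) pairs.

    loader yields (mask_image, labels): mask_image is (B, 3, H, W) float,
    labels is (B, H, W) long with one class index per pixel.
    """
    model.to(device)
    criterion = nn.CrossEntropyLoss()                       # compare output with correct answer data
    optimizer = torch.optim.Adam(model.parameters(), 1e-4)  # update the model parameters

    for epoch in range(num_epochs):
        for mask_image, labels in loader:
            mask_image, labels = mask_image.to(device), labels.to(device)
            logits = model(mask_image)          # (B, num_labels, H, W), per-pixel label scores
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```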
<Details of the functional configuration of the image processing device in the depalletizing phase>
Next, the functional configuration of the image processing device 130 in the depalletizing phase will be described in detail. FIG. 6 is a first diagram showing an example of the functional configuration of the image processing device in the depalletizing phase. As described above, the image processing program is installed in the image processing device 130, and in the depalletizing phase, by executing this program, the image processing device 130 functions as the following units, as shown in FIG. 6:
・the three-dimensional information acquisition unit 410,
・the mask region identification unit 420,
・the RGB image acquisition unit 430,
・the mask unit 440,
・a trained object recognition unit 650, and
・a robot control unit 660.
Of these, the three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, and the mask unit 440 have already been described with reference to FIG. 4, so their description is omitted here.
The trained object recognition unit 650 is an object recognition unit trained by the training unit 460 using the training data 500. When the mask image notified by the mask unit 440 is input, the trained object recognition unit 650 performs object recognition processing and outputs a recognition result.
The robot control unit 660 generates a control command for controlling the operation of the robot 110 based on the recognition result output by the trained object recognition unit 650, and transmits the control command to the robot 110. When generating the control command based on the recognition result, the robot control unit 660 refers to the work rule information and generates a control command in accordance with the work rule information.
<Specific example of object recognition processing>
Next, a specific example of object recognition processing by the image processing device 130 in the depalletizing phase will be described. FIG. 7 is a diagram showing a specific example of object recognition processing.
In FIG. 7, reference numeral 710 indicates the RGB image acquired by the RGB image acquisition unit 430. Reference numeral 711 indicates the processing range information received in advance by the mask region identification unit 420, shown superimposed on the RGB image indicated by reference numeral 710. As indicated by reference numeral 711, the processing range information indicates the region on the pallet 140 where the cardboard boxes are located.
In FIG. 7, reference numeral 720 indicates the mask image generated by performing mask processing on the RGB image indicated by reference numeral 710. In the mask image indicated by reference numeral 720, reference numeral 721 denotes the non-mask region identified by the mask region identification unit 420, and reference numeral 722 denotes the mask region identified by the mask region identification unit 420.
The non-mask region indicated by reference numeral 721 is the region of the top surface of the cardboard boxes with the greatest height. In the example of reference numeral 721, two cardboard boxes have approximately the same height, so the top surface regions of these two cardboard boxes are identified as the non-mask region.
The mask region indicated by reference numeral 722 shows that, among the pixels of the RGB image indicated by reference numeral 710, the color information associated with the pixels other than the pixels included in the non-mask region (that is, the pixels included in the mask region) has been converted into color information indicating black.
In FIG. 7, reference numeral 730 shows that, by performing object recognition processing using the mask image indicated by reference numeral 720, the cardboard box indicated by reference numeral 731 and the cardboard box indicated by reference numeral 732 are each recognized as separate objects (separate instances).
<Processing flow of the entire robot control system>
Next, the flow of processing of the entire robot control system 100 will be described. FIG. 8 is a first flowchart showing the flow of processing of the entire robot control system.
In step S801, the robot control system 100 shifts to the training phase and executes training processing. Details of the flowchart of the training processing will be described later.
In step S802, the robot control system 100 shifts to the depalletizing phase and executes depalletizing processing. Details of the flowchart of the depalletizing processing will be described later.
<Flow of the training processing>
Next, the flowchart of the training processing (step S801 in FIG. 8) will be described in detail. FIG. 9 is a first flowchart showing the flow of the training processing.
In step S901, the image processing device 230 acquires, from the acquired data storage unit 220, the RGB image captured by the RGB camera 121 or the virtual RGB image generated by the CG simulator 210. The image processing device 230 also acquires, from the acquired data storage unit 220, the three-dimensional information captured by the depth camera 122 or the virtual three-dimensional information generated by the CG simulator 210.
In step S902, the image processing device 230 identifies the non-mask region in the RGB image based on the three-dimensional information, using the processing range information and the work rule information.
In step S903, the image processing device 230 performs mask processing on the color information associated with the pixels of the RGB image other than the pixels included in the non-mask region, thereby generating a mask image.
In step S904, the image processing device 230 acquires the correct answer data.
In step S905, the image processing device 230 associates the mask image with the correct answer data to generate training data.
In step S906, the image processing device 230 trains the object recognition unit using the training data.
In step S907, the image processing device 230 determines whether or not a training end condition is satisfied. If it is determined in step S907 that the training end condition is not satisfied (NO in step S907), the processing returns to step S901.
On the other hand, if it is determined in step S907 that the training end condition is satisfied (YES in step S907), the processing proceeds to step S908.
In step S908, the image processing device 230 sets the trained object recognition unit and ends the training processing.
<Flow of the depalletizing processing>
Next, the flowchart of the depalletizing processing (step S802 in FIG. 8) will be described in detail. FIG. 10 is a first flowchart showing the flow of the depalletizing processing.
In step S1001, the image processing device 130 acquires the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122.
In step S1002, the image processing device 130 identifies the non-mask region in the RGB image within the range specified by the processing range information, based on the three-dimensional information corresponding to the work rule information.
In step S1003, the image processing device 130 performs mask processing on the color information associated with the pixels of the RGB image other than the pixels included in the non-mask region, thereby generating a mask image.
In step S1004, the image processing device 130 performs object recognition processing by inputting the generated mask image into the trained object recognition unit.
In step S1005, the image processing device 130 identifies the cardboard boxes to be depalletized based on the result of the object recognition processing. For example, when a plurality of cardboard boxes are recognized as a result of the object recognition processing, the image processing device 130 identifies the plurality of cardboard boxes as depalletizing targets.
In step S1006, the image processing device 130 generates a control command for depalletizing the cardboard boxes identified in the mask image as depalletizing targets, in accordance with the three-dimensional information about the identified cardboard boxes and the work rule information, and outputs the control command to the robot 110. For example, when a plurality of cardboard boxes are identified as depalletizing targets, the image processing device 130 determines the depalletizing order in accordance with the work rule information and generates a control command corresponding to the three-dimensional information of the cardboard boxes to be depalletized.
In step S1007, the image processing device 130 determines whether or not there is a next cardboard box to be depalletized. Specifically, the image processing device 130 determines whether or not a next cardboard box to be depalletized remains on the pallet 140 after all of the cardboard boxes identified as depalletizing targets in step S1005 have been depalletized.
If it is determined in step S1007 that there is a next cardboard box to be depalletized (YES in step S1007), the processing proceeds to step S1008.
In step S1008, the image processing device 130 determines whether or not it is time to perform object recognition processing. If it is determined in step S1008 that it is not yet time to perform object recognition processing (NO in step S1008), the image processing device 130 waits until it is time to perform object recognition processing.
On the other hand, if it is determined in step S1008 that it is time to perform object recognition processing (YES in step S1008), the processing returns to step S1001.
On the other hand, if it is determined in step S1007 that there is no next cardboard box to be depalletized (NO in step S1007), the depalletizing processing ends.
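Read as a whole, steps S1001 to S1008 form a capture–mask–recognize–pick loop. The following is a minimal sketch of that loop under the same assumptions as the earlier sketches; the callables passed in (capture, find_non_mask, apply_mask, recognize, select_targets, send_pick_command) are illustrative placeholders for the corresponding units, not APIs defined in the disclosure.

```python
from typing import Callable, Sequence

def depalletize_loop(
    capture: Callable[[], tuple],    # S1001: returns (rgb, xyz)
    find_non_mask: Callable,         # S1002: e.g. identify_non_mask_region above
    apply_mask: Callable,            # S1003: e.g. mask_fill above
    recognize: Callable,             # S1004: trained object recognition unit
    select_targets: Callable,        # S1005: choose depalletizing targets per the work rule
    send_pick_command: Callable,     # S1006: issue a control command to the robot
) -> None:
    """Capture, mask, recognize, and pick until no depalletizing target remains."""
    while True:
        rgb, xyz = capture()                          # S1001
        non_mask = find_non_mask(xyz[..., 2])         # S1002
        masked = apply_mask(rgb, non_mask)            # S1003
        instances = recognize(masked)                 # S1004
        targets: Sequence = select_targets(instances, xyz)  # S1005
        if not targets:                               # S1007: no next target, finish
            break
        for box in targets:                           # S1006: already ordered by the work rule
            send_pick_command(box)
        # S1008: in a real system, wait here until the next recognition timing
```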
<Specific example of the depalletizing processing>
Next, a specific example of the depalletizing processing by the robot control system 100 will be described. FIG. 11 is a diagram showing a specific example of the depalletizing processing. As shown in FIG. 11(a), when four cardboard boxes are stacked on the pallet 140, the robot control system 100 recognizes cardboard box 1101 as the tallest cardboard box. FIG. 11(b) shows the recognized cardboard box 1101 being picked up.
Subsequently, as shown in FIG. 11(c), three cardboard boxes remain stacked on the pallet 140 after cardboard box 1101 has been picked up, and the robot control system 100 recognizes cardboard box 1102 as the tallest cardboard box. FIG. 11(d) shows the recognized cardboard box 1102 being picked up.
Subsequently, as shown in FIG. 11(e), two cardboard boxes remain on the pallet 140 after cardboard box 1102 has been picked up, and the robot control system 100 recognizes cardboard box 1103 as the tallest cardboard box. FIG. 11(f) shows the recognized cardboard box 1103 being picked up.
Subsequently, as shown in FIG. 11(g), one cardboard box remains on the pallet 140 after cardboard box 1103 has been picked up, and the robot control system 100 recognizes cardboard box 1104 as the tallest cardboard box. FIG. 11(h) shows the recognized cardboard box 1104 being picked up.
In this way, according to the robot control system 100, depalletizing can be realized in accordance with the work rule "pick up the cardboard boxes one by one in order from the top".
<Summary>
As is clear from the above description, the robot control system 100 according to the first embodiment:
・acquires, for a predetermined space in which one or more cardboard boxes are stacked on a pallet, an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes;
・performs mask processing on part of the color information of the RGB image based on the acquired three-dimensional information; and
・performs object recognition processing using the mask image obtained by the mask processing.
Thus, according to the robot control system 100 of the first embodiment, the recognition accuracy in the object recognition processing can be improved.
[Second embodiment]
In the first embodiment described above, the case where the mask unit 440 performs mask processing on the color information associated with the pixels included in the mask region was described. However, the target of the mask processing performed by the mask unit 440 is not limited to the color information, and the mask processing may be performed on both the color information and the three-dimensional information. The second embodiment will be described below, focusing on the differences from the first embodiment.
<Details of the functional configuration of the image processing device in the training phase>
First, the functional configuration of the image processing device 230 in the training phase will be described in detail. FIG. 12 is a second diagram showing an example of the functional configuration of the image processing device in the training phase. As in the first embodiment, the image processing program is installed in the image processing device 230, and by executing this program, the same functions as in the first embodiment are realized in the training phase.
The difference from FIG. 4 described in the first embodiment is that, in FIG. 12, the three-dimensional information acquisition unit 410 notifies both the mask region identification unit 420 and the mask unit 440 of the three-dimensional information.
Another difference from FIG. 4 described in the first embodiment is that the mask unit 440 performs mask processing on both the color information and the three-dimensional information associated with the pixels included in the mask region, thereby generating a mask image and mask three-dimensional information. The mask processing performed on the three-dimensional information includes, for example, deleting the three-dimensional information (x coordinate, y coordinate, z coordinate) associated with the pixels included in the mask region.
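As a rough sketch only, and assuming the same array layout as the earlier sketches, masking the three-dimensional information could be done by invalidating the (x, y, z) values of the masked pixels; using NaN as the "deleted" value is an assumption, not something specified in the disclosure.

```python
import numpy as np

def mask_color_and_3d(rgb: np.ndarray, xyz: np.ndarray, non_mask: np.ndarray):
    """Mask both the color information and the 3D information of the mask region.

    Returns (mask_image, mask_xyz): masked pixels are filled with black in the
    image, and their (x, y, z) values are deleted (set to NaN) in the 3D array.
    """
    mask_image = rgb.copy()
    mask_image[~non_mask] = (0, 0, 0)

    mask_xyz = xyz.astype(float)
    mask_xyz[~non_mask] = np.nan
    return mask_image, mask_xyz
```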
A further difference from FIG. 4 described in the first embodiment is that the training data generation unit 450 associates the mask image and the mask three-dimensional information notified by the mask unit 440 with the correct answer data to generate training data, and stores the training data in the training data storage unit 470.
<Functional configuration of the training unit>
Next, the functional configuration of the training unit 460 in the second embodiment will be described in detail. FIG. 13 is a second diagram showing an example of the functional configuration of the training unit. The difference from FIG. 5 described in the first embodiment is that the training data 1300 includes "mask three-dimensional information" as an information item, in which the mask three-dimensional information is stored. Another difference from FIG. 5 described in the first embodiment is that both the "mask image" and the "mask three-dimensional information" of the training data 1300 are input to the object recognition unit 510.
<Details of the functional configuration of the image processing device in the depalletizing phase>
Next, the functional configuration of the image processing device 130 in the depalletizing phase will be described in detail. FIG. 14 is a second diagram showing an example of the functional configuration of the image processing device in the depalletizing phase. As in the first embodiment, the image processing program is installed in the image processing device 130, and by executing this program, the same functions as in the first embodiment are realized in the depalletizing phase.
The difference from FIG. 6 described in the first embodiment is that, in FIG. 14, the three-dimensional information acquisition unit 410 notifies both the mask region identification unit 420 and the mask unit 440 of the three-dimensional information.
Another difference from FIG. 6 described in the first embodiment is that the mask unit 440 performs mask processing on both the color information and the three-dimensional information associated with the pixels included in the mask region, thereby generating a mask image and mask three-dimensional information.
 また、上記第1の実施形態において説明した図6との相違点は、訓練済み物体認識部650が、マスク部440より通知されたマスク画像及びマスク3次元情報に基づいて物体認識処理を行い、認識結果を出力する点である。 6 described in the first embodiment is that the trained object recognition unit 650 performs object recognition processing based on the mask image and mask three-dimensional information notified by the mask unit 440, This is the point of outputting the recognition result.
 <Flow of the Training Process>
 Next, the details of the flowchart of the training process in the second embodiment will be described. FIG. 15 is a second flowchart showing the flow of the training process. The differences from the first flowchart described with reference to FIG. 9 are steps S1501 and S1502.
 In step S1501, the image processing device 230 performs mask processing on the color information and the three-dimensional information associated with pixels other than the pixels included in the non-mask region, and generates a mask image and mask three-dimensional information.
 In step S1502, the image processing device 230 generates training data by associating the mask image and the mask three-dimensional information with the correct data.
 <Flow of the Depalletizing Process>
 Next, the details of the flowchart of the depalletizing process in the second embodiment will be described. FIG. 16 is a second flowchart showing the flow of the depalletizing process. The differences from the first flowchart described with reference to FIG. 10 are steps S1601 and S1602.
 In step S1601, the image processing device 130 performs mask processing on the color information and the three-dimensional information associated with pixels other than the pixels included in the non-mask region, and generates a mask image and mask three-dimensional information.
 In step S1602, the image processing device 130 performs object recognition processing by inputting the generated mask image and mask three-dimensional information to the trained object recognition unit.
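 A minimal sketch of step S1602 is shown below; the recognize_objects helper, the model interface, and the argmax read-out of the recognition result are illustrative assumptions, not the disclosed behavior of the trained object recognition unit 650.

```python
import torch

def recognize_objects(model, mask_image, mask_xyz):
    """Run a trained object recognition network on masked inputs (cf. step S1602)."""
    model.eval()
    with torch.no_grad():
        logits = model(mask_image, mask_xyz)   # masked inputs from step S1601
    return logits.argmax(dim=1)                # assumed read-out of the recognition result
```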
 <Summary>
 As is clear from the above description, the robot control system 100 according to the second embodiment:
- For a predetermined space in which one or more cardboard boxes are stacked on a pallet, acquires an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes.
- Performs mask processing, based on the acquired three-dimensional information, on part of the color information of the RGB image and on the three-dimensional information.
- Performs object recognition processing using the mask image and the mask three-dimensional information obtained by the mask processing.
 As a result, according to the robot control system 100 of the second embodiment, the recognition accuracy in the object recognition processing can be improved.
 [Third Embodiment]
 The first embodiment described above did not mention the shooting conditions (for example, white balance, exposure, focus, and the like) of the RGB camera 121 when performing the training process (step S801 in FIG. 8), and the description assumed that they were appropriately adjusted.
 In contrast, in the third embodiment, a case will be described in which a shooting condition adjustment phase is provided before the training process and the shooting conditions of the RGB camera 121 are adjusted. The description will focus on the differences from the first embodiment.
 <Functional Configuration of the Image Processing Device in the Shooting Condition Adjustment Phase>
 First, the details of the functional configuration of the image processing device 130 in the shooting condition adjustment phase will be described. FIG. 17 is a diagram illustrating an example of the functional configuration of the image processing device in the shooting condition adjustment phase. In the shooting condition adjustment phase, by executing the image processing program, the image processing device 130 functions, as shown in FIG. 17, as:
- a three-dimensional information acquisition unit 410,
- a mask region identification unit 420,
- an RGB image acquisition unit 430,
- a mask unit 440, and
- a shooting condition adjustment unit 1750.
 Of these, the functions of the three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, and the mask unit 440 have already been described, so their description is omitted here. Note that the mask unit 440 performs mask processing on the color information associated with the pixels included in the mask region of the RGB image, and notifies the shooting condition adjustment unit 1750 of the generated mask image.
 The shooting condition adjustment unit 1750 adjusts the shooting conditions based on the mask image notified by the mask unit 440. The shooting conditions adjusted by the shooting condition adjustment unit 1750 include the white balance, exposure, and focus of the RGB camera 121.
 The shooting condition adjustment unit 1750 also transmits the adjusted shooting conditions to the RGB camera 121 and sets them in the camera.
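 As one hedged illustration of such an adjustment, the exposure could be scaled from the mean luminance of the non-masked pixels, as sketched below; the target luminance, the clipping range, and the adjust_exposure helper are assumptions, and the disclosure does not prescribe a specific adjustment formula.

```python
import numpy as np

def adjust_exposure(masked_rgb, keep_mask, current_exposure_ms, target_luma=128.0):
    """Scale the exposure time so the non-masked region approaches a target luminance."""
    luma = masked_rgb.mean(axis=2)          # rough per-pixel luminance of the mask image
    mean_luma = luma[keep_mask].mean()      # evaluate only the non-mask region
    scale = np.clip(target_luma / max(mean_luma, 1e-3), 0.5, 2.0)
    return current_exposure_ms * scale
```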
 In this way, the robot control system 100 according to the third embodiment adjusts the shooting conditions of the RGB camera 121 based on the mask image. This makes it possible to set shooting conditions suitable for object recognition processing. As a result, according to the robot control system 100 of the third embodiment, the recognition accuracy in the object recognition processing can be improved.
 <Processing Flow of the Entire Robot Control System>
 Next, the overall processing flow of the robot control system 100 according to the third embodiment will be described. FIG. 18 is a second flowchart showing the flow of processing of the entire robot control system. The difference from the first flowchart shown in FIG. 8 is that a shooting condition adjustment process (step S1801) is included before the training process (step S801).
 In step S1801, the robot control system 100 shifts to the shooting condition adjustment phase and executes the shooting condition adjustment process. The details of the flowchart of the shooting condition adjustment process will be described with reference to FIG. 19.
 <Flow of the Shooting Condition Adjustment Process>
 FIG. 19 is a flowchart showing the flow of the shooting condition adjustment process.
 In step S1901, the image processing device 130 acquires the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122.
 In step S1902, the image processing device 130 identifies the non-mask region in the RGB image based on the three-dimensional information, using the processing range information and the work rule information.
 In step S1903, the image processing device 130 performs mask processing on the color information associated with pixels other than the pixels included in the non-mask region of the RGB image, and generates a mask image.
 In step S1904, the image processing device 130 adjusts the shooting conditions of the RGB camera 121 based on the mask image.
 In step S1905, the image processing device 130 acquires an RGB image captured by the RGB camera 121 under the shooting conditions adjusted in step S1904.
 In step S1906, in the RGB image acquired in step S1905, the image processing device 130 performs mask processing on the color information associated with pixels other than the pixels included in the non-mask region identified in step S1902. The image processing device 130 thereby generates a mask image.
 In step S1907, the image processing device 130 evaluates the mask image generated in step S1906.
 In step S1908, the image processing device 130 determines, based on the evaluation result of the mask image, whether the shooting conditions have been optimized. If it determines that they have not been optimized (NO in step S1908), the process returns to step S1904. At this time, if the arrangement of the cardboard boxes stacked on the pallet 140 is to be changed, the process returns to step S1901.
 On the other hand, if it is determined in step S1908 that the shooting conditions have been optimized (YES in step S1908), the process proceeds to step S1909.
 In step S1909, the image processing device 130 transmits the optimized shooting conditions to the RGB camera 121, sets them, and ends the shooting condition adjustment process. As a result, in the training phase and the depalletizing phase, RGB images captured by the RGB camera 121 under the optimized shooting conditions can be acquired.
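 The adjustment loop of FIG. 19 could be organized along the following lines. This is a sketch only: camera.capture_xyz, camera.capture_rgb, camera.set_conditions, compute_keep_mask, apply_mask, evaluate_mask_image, and propose_next_conditions are hypothetical helpers standing in for the camera I/O, the masking of steps S1902/S1906, the evaluation of step S1907, and the condition update; none of them are named in the disclosure.

```python
def tune_shooting_conditions(camera, conditions, max_iterations=10, threshold=0.9):
    """Iteratively adjust shooting conditions until the mask image is judged good enough."""
    xyz = camera.capture_xyz()                    # step S1901: 3D information
    keep_mask = compute_keep_mask(xyz)            # step S1902: non-mask region
    for _ in range(max_iterations):
        camera.set_conditions(conditions)         # step S1904: apply candidate conditions
        rgb = camera.capture_rgb()                # step S1905: RGB image under those conditions
        mask_image = apply_mask(rgb, keep_mask)   # step S1906: mask image
        score = evaluate_mask_image(mask_image)   # step S1907: evaluation
        if score >= threshold:                    # step S1908: optimized?
            break
        conditions = propose_next_conditions(conditions, mask_image)
    camera.set_conditions(conditions)             # step S1909: fix the final conditions
    return conditions
```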
 <Summary>
 As is clear from the above description, the robot control system 100 according to the third embodiment:
- For a predetermined space in which one or more cardboard boxes are stacked on a pallet, acquires an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes.
- Performs mask processing on part of the color information of the RGB image based on the acquired three-dimensional information.
- Adjusts the shooting conditions of the RGB camera using the mask image obtained by the mask processing.
 As a result, according to the robot control system 100 of the third embodiment, shooting conditions suitable for object recognition processing can be set, and the recognition accuracy in the object recognition processing can be improved.
 [Fourth Embodiment]
 In each of the above embodiments, the RGB camera 121 and the depth camera 122 were described as being fixedly attached to the ceiling of the space in which the robot 110 is arranged, but the mounting positions of the RGB camera 121 and the depth camera 122 are not limited to the ceiling of the space. Nor are the mounting positions of the RGB camera 121 and the depth camera 122 limited to directly above the pallet 140. Furthermore, the object to which the RGB camera 121 and the depth camera 122 are attached is not limited to a non-movable object and may be a movable object (for example, the robot 110).
 In each of the above embodiments, the three-dimensional information acquisition unit 410 was described as acquiring, as the three-dimensional information, the three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate) captured by the depth camera 122. However, the three-dimensional information acquisition unit 410 may acquire three-dimensional information other than three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate). Three-dimensional information other than three-dimensional coordinates includes, for example, point cloud information and mesh information.
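 For instance, point cloud information of the kind mentioned above could be derived from a depth map under assumed pinhole-camera intrinsics, as in the following sketch; the intrinsic parameters fx, fy, cx, cy are hypothetical values introduced for illustration, not values from the disclosure.

```python
import numpy as np

def depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```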
 In the first and second embodiments described above, the image processing device 130 was described as having the robot control unit 660 in the depalletizing phase. However, in the depalletizing phase, the robot control unit 660 may be realized within the robot 110.
 In the first and second embodiments described above, the image processing device 130 was described as having the trained object recognition unit 650 in the depalletizing phase. However, in the depalletizing phase, the trained object recognition unit 650 may be realized in a device other than the image processing device 130.
 In the third embodiment described above, the image processing device 130 was described as having the shooting condition adjustment unit 1750 in the shooting condition adjustment phase. However, in the shooting condition adjustment phase, the shooting condition adjustment unit 1750 may be realized in a device other than the image processing device 130.
 In the first and second embodiments described above, the object recognition unit was trained using masked data (mask images and mask three-dimensional information) as training data, and masked data (mask images and mask three-dimensional information) were input to the trained object recognition unit. However, the object recognition unit may instead be trained using unmasked data (RGB images and three-dimensional information) as training data, and masked data (mask images and mask three-dimensional information) may then be input to the trained object recognition unit to perform the object recognition processing.
 [Other Embodiments]
 In this specification (including the claims), when the expression "at least one of a, b, and c" or "at least one of a, b, or c" (including similar expressions) is used, it includes any of a, b, c, a-b, a-c, b-c, or a-b-c. It may also include multiple instances of any element, such as a-a, a-b-b, or a-a-b-b-c-c. It further includes adding elements other than the listed elements (a, b, and c), such as including d as in a-b-c-d.
 In this specification (including the claims), when expressions such as "with data as input," "based on data," "according to data," or "in response to data" (including similar expressions) are used, unless otherwise specified, they include cases where the data themselves are used as input and cases where data obtained by applying some processing to the data (for example, data with added noise, normalized data, intermediate representations of the data, etc.) are used as input. When it is stated that some result is obtained "based on," "according to," or "in response to" data, this includes cases where the result is obtained based only on the data, as well as cases where the result is obtained under the influence of other data, causes, conditions, and/or states in addition to the data. When it is stated that "data are output," unless otherwise specified, this includes cases where the data themselves are used as output and cases where data obtained by applying some processing to the data (for example, data with added noise, normalized data, intermediate representations of the data, etc.) are used as output.
 In this specification (including the claims), the terms "connected" and "coupled" are intended as non-limiting terms that include any of direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like. The terms should be interpreted appropriately according to the context in which they are used, and any form of connection/coupling that is not intentionally or naturally excluded should be interpreted as included in the terms in a non-limiting manner.
 In this specification (including the claims), when the expression "A configured to B" is used, it may include that the physical structure of element A has a configuration capable of executing operation B and that a permanent or temporary setting/configuration of element A is configured/set to actually execute operation B. For example, when element A is a general-purpose processor, it is sufficient that the processor has a hardware configuration capable of executing operation B and is configured to actually execute operation B by a permanent or temporary program (instructions). When element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it is sufficient that the circuit structure of the processor is implemented so as to actually execute operation B, regardless of whether control instructions and data are actually attached.
 In this specification (including the claims), when terms meaning inclusion or possession (for example, "comprising/including," "having," etc.) are used, they are intended as open-ended terms that include cases where something other than the object indicated by the object of the term is included or possessed. When the object of these terms meaning inclusion or possession is an expression that does not specify a quantity or that suggests a singular number (an expression using the article a or an), the expression should be interpreted as not being limited to a specific number.
 In this specification (including the claims), even if an expression such as "one or more" or "at least one" is used in one place and an expression that does not specify a quantity or that suggests a singular number (an expression using the article a or an) is used in another place, the latter expression is not intended to mean "one." In general, an expression that does not specify a quantity or that suggests a singular number (an expression using the article a or an) should be interpreted as not necessarily being limited to a specific number.
 In this specification, when it is stated that a particular advantage/result is obtained for a particular configuration of an embodiment, unless there is a specific reason to the contrary, it should be understood that the advantage/result is also obtained for one or more other embodiments having that configuration. However, it should be understood that the presence or absence of the advantage/result generally depends on various causes, conditions, and/or states, and that the configuration does not always provide the advantage/result. The advantage/result is merely obtained by the configuration described in the embodiments when various causes, conditions, and/or states are satisfied, and it is not necessarily obtained in the claimed invention that defines the configuration or a similar configuration.
 In this specification (including the claims), terms such as "optimize/optimization" include finding a global optimum value, finding an approximation of a global optimum value, finding a local optimum value, and finding an approximation of a local optimum value, and should be interpreted appropriately according to the context in which they are used. They also include finding approximations of these optimum values stochastically or heuristically.
 In this specification (including the claims), when a plurality of pieces of hardware perform a predetermined process, the pieces of hardware may cooperate to perform the predetermined process, or some of the hardware may perform all of the predetermined process. Alternatively, some of the hardware may perform part of the predetermined process, and other hardware may perform the rest of the predetermined process. In this specification (including the claims), when an expression such as "one or more pieces of hardware perform a first process, and the one or more pieces of hardware perform a second process" is used, the hardware that performs the first process and the hardware that performs the second process may be the same or different. In other words, it is sufficient that the hardware that performs the first process and the hardware that performs the second process are included in the one or more pieces of hardware. The hardware may include an electronic circuit, a device including an electronic circuit, and the like.
 In this specification (including the claims), when a plurality of storage devices (memories) store data, an individual storage device (memory) among the plurality of storage devices (memories) may store only part of the data or may store the whole of the data.
 Although the embodiments of the present disclosure have been described in detail above, the present disclosure is not limited to the individual embodiments described above. Various additions, changes, replacements, partial deletions, and the like are possible without departing from the conceptual idea and spirit of the present invention derived from the contents defined in the claims and their equivalents. For example, in all of the embodiments described above, the numerical values and formulas used in the description are shown merely as examples and are not limiting. The order of the operations in the embodiments is likewise shown as an example and is not limiting.
 This application claims priority based on Japanese Patent Application No. 2022-009614 filed on January 25, 2022, the entire contents of which are incorporated herein by reference.

Claims (16)

  1. An image processing device comprising:
     one or more memories; and
     one or more processors,
     wherein the one or more processors execute:
     acquiring, for a predetermined space containing one or more objects, a gradation image of the predetermined space and three-dimensional information of the predetermined space;
     masking a part of the gradation image based on the three-dimensional information; and
     performing a predetermined process using the masked gradation image.
  2. The image processing device according to claim 1, wherein masking the part of the gradation image includes:
     a process of identifying the part of the gradation image based on the three-dimensional information; and
     a process of masking the part of the gradation image identified by the identifying process.
  3. The image processing device according to claim 2, wherein the identifying process identifies, as the part of the gradation image, two-dimensional coordinates of the gradation image corresponding to specific three-dimensional coordinates in the predetermined space.
  4. The image processing device according to claim 2, wherein the masking process is to mask gradation information of pixels in the part of the gradation image.
  5. The image processing device according to claim 1, wherein masking the part of the gradation image includes:
     a process of identifying, based on the three-dimensional information, a region of an object that satisfies a predetermined condition from among the one or more objects; and
     a process of masking, among the gradation information associated with the pixels of the gradation image, the gradation information associated with pixels other than the pixels included in the identified region of the object.
  6. The image processing device according to claim 5, wherein the one or more processors further execute masking a part of the three-dimensional information based on the three-dimensional information, and
     the predetermined process includes performing a predetermined process using the masked gradation image and the masked three-dimensional information.
  7. The image processing device according to claim 6, wherein masking the part of the three-dimensional information includes a process of masking, among the three-dimensional information associated with the pixels of the gradation image, the three-dimensional information associated with pixels other than the pixels included in the region of the object that satisfies the predetermined condition.
  8. The image processing device according to claim 6, wherein the region of the object that satisfies the predetermined condition is a region of an object having predetermined three-dimensional information.
  9. The image processing device according to claim 8, wherein the region of the object having the predetermined three-dimensional information is, among the one or more objects, a region of an object whose top surface height has the predetermined three-dimensional information.
  10. The image processing device according to claim 8, wherein the predetermined process includes object recognition processing or adjustment processing for adjusting shooting conditions at the time of capturing the gradation image.
  11. The image processing device according to claim 10, wherein, when the predetermined process is the object recognition processing, the region of the object that satisfies the predetermined condition is identified within a range in which the one or more objects can be placed.
  12. The image processing device according to claim 10, wherein the object recognition processing is performed using a deep neural network.
  13. The image processing device according to claim 12, wherein the deep neural network is trained using masked gradation images as training data.
  14. The image processing device according to claim 12, wherein the deep neural network is trained using masked gradation images and masked three-dimensional information as training data.
  15. The image processing device according to claim 10, wherein the shooting conditions adjusted in the adjustment processing include white balance, exposure, and focus of an image generating device that captures the gradation image.
  16. A robot control system comprising:
     the image processing device according to claim 1;
     an image generating device that generates the gradation image;
     a three-dimensional information generating device that generates the three-dimensional information; and
     a robot that depalletizes the one or more objects.
PCT/JP2023/001517 2022-01-25 2023-01-19 Image processing device, image processing method, image processing program, and robot control system WO2023145599A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-009614 2022-01-25
JP2022009614 2022-01-25

Publications (1)

Publication Number Publication Date
WO2023145599A1 true WO2023145599A1 (en) 2023-08-03

Family

ID=87471833

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/001517 WO2023145599A1 (en) 2022-01-25 2023-01-19 Image processing device, image processing method, image processing program, and robot control system

Country Status (1)

Country Link
WO (1) WO2023145599A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004307110A (en) * 2003-04-04 2004-11-04 Daifuku Co Ltd Object stacking method and facility for it
JP2007274234A (en) * 2006-03-30 2007-10-18 National Institute Of Advanced Industrial & Technology White cane user detection system using stereo camera
JP2009031939A (en) * 2007-07-25 2009-02-12 Advanced Telecommunication Research Institute International Image processing apparatus, method and program
JP2019028926A (en) * 2017-08-03 2019-02-21 日本電信電話株式会社 Image processing apparatus, image processing method, and image processing program
US20190246041A1 (en) * 2018-02-07 2019-08-08 Robert Bosch Gmbh Method and device for object identification
JP2020035443A (en) * 2018-08-24 2020-03-05 株式会社豊田中央研究所 Sensing device
US20200311969A1 (en) * 2019-03-26 2020-10-01 Samsung Electronics Co., Ltd. Method and apparatus for estimating tool trajectories

Similar Documents

Publication Publication Date Title
KR102472592B1 (en) Updating of local feature models based on robot behavior calibration
CN108198141B (en) Image processing method and device for realizing face thinning special effect and computing equipment
JP6415026B2 (en) Interference determination apparatus, interference determination method, and computer program
US11667036B2 (en) Workpiece picking device and workpiece picking method
US9691132B2 (en) Method and apparatus for inferring facial composite
JP2020064335A (en) Shape information generation device, control device, unloading device, distribution system, program, and control method
CN112085755A (en) Object contour detection method, device and equipment and storage medium
US11538238B2 (en) Method and system for performing image classification for object recognition
CN110232315A (en) Object detection method and device
JP2013067499A (en) Article inspection device, article inspection system, article inspection method, and program
JP2018126862A (en) Interference determination apparatus, interference determination method, and computer program
WO2023145599A1 (en) Image processing device, image processing method, image processing program, and robot control system
CN113255664B (en) Image processing method, related device and computer program product
WO2011010693A1 (en) Marker generation device, marker generation detection system, marker generation detection device, marker, marker generation method, and program therefor
JP2010210511A (en) Recognition device of three-dimensional positions and attitudes of objects, and method for the same
EP4281937A1 (en) Enhancing three-dimensional models using multi-view refinement
CN114092428A (en) Image data processing method, image data processing device, electronic equipment and storage medium
CN110533717B (en) Target grabbing method and device based on binocular vision
WO2019003687A1 (en) Projection instruction device, baggage sorting system, and projection instruction method
EP4275176A1 (en) Three-dimensional scan registration with deformable models
CN113688704A (en) Item sorting method, item sorting device, electronic device, and computer-readable medium
CN113780269A (en) Image recognition method, device, computer system and readable storage medium
WO2022137509A1 (en) Object recognition device, object recognition method, non-transitory computer-readable medium, and object recognition system
CN116228854B (en) Automatic parcel sorting method based on deep learning
US20230169324A1 (en) Use synthetic dataset to train robotic depalletizing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23746812

Country of ref document: EP

Kind code of ref document: A1