WO2023145599A1 - Image processing device, image processing method, image processing program, and robot control system

Image processing device, image processing method, image processing program, and robot control system

Info

Publication number
WO2023145599A1
WO2023145599A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
image processing
dimensional information
mask
processing device
Application number
PCT/JP2023/001517
Other languages
French (fr)
Japanese (ja)
Inventor
天毅 手嶋
Original Assignee
株式会社Preferred Networks
Application filed by 株式会社Preferred Networks
Publication of WO2023145599A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis

Definitions

  • The present disclosure relates to an image processing device, an image processing method, an image processing program, and a robot control system.
  • In such a robot control system, for example, an RGB image captured by an RGB camera is used to perform object recognition processing for identifying parcels to be depalletized.
  • The present disclosure improves recognition accuracy in object recognition processing.
  • An image processing apparatus according to one aspect of the present disclosure has, for example, the following configuration: one or more memories and one or more processors. For a predetermined space containing one or more objects, the one or more processors acquire a gradation image of the predetermined space and three-dimensional information of the predetermined space, mask a portion of the gradation image based on the three-dimensional information, and perform a predetermined process using the masked gradation image.
  • FIG. 1 is a diagram showing an example of a usage scene of a robot control system.
  • FIG. 2 is a diagram showing an example of the system configuration of each phase of the robot control system.
  • FIG. 3 is a diagram illustrating an example of a hardware configuration of an image processing apparatus.
  • FIG. 4 is a first diagram showing an example of the functional configuration of the image processing device in the training phase.
  • FIG. 5 is a first diagram showing an example of the functional configuration of the training section.
  • FIG. 6 is a first diagram showing an example of the functional configuration of the image processing apparatus in the depalletizing phase.
  • FIG. 7 is a diagram showing a specific example of object recognition processing.
  • FIG. 8 is a first flow chart showing the flow of processing of the entire robot control system.
  • FIG. 9 is a flowchart showing the flow of training processing.
  • FIG. 10 is a first flowchart showing the flow of depalletizing processing.
  • FIG. 11 is a diagram showing a specific example of the depalletizing process.
  • FIG. 12 is a second diagram showing an example of the functional configuration of the image processing device in the training phase.
  • FIG. 13 is a second diagram illustrating an example of the functional configuration of the training unit.
  • FIG. 14 is a second diagram showing an example of the functional configuration of the image processing device in the depalletizing phase.
  • FIG. 15 is a second flowchart showing the flow of training processing.
  • FIG. 16 is a second flowchart showing the flow of depalletizing processing.
  • FIG. 17 is a diagram illustrating an example of the functional configuration of the image processing apparatus in the imaging condition adjustment phase.
  • FIG. 18 is a second flowchart showing the flow of processing of the entire robot control system.
  • FIG. 19 is a flowchart showing the flow of imaging condition adjustment processing.
  • FIG. 1 is a diagram showing an example of a usage scene of a robot control system.
  • the example of FIG. 1 shows a scene in which the robot control system 100 is used for automating depalletizing.
  • FIG. 1 shows a state in which one or more cardboard boxes having a rectangular parallelepiped shape are stacked on a pallet 140 as a package, which is an example of an object.
  • the example of FIG. 1 also shows how the robot 110 picks up cardboard boxes one by one from the top and lowers them onto a conveying unit 160 such as a conveyor (depalletizing).
  • However, the usage scene shown in FIG. 1 is an example, and the work automated by the robot control system 100 is not limited to depalletizing, and may be other work. Further, the type of packages to be depalletized by the robot control system 100 is not limited to cardboard, and other types of packages may be used. Also, the shape of the packages is not limited to a rectangular parallelepiped, and any other shape is permitted.
  • the order of picking up the cardboard does not have to be from the top, and for example, cardboard at a predetermined height may be picked up preferentially.
  • a specific area on the pallet 140 may be prioritized and the cardboard may be picked up in order from the top in the prioritized area.
  • the number of cardboards to be picked up in one operation is not limited to one, and multiple cardboards may be picked up in one operation.
  • These work rules are defined in advance as work rule information, and the robot control system 100 performs work according to the work rule information.
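  • For illustration only, such work rule information might be held as a small configuration structure like the sketch below; every field name and value here is an assumption for this example and is not taken from the disclosure.

```python
# Hypothetical representation of the work rule information described above.
# All keys and values are illustrative assumptions.
work_rule_info = {
    "pick_order": "top_down",       # pick up cardboard boxes in order from the top
    "priority_region": None,        # optionally, a region on the pallet to empty first
    "boxes_per_operation": 1,       # number of boxes picked up in one operation
}

def satisfies_pick_rule(box_top_height: float, max_top_height: float,
                        rules: dict, tol: float = 0.01) -> bool:
    """Return True if a box may be picked under the "top_down" rule, i.e. its
    top surface is (approximately) the highest one. tol is an assumed tolerance."""
    if rules["pick_order"] == "top_down":
        return abs(box_top_height - max_top_height) <= tol
    return True
```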
  • As shown in FIG. 1, the robot control system 100 has a robot 110, an RGB camera 121, a depth camera 122, and an image processing device 130.
  • FIG. 1(a) shows the robot control system 100 viewed in the positive direction of the y-axis (the direction from the front to the back of the page) when the directions indicated by reference numeral 171 are the x-axis direction and the z-axis direction, respectively.
  • FIG. 1(b) shows the robot control system 100 viewed in the negative direction of the x-axis (the direction from the front to the back of the page) when the directions indicated by reference numeral 172 are the y-axis direction and the z-axis direction, respectively.
  • the RGB camera 121 shown in FIGS. 1A and 1B is an example of an image generation device that generates an image.
  • The RGB camera 121 is, for example, fixedly attached to the ceiling of the space where the robot 110 is arranged, and notifies the image processing device 130 of an RGB image obtained by photographing the cardboard 150 stacked on the pallet 140 from directly above.
  • In the following, an example of handling an RGB image having color information for each pixel will be described, but a grayscale image having brightness information for each pixel may be handled instead of the RGB image.
  • As an example of a gradation image having color or brightness gradation information (gradation values) for each pixel, a color image having color information will be described, specifically an RGB image in which the color information is represented by the three primary color values R (Red), G (Green), and B (Blue).
  • Other examples of color images include Lab images in which color information is represented by values in the Lab color space, and HSL images in which color information is represented by values of hue, saturation, and luminance.
  • the depth camera 122 shown in FIGS. 1(a) and 1(b) is an example of a three-dimensional information generating device that generates three-dimensional information.
  • The depth camera 122 is, for example, fixedly attached to the ceiling of the space where the robot 110 is arranged, and notifies the image processing device 130 of three-dimensional information obtained by capturing the cardboard 150 stacked on the pallet 140 from directly above.
  • Each pixel (two-dimensional coordinates (X coordinate, Y coordinate)) of the RGB image has color information (R value, G value, B value), and three-dimensional information (x coordinate, y coordinate, z coordinate) is associated with each pixel.
  • In the following, "pixels" refer to pixels in the RGB image.
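  • As a minimal illustration of this per-pixel association, assuming the RGB image and the depth camera output are already aligned to the same pixel grid (the array names and shapes below are assumptions, not part of the disclosure):

```python
import numpy as np

H, W = 720, 1280                                 # image size (illustrative)
rgb = np.zeros((H, W, 3), dtype=np.uint8)        # color information (R, G, B) per pixel
xyz = np.zeros((H, W, 3), dtype=np.float32)      # three-dimensional information (x, y, z) per pixel

# For a pixel at two-dimensional coordinates (X, Y) of the RGB image,
# rgb[Y, X] holds its color information and xyz[Y, X] holds the associated
# three-dimensional information.
r, g, b = rgb[100, 200]
x, y, z = xyz[100, 200]
```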
  • the image processing device 130 shown in FIGS. 1(a) and 1(b) outputs a control command for controlling the robot 110 based on the notified RGB image and three-dimensional information.
  • the image processing device 130 identifies the area of the top surface of the cardboard with the highest height based on the three-dimensional information in the RGB image.
  • the image processing device 130 performs mask processing on color information associated with pixels other than the pixels included in the specified region in the RGB image.
  • a masked RGB image generated by performing mask processing on color information associated with pixels other than the pixels included in the specified region in the RGB image is referred to as a “mask image”.
  • the image processing device 130 also uses the mask image to perform object recognition processing. Furthermore, the image processing device 130 generates a control command for the robot 110 to depalletize the corrugated cardboard identified as a depalletizing target based on the result of the object recognition processing, and transmits the control command to the robot 110 .
  • FIG. 1(a) shows how the robot 110 rotates in the direction of the thick arrow 180 after picking up one of the tallest cardboard boxes based on the control command sent from the image processing device 130.
  • FIG. 1(b) shows how the robot 110 completes rotation in the direction of the thick arrow 180 and lowers the picked up cardboard onto the transport unit 160 .
  • the robot control system 100 performs mask processing based on three-dimensional information in the RGB image, and performs object recognition processing using the mask image.
  • erroneous recognition can be suppressed and recognition accuracy can be improved, compared to the case where object recognition processing is performed using an RGB image that has not undergone mask processing.
  • FIG. 2 is a diagram illustrating an example of a system configuration of a robot control system.
  • the image processing device 130 performs object recognition processing using mask images.
  • the object recognition unit used when the image processing device 130 performs object recognition processing is configured by, for example, a DNN (Deep Neural Network).
  • the object recognition unit is then trained using the mask image as training data.
  • As shown in FIG. 2, the system configuration of the robot control system 100 can be divided into a training phase and a depalletizing phase.
  • In the case of the training phase, the robot control system 100 includes, for example, an RGB camera 121, a depth camera 122, a CG (Computer Graphics) simulator 210, an acquired data storage unit 220, and an image processing device 230.
  • the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 may be stored in the acquired data storage unit 220 .
  • The CG simulator 210 reproduces the environment in which the robot 110 is placed, thereby generating a virtual RGB image and virtual three-dimensional information, which may be stored in the acquired data storage unit 220.
  • In the training phase, the RGB image and the three-dimensional information stored in the acquired data storage unit 220 are read by the image processing device 230, which performs mask processing and generates training data. Further, in the training phase, the image processing device 230 trains an object recognition unit (details will be described later) using the generated training data to generate a trained object recognition unit.
  • In the case of the depalletizing phase, the robot control system 100 includes the RGB camera 121, the depth camera 122, an image processing device 130 (including a trained object recognition unit), and the robot 110.
  • In the depalletizing phase, the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 are each notified to the image processing device 130, which performs mask processing and then object recognition processing. Furthermore, in the depalletizing phase, a control command is generated based on the result of the object recognition processing, and the robot 110 is controlled. As a result, the robot 110 depalletizes the cardboard to be depalletized.
  • the timing of photographing by the RGB camera 121 and the depth camera 122 is arbitrary, as long as the photographing is performed before the cardboard boxes to be depalletized are picked up.
  • The RGB camera 121 and the depth camera 122 may capture images at any frequency; for example, they may capture images only once.
  • FIG. 2 shows the case where different image processing devices are used in the training phase and the depalletizing phase.
  • the image processing device 230 used in the training phase and the image processing device 130 used in the depalletizing phase may be the same image processing device.
  • FIG. 3 is a diagram showing an example of the hardware configuration of the image processing device.
  • the image processing apparatus 130 has a processor 301, a main storage device (memory) 302, an auxiliary storage device 303, a network interface 304, and a device interface 305 as components.
  • The image processing apparatus 130 is implemented as a computer in which these components are connected via a bus 306.
  • In FIG. 3, one of each component is shown, but the image processing device 130 may include a plurality of the same components.
  • The image processing program may be installed in a plurality of image processing apparatuses, and each of the plurality of image processing apparatuses may execute the same or a different part of the processing of the image processing program. In this case, a form of distributed computing may be adopted in which the image processing apparatuses communicate via the network interface 304 or the like to execute the processing as a whole.
  • The image processing apparatus 130 may be configured as a system in which functions are realized by one or more computers executing instructions stored in one or more storage devices. Alternatively, various data transmitted from the RGB camera 121 and the depth camera 122 may be processed by one or more image processing devices provided on a cloud, and the processing results may be transmitted to an image processing device on the user side.
  • Various operations of the image processing device 130 may be executed in parallel using one or more processors or using a plurality of image processing devices connected via the communication network 310. Also, various operations may be distributed to a plurality of operation cores in the processor 301 and executed in parallel. In addition, part or all of the processing, means, etc. of the present disclosure may be executed by an external device 320 (at least one of a processor and a storage device) provided on a cloud that can communicate with the image processing device 130 via the communication network 310. As such, the image processing device 130 may take the form of parallel computing by one or more computers.
  • the processor 301 may be an electronic circuit (processing circuit, processing circuitry, CPU, GPU, FPGA, ASIC, etc.) that performs at least computer control or computation.
  • Processor 301 may also be a general-purpose processor, a dedicated processing circuit designed to perform a particular operation, or a semiconductor device containing both a general-purpose processor and dedicated processing circuitry.
  • the processor 301 may include an optical circuit, or may include an arithmetic function based on quantum computing.
  • the processor 301 may perform various calculations based on various data and instructions input from each device of the internal configuration of the image processing device 130, and may output calculation results and control signals to each device.
  • the processor 301 may control each component included in the image processing apparatus 130 by executing an OS (Operating System), an application, or the like.
  • The processor 301 may also refer to one or more electronic circuits located on one chip, or one or more electronic circuits located on two or more chips or two or more devices. When a plurality of electronic circuits are used, each electronic circuit may communicate by wire or wirelessly.
  • the main storage device 302 is a storage device that stores commands executed by the processor 301 and various data.
  • Auxiliary storage device 303 is a storage device other than main storage device 302 . Note that these storage devices mean arbitrary electronic components capable of storing various data, and may be semiconductor memories. The semiconductor memory may be either volatile memory or non-volatile memory.
  • a storage device for storing various data in the image processing apparatus 130 may be implemented by the main storage device 302 or the auxiliary storage device 303 , or may be implemented by an internal memory built into the processor 301 .
  • a plurality of processors 301 may be connected (coupled) to one main storage device 302, or a single processor 301 may be connected.
  • a plurality of main storage devices 302 may be connected (coupled) to one processor 301 .
  • the processor may include a configuration that is connected (coupled) to at least one main memory device 302 . Also, this configuration may be realized by the main storage device 302 and the processor 301 included in the plurality of image processing apparatuses 130 .
  • A configuration in which the main storage device 302 is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may also be included.
  • the network interface 304 is an interface for connecting to the communication network 310 wirelessly or by wire. Any suitable interface, such as one conforming to existing communication standards, may be used for network interface 304 .
  • the network interface 304 may exchange various data with the robot 110 and other external devices 320 connected via the communication network 310 .
  • The communication network 310 may be any one or a combination of a WAN (Wide Area Network), a LAN (Local Area Network), a PAN (Personal Area Network), and the like, as long as information is exchanged between the computer and the robot 110 or other external devices 320. Examples of WANs include the Internet, examples of LANs include IEEE 802.11 and Ethernet (registered trademark), and examples of PANs include Bluetooth (registered trademark) and NFC (Near Field Communication).
  • the device interface 305 is an interface such as USB that directly connects to the external device 330 .
  • the external device 330 is a device connected to the computer.
  • the external device 330 may be an input device, as an example.
  • the input device is, for example, a device such as a camera (including the RGB camera 121 and depth camera 122 of this embodiment), microphone, motion capture, various sensors, keyboard, mouse, touch panel, etc., and provides acquired information to the computer.
  • Alternatively, the input device may be a device that includes an input unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
  • the external device 330 may be an output device, for example.
  • the output device may be, for example, a display device such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) panel, or a speaker or the like for outputting sound.
  • a device such as a personal computer, a tablet terminal, a smartphone, or the like, which includes an output unit, a memory, and a processor may be used.
  • the external device 330 may be a storage device (memory).
  • the external device 330 may be a network storage or the like, and the external device 330 may be a storage such as an HDD.
  • the external device 330 may be a device having the functions of some of the components of the image processing device 130 . That is, the computer may transmit or receive part or all of the processing results of the external device 330 .
  • FIG. 4 is a first diagram showing an example of the functional configuration of the image processing device in the training phase.
  • An image processing program is installed in the image processing device 230. In the training phase, by executing the program, the image processing device 230 functions as a three-dimensional information acquisition unit 410, a mask region identification unit 420, an RGB image acquisition unit 430, a mask unit 440, a training data generation unit 450, and a training unit 460, as shown in FIG. 4.
  • the three-dimensional information acquisition unit 410 acquires the three-dimensional information read from the acquired data storage unit 220 and notifies the mask area identification unit 420 of it.
  • the mask area specifying unit 420 accepts "processing range information" and "work rule information" in advance.
  • The processing range information is a range specified in the RGB image, and indicates, for example, a range on the pallet 140 corresponding to a predetermined space in which cardboard boxes can be placed.
  • the work rule information received by the mask region specifying unit 420 is information used when specifying pixels to be included in the mask region. In the case of this embodiment, the mask area specifying unit 420 receives "pick up cardboard boxes one by one in order from the top" as the work rule information.
  • the mask region specifying unit 420 specifies a range in the RGB image based on the processing range information.
  • Next, the mask region identification unit 420 extracts, from the three-dimensional information notified by the three-dimensional information acquisition unit 410, the three-dimensional information corresponding to the work rule information (in this embodiment, the three-dimensional information of the top surface of the highest cardboard).
  • The mask region identification unit 420 then identifies the pixels associated with the extracted three-dimensional information as pixels included in a region that satisfies a predetermined condition.
  • In other words, among the pixels included in the RGB image within the range designated by the processing range information, the mask region identification unit 420 identifies the pixels associated with three-dimensional information that satisfies the condition determined based on the work rule information as pixels to be included in the "non-mask region".
  • the masked area identifying unit 420 identifies the non-masked area in the RGB image based on the three-dimensional information corresponding to the processing range information and the work rule information.
  • the masked area identifying unit 420 identifies areas other than the non-masked areas as "masked areas" and notifies the masking unit 440 of them.
  • the RGB image acquisition unit 430 acquires the RGB image read from the acquired data storage unit 220 and notifies the mask area identification unit 420 and the mask unit 440 of it.
  • The mask unit 440 performs mask processing on the color information associated with the pixels included in the "mask region" notified by the mask region identification unit 420 in the RGB image notified by the RGB image acquisition unit 430, thereby generating a "mask image".
  • The mask processing performed by the mask unit 440 on the color information associated with the pixels included in the mask region includes, for example: converting the color information (R value, G value, B value) associated with the pixels other than the pixels included in the non-mask region (that is, the pixels included in the mask region) into a single piece of predetermined color information (for example, R, G, B values indicating black or gray); processing the image of the mask region (the part of the RGB image other than the non-mask region) into an image containing a plurality of pieces of predetermined color information (for example, a gradation image); and applying filtering such as blurring to the image of the mask region so that it becomes more blurred than the image of the non-mask region.
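  • A minimal sketch of the first variant above (converting the color information of the mask region into black), assuming aligned rgb and xyz arrays as in the earlier sketch and the work rule "pick up from the top"; the tolerance value and function name are assumptions:

```python
import numpy as np

def make_mask_image(rgb: np.ndarray, xyz: np.ndarray, range_mask: np.ndarray,
                    height_tol: float = 0.01) -> np.ndarray:
    """Black out every pixel outside the non-mask region.

    rgb        : (H, W, 3) color image
    xyz        : (H, W, 3) per-pixel 3D information; xyz[..., 2] is the height (z)
    range_mask : (H, W) bool array, True inside the range given by the
                 processing range information (the area on the pallet)
    """
    z = xyz[..., 2]
    z_valid = np.where(range_mask, z, -np.inf)
    z_max = z_valid.max()                        # height of the tallest top surface

    # Non-mask region: in-range pixels whose height satisfies the work rule
    # ("top surface of the tallest cardboard"), within a tolerance.
    non_mask = range_mask & (z >= z_max - height_tol)

    mask_image = rgb.copy()
    mask_image[~non_mask] = (0, 0, 0)            # single predetermined color (black)
    return mask_image
```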
  • the mask unit 440 notifies the training data generation unit 450 of the mask image.
  • the training data generation unit 450 stores the mask image notified from the mask unit 440 in the training data storage unit 470 as training data in association with the correct data.
  • the correct data here refers to a pixel-by-pixel recognition result that is appropriately recognized when object recognition processing is performed using a mask image, for example.
  • the pixel-by-pixel recognition result may be, for example, annotation data in which a different label is assigned to each pixel of an object (instance) to be recognized.
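  • As one possible illustration, such pixel-by-pixel correct data could be held as an integer label map in which 0 means the background (or masked area) and each object instance gets its own label; this representation and the region coordinates below are assumptions, not prescribed by the disclosure.

```python
import numpy as np

H, W = 720, 1280
# Correct data: one integer label per pixel; 0 = background / masked area,
# 1, 2, ... = a different label for each object (instance) to be recognized.
correct_data = np.zeros((H, W), dtype=np.int64)
correct_data[100:300, 200:500] = 1    # pixels of the first cardboard box (illustrative)
correct_data[100:300, 520:800] = 2    # pixels of the second cardboard box (illustrative)

# A training sample then pairs a mask image with its correct data.
training_sample = {"mask_image": np.zeros((H, W, 3), np.uint8),
                   "correct_data": correct_data}
```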
  • the training unit 460 uses the training data stored in the training data storage unit 470 to train the object recognition unit and generate a trained object recognition unit. Further, the training unit 460 sets the generated trained object recognition unit to the image processing device 130 that operates in the depalletizing phase.
  • FIG. 5 is a first diagram showing an example of the functional configuration of the training section.
  • the training unit 460 has an object recognition unit 510 and a comparison/modification unit 520, and uses the training data 500 to train the object recognition unit 510.
  • the training data 500 includes "mask image” and "correct data” as information items.
  • a mask image is stored in “mask image”.
  • the "correct data” stores the pixel-by-pixel recognition result that should be appropriately recognized based on the corresponding "mask image”.
  • the object recognition unit 510 is configured by a DNN, and outputs output data by inputting the "mask image" of the training data 500.
  • This output data may be, for example, segmentation data in which the probability of each label, which is different for each object, is assigned to each pixel.
  • the comparison/change unit 520 compares the output data output from the object recognition unit 510 with the "correct data" of the training data 500, and updates the model parameters of the object recognition unit 510 based on the comparison result. As a result, the training unit 460 generates a trained object recognition unit 510 .
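  • A minimal PyTorch-style sketch of one such comparison/update step, assuming the object recognition unit is a segmentation DNN that outputs per-pixel label scores; the toy network, loss function, and hyperparameters are assumptions and do not represent the specific model of the disclosure.

```python
import torch
import torch.nn as nn

num_labels = 8                             # background + up to 7 instances (assumed)
object_recognizer = nn.Sequential(         # stand-in for the DNN object recognition unit 510
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, num_labels, kernel_size=1),   # per-pixel logits for each label
)
optimizer = torch.optim.Adam(object_recognizer.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def training_step(mask_image: torch.Tensor, correct_data: torch.Tensor) -> float:
    """mask_image: (N, 3, H, W) float tensor; correct_data: (N, H, W) long tensor."""
    output = object_recognizer(mask_image)     # output data (per-pixel label scores)
    loss = loss_fn(output, correct_data)       # comparison with the correct data
    optimizer.zero_grad()
    loss.backward()                            # update the model parameters
    optimizer.step()
    return loss.item()
```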
  • FIG. 6 is a first diagram showing an example of the functional configuration of the image processing apparatus in the depalletizing phase.
  • An image processing program is installed in the image processing device 130. In the depalletizing phase, by executing the program, the image processing device 130 functions as the three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, the mask unit 440, a trained object recognition unit 650, and a robot control unit 660, as shown in FIG. 6.
  • The three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, and the mask unit 440 have already been explained using FIG. 4, so their explanation is omitted here.
  • The trained object recognition unit 650 is an object recognition unit that has been trained by the training unit 460 using the training data 500.
  • the trained object recognition unit 650 receives the mask image notified from the mask unit 440, performs object recognition processing, and outputs a recognition result.
  • the robot control unit 660 generates a control command for controlling the motion of the robot 110 based on the recognition result output by the trained object recognition unit 650 and transmits it to the robot 110 .
  • the robot control unit 660 refers to the work rule information and generates the control command according to the work rule information.
  • FIG. 7 is a diagram showing a specific example of object recognition processing.
  • reference numeral 710 indicates the RGB image acquired by the RGB image acquisition section 430.
  • reference numeral 711 indicates processing range information received in advance by the mask region specifying unit 420 on an RGB image indicated by reference numeral 710 .
  • the processing range information indicates the area on the pallet 140 where the cardboard is located.
  • reference numeral 720 denotes a mask image generated by performing mask processing on the RGB image indicated by reference numeral 710 .
  • Reference numeral 721 denotes the non-mask region identified by the mask region identification unit 420, and reference numeral 722 denotes the mask region identified by the mask region identification unit 420.
  • The non-mask region indicated by reference numeral 721 is the top surface region of the cardboard with the highest height.
  • In the example indicated by reference numeral 721, since the heights of two of the cardboard boxes are approximately the same, the top surface regions of the two cardboard boxes are identified as the non-mask region.
  • The mask image indicated by reference numeral 720 shows that the color information associated with the pixels other than the pixels included in the non-mask region has been converted into color information indicating black.
  • Reference numeral 730 shows how, by performing object recognition processing using the mask image indicated by reference numeral 720, the cardboard indicated by reference numeral 731 and the cardboard indicated by reference numeral 732 are each recognized as different objects (different instances).
  • FIG. 8 is a first flow chart showing the flow of processing of the entire robot control system.
  • In step S801, the robot control system 100 shifts to the training phase and executes training processing. Details of the flowchart of the training process will be described later.
  • In step S802, the robot control system 100 shifts to the depalletizing phase and executes the depalletizing process. Details of the flowchart of the depalletizing process will be described later.
  • FIG. 9 is a first flowchart showing the flow of training processing.
  • In step S901, the image processing device 230 acquires, from the acquired data storage unit 220, the RGB image captured by the RGB camera 121 or the virtual RGB image generated by the CG simulator 210.
  • The image processing device 230 also acquires, from the acquired data storage unit 220, the three-dimensional information captured by the depth camera 122 or the virtual three-dimensional information generated by the CG simulator 210.
  • In step S902, the image processing device 230 identifies the non-mask region in the RGB image based on the three-dimensional information, the processing range information, and the work rule information.
  • In step S903, the image processing device 230 performs mask processing on the color information associated with the pixels other than the pixels included in the non-mask region in the RGB image to generate a mask image.
  • In step S904, the image processing device 230 acquires correct data.
  • In step S905, the image processing device 230 generates training data by associating the mask image with the correct data.
  • In step S906, the image processing device 230 uses the training data to train the object recognition unit.
  • In step S907, the image processing device 230 determines whether or not the training end condition is satisfied. If it is determined in step S907 that the training end condition is not satisfied (NO in step S907), the process returns to step S901.
  • If it is determined in step S907 that the training end condition is satisfied (YES in step S907), the process proceeds to step S908.
  • In step S908, the image processing device 230 sets the trained object recognition unit and ends the training process.
  • FIG. 10 is a first flowchart showing the flow of depalletizing processing.
  • In step S1001, the image processing device 130 acquires the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122.
  • In step S1002, the image processing device 130 identifies the non-mask region in the RGB image within the range specified by the processing range information, based on the three-dimensional information corresponding to the work rule information.
  • In step S1003, the image processing device 130 performs mask processing on the color information associated with the pixels other than the pixels included in the non-mask region in the RGB image to generate a mask image.
  • In step S1004, the image processing device 130 performs object recognition processing by inputting the generated mask image to the trained object recognition unit.
  • In step S1005, the image processing device 130 identifies the cardboard boxes to be depalletized based on the results of the object recognition processing. For example, when a plurality of cardboard boxes are recognized as a result of the object recognition processing, the image processing device 130 identifies the plurality of cardboard boxes as objects to be depalletized.
  • In step S1006, the image processing device 130 generates a control command for depalletizing the identified cardboard in the mask image according to the three-dimensional information and the work rule information about the identified cardboard, and outputs the control command to the robot 110.
  • Specifically, the image processing device 130 determines the depalletizing order according to the work rule information, and generates a control command according to the three-dimensional information of the cardboard boxes to be depalletized.
  • In step S1007, the image processing device 130 determines whether or not there is another cardboard box to be depalletized. Specifically, the image processing device 130 determines whether or not the next cardboard to be depalletized is on the pallet 140 after all the cardboard to be depalletized identified in step S1005 has been depalletized.
  • If it is determined in step S1007 that there is another cardboard to be depalletized (YES in step S1007), the process proceeds to step S1008.
  • In step S1008, the image processing device 130 determines whether or not it is time to perform object recognition processing. If it is determined in step S1008 that it is not yet time to perform the object recognition processing (NO in step S1008), the process waits until the timing to perform the object recognition processing arrives.
  • If it is determined in step S1008 that it is time to perform object recognition processing (YES in step S1008), the process returns to step S1001.
  • If it is determined in step S1007 that there is no cardboard to be depalletized (NO in step S1007), the depalletizing process ends.
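  • The control flow of steps S1001 through S1008 can be summarized in the pseudocode-style sketch below; every helper function is a placeholder for the processing described above and is not an actual API of the system.

```python
def depalletize_loop():
    while True:
        rgb, xyz = capture_rgb_and_depth()                          # S1001
        non_mask = identify_non_mask_region(rgb, xyz,
                                            processing_range_info,
                                            work_rule_info)         # S1002
        mask_image = apply_mask(rgb, non_mask)                      # S1003
        recognition = trained_object_recognizer(mask_image)         # S1004
        targets = select_depalletize_targets(recognition)           # S1005
        for box in targets:                                         # S1006
            command = make_control_command(box, xyz, work_rule_info)
            send_to_robot(command)                                  # order follows the work rules
        if not boxes_remaining_on_pallet():                         # S1007
            break
        wait_until_recognition_timing()                             # S1008
```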
  • FIG. 11 is a diagram showing a specific example of the depalletizing process. As shown in FIG. 11A, when four cardboards are stacked on the pallet 140, the robot control system 100 recognizes the cardboard 1101 as the tallest cardboard. FIG. 11B shows how the recognized cardboard 1101 is picked up.
  • FIG. 11C shows how the recognized cardboard 1102 is picked up.
  • In FIG. 11(e), two cardboard boxes are placed on the pallet 140 after the cardboard box 1102 has been picked up.
  • FIG. 11(f) shows how the recognized cardboard 1103 is picked up.
  • One cardboard box is placed on the pallet 140 after the cardboard 1103 has been picked up, and the robot control system 100 recognizes the cardboard 1104 as the cardboard with the highest height.
  • FIG. 11(h) shows how the recognized cardboard 1104 is picked up.
  • As described above, the robot control system 100 according to the first embodiment: acquires, for a predetermined space in which one or more cardboard boxes are stacked on a pallet, an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes; performs mask processing on part of the color information of the RGB image based on the acquired three-dimensional information; and performs object recognition processing using the mask image after the mask processing.
  • In the first embodiment, the case where the mask unit 440 performs mask processing on the color information associated with the pixels included in the mask region has been described.
  • However, the target of the mask processing performed by the mask unit 440 is not limited to the color information, and the mask processing may be performed on both the color information and the three-dimensional information.
  • the second embodiment will be described below, focusing on differences from the first embodiment.
  • FIG. 12 is a second diagram showing an example of the functional configuration of the image processing device in the training phase.
  • An image processing program is installed in the image processing device 230, and by executing the program, the same functions as in the first embodiment are realized in the training phase.
  • The difference from the first embodiment is that the mask unit 440 performs mask processing on the color information and the three-dimensional information associated with the pixels included in the mask region, thereby generating a mask image and mask three-dimensional information.
  • the mask processing performed on the three-dimensional information includes deleting the three-dimensional information (x-coordinate, y-coordinate, z-coordinate) associated with the pixels included in the mask area.
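  • A minimal sketch of this variant, extending the earlier make_mask_image idea so that the three-dimensional information of the masked pixels is also removed (setting it to NaN is used here as one possible way of "deleting" it; the exact representation is an assumption):

```python
import numpy as np

def make_mask_image_and_mask_xyz(rgb: np.ndarray, xyz: np.ndarray,
                                 non_mask: np.ndarray):
    """Apply mask processing to both the color and the three-dimensional information.

    rgb      : (H, W, 3) color image
    xyz      : (H, W, 3) per-pixel 3D information
    non_mask : (H, W) bool array of the non-mask region
    """
    mask_image = rgb.copy()
    mask_image[~non_mask] = (0, 0, 0)         # mask the color information (black)

    mask_xyz = xyz.astype(np.float32)
    mask_xyz[~non_mask] = np.nan              # "delete" the 3D info of masked pixels
    return mask_image, mask_xyz
```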
  • Another difference is that the training data generation unit 450 generates training data by associating the mask image and the mask three-dimensional information notified from the mask unit 440 with the correct data, and stores the training data in the training data storage unit 470.
  • FIG. 13 is a second diagram illustrating an example of the functional configuration of the training unit.
  • The difference from FIG. 5 described in the first embodiment is that the training data 1300 includes "mask three-dimensional information" as an information item, and mask three-dimensional information is stored in it.
  • Another difference from FIG. 5 described in the first embodiment is that the "mask image" and "mask three-dimensional information" of the training data 1300 are input to the object recognition unit 510.
  • FIG. 14 is a second diagram showing an example of the functional configuration of the image processing device in the depalletizing phase.
  • An image processing program is installed in the image processing device 130, and by executing the program, the same functions as in the first embodiment are realized in the depalletizing phase.
  • The difference from the first embodiment is that the mask unit 440 performs mask processing on the color information and the three-dimensional information associated with the pixels included in the mask region, thereby generating a mask image and mask three-dimensional information.
  • Another difference is that the trained object recognition unit 650 performs object recognition processing based on the mask image and the mask three-dimensional information notified by the mask unit 440 and outputs the recognition result.
  • FIG. 15 is a second flowchart showing the flow of training processing.
  • the difference from the first flowchart described using FIG. 9 is steps S1501 and S1502.
  • In step S1501, the image processing device 230 performs mask processing on the color information and the three-dimensional information associated with the pixels other than the pixels included in the non-mask region to generate a mask image and mask three-dimensional information.
  • In step S1502, the image processing device 230 generates training data by associating the mask image and the mask three-dimensional information with the correct data.
  • FIG. 16 is a second flowchart showing the flow of depalletizing processing. The difference from the first flowchart described using FIG. 10 is steps S1601 and S1602.
  • In step S1601, the image processing device 130 performs mask processing on the color information and the three-dimensional information associated with the pixels other than the pixels included in the non-mask region to generate a mask image and mask three-dimensional information.
  • In step S1602, the image processing device 130 performs object recognition processing by inputting the generated mask image and mask three-dimensional information to the trained object recognition unit.
  • As described above, the robot control system 100 according to the second embodiment: acquires, for a predetermined space in which one or more cardboard boxes are stacked on a pallet, an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes; performs mask processing on the color information and the three-dimensional information of a part of the RGB image based on the acquired three-dimensional information; and performs object recognition processing using the mask image and the mask three-dimensional information after the mask processing.
  • In the first embodiment, the shooting conditions (for example, white balance, exposure, focus, etc.) of the RGB camera 121 when performing the training process (step S801 in FIG. 8) were not mentioned, and were described as having been appropriately adjusted.
  • In the third embodiment, a case will be described in which a shooting condition adjustment phase is provided before the training process and the shooting conditions of the RGB camera 121 are adjusted. Note that the description will focus on the differences from the first embodiment.
  • FIG. 17 is a diagram illustrating an example of the functional configuration of the image processing apparatus in the imaging condition adjustment phase.
  • When the image processing device 130 executes the image processing program in the imaging condition adjustment phase, the image processing device 130 functions as the three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, the mask unit 440, and an imaging condition adjustment unit 1750, as shown in FIG. 17.
  • the mask unit 440 performs mask processing on color information associated with pixels included in the mask region of the RGB image, and notifies the shooting condition adjustment unit 1750 of the generated mask image.
  • the imaging condition adjustment unit 1750 adjusts the imaging conditions based on the mask image notified from the mask unit 440.
  • the shooting conditions adjusted by the shooting condition adjusting unit 1750 include the white balance, exposure, focus, etc. of the RGB camera 121 .
  • the shooting condition adjustment unit 1750 transmits and sets the adjusted shooting conditions to the RGB camera 121 .
  • the robot control system 100 adjusts the imaging conditions of the RGB camera 121 based on the mask image. Accordingly, it is possible to set shooting conditions suitable for object recognition processing. As a result, according to the robot control system 100 according to the third embodiment, it is possible to improve recognition accuracy in object recognition processing.
  • FIG. 18 is a second flowchart showing the flow of processing of the entire robot control system. The difference from the first flowchart shown in FIG. 8 is that shooting condition adjustment processing (step S1801) is included before training processing (step S801).
  • In step S1801, the robot control system 100 shifts to the imaging condition adjustment phase and executes imaging condition adjustment processing. Details of the flowchart of the imaging condition adjustment process will be described with reference to FIG. 19.
  • FIG. 19 is a flowchart showing the flow of imaging condition adjustment processing.
  • In step S1901, the image processing device 130 acquires the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122.
  • In step S1902, the image processing device 130 identifies the non-mask region in the RGB image based on the three-dimensional information, the processing range information, and the work rule information.
  • In step S1903, the image processing device 130 performs mask processing on the color information associated with the pixels other than the pixels included in the non-mask region in the RGB image to generate a mask image.
  • In step S1904, the image processing device 130 adjusts the shooting conditions of the RGB camera 121 based on the mask image.
  • In step S1905, the image processing device 130 acquires an RGB image captured by the RGB camera 121 under the shooting conditions adjusted in step S1904.
  • In step S1906, the image processing device 130 performs mask processing on the color information associated with the pixels other than the pixels included in the non-mask region specified in step S1902, in the RGB image acquired in step S1905. Thereby, the image processing device 130 generates a mask image.
  • In step S1907, the image processing device 130 evaluates the mask image generated in step S1906.
  • In step S1908, the image processing device 130 determines whether or not the shooting conditions have been optimized based on the evaluation result of the mask image. If it is determined that the shooting conditions have not been optimized (NO in step S1908), the process returns to step S1904. At this time, if the arrangement of the cardboard boxes stacked on the pallet 140 is to be changed, the process returns to step S1901.
  • If it is determined in step S1908 that the shooting conditions have been optimized (YES in step S1908), the process proceeds to step S1909.
  • In step S1909, the image processing device 130 transmits and sets the optimized shooting conditions to the RGB camera 121, and ends the shooting condition adjustment processing.
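  • The adjustment loop of steps S1901 through S1909 could look like the sketch below; the evaluation metric and all camera-setting helpers are assumptions introduced only for illustration.

```python
def adjust_shooting_conditions(max_iterations: int = 20):
    rgb, xyz = capture_rgb_and_depth()                               # S1901
    non_mask = identify_non_mask_region(rgb, xyz,
                                        processing_range_info,
                                        work_rule_info)              # S1902
    mask_image = apply_mask(rgb, non_mask)                           # S1903
    best_score, best_conditions = None, current_camera_conditions()  # white balance, exposure, focus, ...
    for _ in range(max_iterations):
        conditions = propose_next_conditions(best_conditions, mask_image)  # S1904
        rgb = capture_rgb_with(conditions)                           # S1905
        mask_image = apply_mask(rgb, non_mask)                       # S1906
        score = evaluate_mask_image(mask_image, non_mask)            # S1907 (e.g. contrast in the non-mask region)
        if best_score is None or score > best_score:
            best_score, best_conditions = score, conditions
        if is_optimized(score):                                      # S1908
            break
    set_camera_conditions(best_conditions)                           # S1909
```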
  • As described above, the robot control system 100 according to the third embodiment: acquires, for a predetermined space in which one or more cardboard boxes are stacked on a pallet, an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes; performs mask processing on part of the color information of the RGB image based on the acquired three-dimensional information; and adjusts the shooting conditions of the RGB camera using the mask image after the mask processing.
  • According to the robot control system 100 of the third embodiment, it is therefore possible to set shooting conditions suitable for object recognition processing and to improve recognition accuracy in object recognition processing.
  • In each of the above embodiments, the case where the RGB camera 121 and the depth camera 122 are fixedly attached to the ceiling of the space where the robot 110 is arranged has been described, but the attachment destination is not limited to the ceiling. Also, the mounting positions of the RGB camera 121 and the depth camera 122 are not limited to directly above the pallet 140. Furthermore, the attachment destinations of the RGB camera 121 and the depth camera 122 are not limited to non-movable objects, and may be movable objects (for example, the robot 110).
  • the three-dimensional information acquisition unit 410 acquires three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate) captured by the depth camera 122 as three-dimensional information.
  • the three-dimensional information acquisition section 410 may acquire three-dimensional information other than the three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate).
  • Three-dimensional information other than three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate) includes, for example, point cloud information, mesh information, and the like.
  • In each of the above embodiments, the case where the image processing device 130 has the robot control unit 660 in the depalletizing phase has been described. However, the robot control unit 660 may be implemented within the robot 110.
  • Likewise, the case where the image processing device 130 has the trained object recognition unit 650 in the depalletizing phase has been described, but the trained object recognition unit 650 may be implemented in a device other than the image processing device 130.
  • Similarly, the case where the image processing device 130 has the imaging condition adjustment unit 1750 in the imaging condition adjustment phase has been described, but the imaging condition adjustment unit 1750 may be implemented in a device other than the image processing device 130.
  • In each of the above embodiments, the case has been described in which the object recognition unit is trained using masked data (a mask image and mask three-dimensional information) as training data, and the masked data is input to the trained object recognition unit to perform object recognition processing.
  • However, the object recognition unit may instead be trained using data that has not undergone mask processing (an RGB image and three-dimensional information), and the masked data (a mask image and mask three-dimensional information) may be input to the trained object recognition unit to perform object recognition processing.
  • The expression "at least one of a, b and c" or "at least one of a, b or c" includes any of a, b, c, a-b, a-c, b-c, and a-b-c. It may also include multiple instances of any element, such as a-a, a-b-b, or a-a-b-b-c-c. It also includes the addition of elements other than the listed elements (a, b, and c), such as having d as in a-b-c-d.
  • The terms "connected" and "coupled" are intended as generic terms including, but not limited to, direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, and physical connection/coupling.
  • The terms should be interpreted appropriately according to the context in which they are used, and should not be interpreted restrictively so as to exclude any form of connection/coupling that is not intentionally or naturally excluded.
  • The expression "element A configured to perform operation B" may include that the physical structure of element A has a configuration capable of performing operation B and that a permanent or temporary setting/configuration of element A is configured/set to actually perform operation B.
  • For example, when element A is a general-purpose processor,
  • it is sufficient that the processor has a hardware configuration capable of executing operation B and is configured to actually execute operation B by setting a permanent or temporary program (instructions).
  • When element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it is sufficient that the circuit structure of the processor is implemented so as to actually execute operation B, regardless of whether or not control instructions and data are actually attached.
  • When a plurality of pieces of hardware perform predetermined processing, each piece of hardware may cooperate to perform the predetermined processing, or some of the hardware may perform all of the predetermined processing. Also, some hardware may perform a part of the predetermined processing, and other hardware may perform the rest of the predetermined processing.
  • the hardware that performs the first process and the hardware that performs the second process may be the same or different. In other words, the hardware that performs the first process and the hardware that performs the second process may be included in the one or more pieces of hardware.
  • the hardware may include electronic circuits, devices including electronic circuits, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention increases the recognition accuracy in object recognition processing. This image processing device comprises one or a plurality of memories and one or a plurality of processors. The one or a plurality of processors execute, regarding a predetermined space including at least one object: acquisition of a multiple-tone image and three-dimensional information of the predetermined space; masking of a part of the multiple-tone image on the basis of the three-dimensional information; and predetermined processing using the masked multiple-tone image.

Description

Image processing device, image processing method, image processing program, and robot control system
The present disclosure relates to an image processing device, an image processing method, an image processing program, and a robot control system.
Development of robot control systems that automate the work of unloading packages stacked on a pallet (depalletizing) is underway. In such a robot control system, for example, an RGB image captured by an RGB camera is used to perform object recognition processing for identifying parcels to be depalletized.
JP 2020-075340 A; Japanese Patent No. 6211734
The present disclosure improves recognition accuracy in object recognition processing.
An image processing apparatus according to one aspect of the present disclosure has, for example, the following configuration. That is, it includes:
one or more memories; and
one or more processors,
wherein the one or more processors execute:
acquiring, for a predetermined space containing one or more objects, a gradation image of the predetermined space and three-dimensional information of the predetermined space;
masking a portion of the gradation image based on the three-dimensional information; and
performing a predetermined process using the masked gradation image.
FIG. 1 is a diagram showing an example of a usage scene of a robot control system.
FIG. 2 is a diagram showing an example of the system configuration of each phase of the robot control system.
FIG. 3 is a diagram showing an example of the hardware configuration of an image processing apparatus.
FIG. 4 is a first diagram showing an example of the functional configuration of the image processing device in the training phase.
FIG. 5 is a first diagram showing an example of the functional configuration of the training unit.
FIG. 6 is a first diagram showing an example of the functional configuration of the image processing apparatus in the depalletizing phase.
FIG. 7 is a diagram showing a specific example of object recognition processing.
FIG. 8 is a first flowchart showing the flow of processing of the entire robot control system.
FIG. 9 is a flowchart showing the flow of training processing.
FIG. 10 is a first flowchart showing the flow of depalletizing processing.
FIG. 11 is a diagram showing a specific example of the depalletizing process.
FIG. 12 is a second diagram showing an example of the functional configuration of the image processing device in the training phase.
FIG. 13 is a second diagram showing an example of the functional configuration of the training unit.
FIG. 14 is a second diagram showing an example of the functional configuration of the image processing device in the depalletizing phase.
FIG. 15 is a second flowchart showing the flow of training processing.
FIG. 16 is a second flowchart showing the flow of depalletizing processing.
FIG. 17 is a diagram showing an example of the functional configuration of the image processing apparatus in the imaging condition adjustment phase.
FIG. 18 is a second flowchart showing the flow of processing of the entire robot control system.
FIG. 19 is a flowchart showing the flow of imaging condition adjustment processing.
Each embodiment will be described below with reference to the attached drawings. In the present specification and drawings, constituent elements having substantially the same functional configuration are denoted by the same reference numerals, and redundant description thereof is omitted.
[First embodiment]
<Usage scene of the robot control system>
First, a usage scene of the robot control system according to the first embodiment will be described. FIG. 1 is a diagram showing an example of a usage scene of a robot control system. The example of FIG. 1 shows a scene in which the robot control system 100 is used to automate depalletizing.
The example of FIG. 1 shows a state in which one or more cardboard boxes, each having a rectangular parallelepiped shape, are stacked on a pallet 140 as packages, which are an example of objects. The example of FIG. 1 also shows the robot 110 performing depalletizing work, that is, picking up the cardboard boxes one by one, from the top down, and lowering them onto a conveying unit 160 such as a conveyor.
However, the usage scene shown in FIG. 1 is merely an example. The work automated by the robot control system 100 is not limited to depalletizing and may be other work. The type of packages depalletized by the robot control system 100 is not limited to cardboard boxes, and other types of packages may be handled. Furthermore, the shape of the packages is not limited to a rectangular parallelepiped, and any other shape is permitted.
Also, the cardboard boxes do not have to be picked up in order from the top. For example, cardboard boxes at a predetermined height may be picked up preferentially, or a specific area on the pallet 140 may be prioritized and the cardboard boxes in that area picked up in order from the top.
Furthermore, the number of cardboard boxes picked up in one operation is not limited to one, and a plurality of cardboard boxes may be picked up in one operation. These work rules are defined in advance as work rule information, and the robot control system 100 performs the work in accordance with the work rule information.
In the following description of the present embodiment, it is assumed that the work rule "pick up the cardboard boxes one by one in order from the top" is defined as an example of the work rule information.
Returning to the description of FIG. 1: as shown in FIG. 1, the robot control system 100 includes a robot 110, an RGB camera 121, a depth camera 122, and an image processing device 130. FIG. 1(a) shows the robot control system 100 viewed in the positive direction of the y-axis (the direction from the front to the back of the page), where the directions indicated by reference numeral 171 are the x-axis direction and the z-axis direction. FIG. 1(b) shows the robot control system 100 viewed in the negative direction of the x-axis (the direction from the front to the back of the page), where the directions indicated by reference numeral 172 are the y-axis direction and the z-axis direction.
The RGB camera 121 shown in FIGS. 1(a) and 1(b) is an example of an image generation device that generates an image. The RGB camera 121 is, for example, fixed to the ceiling of the space in which the robot 110 is arranged, and photographs the cardboard boxes 150 stacked on the pallet 140 from directly above, thereby notifying the image processing device 130 of an RGB image. The present disclosure, including the present embodiment, describes an example in which an RGB image having color information for each pixel is handled; however, a grayscale image having brightness information for each pixel may be handled instead of the RGB image. That is, in the present disclosure, as an example of a gradation image having gradation information (gradation values) of color or brightness for each pixel, a color image having color information is described, specifically an RGB image in which the color information is expressed by the values of the three primary colors:
・R (Red),
・G (Green), and
・B (Blue).
Other examples of color images include a Lab image in which the color information is expressed by values in the Lab color space, and an HSL image in which the color information is expressed by values of hue, saturation, and luminance.
The depth camera 122 shown in FIGS. 1(a) and 1(b) is an example of a three-dimensional information generation device that generates three-dimensional information. The depth camera 122 is, for example, fixed to the ceiling of the space in which the robot 110 is arranged, and photographs the cardboard boxes 150 stacked on the pallet 140 from directly above, thereby notifying the image processing device 130 of three-dimensional information.
Note that the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 may have the same resolution or different resolutions. However, when the image processing device 130 processes the RGB image and the three-dimensional information, each pixel of the RGB image (two-dimensional coordinates (X coordinate, Y coordinate)) is associated with three-dimensional information (x coordinate, y coordinate, z coordinate) in addition to color information (R value, G value, B value). In the following description, unless otherwise specified, a pixel refers to a pixel in the RGB image.
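For illustration only, the following is a minimal sketch of one way to hold, for each RGB pixel, both its color information (R, G, B) and its associated 3D point (x, y, z). It assumes the depth output has already been reprojected and resampled so that it is pixel-aligned with the RGB image; the array and function names are illustrative and not taken from the disclosure.

```python
import numpy as np

H, W = 720, 1280                      # image resolution (example values)
rgb = np.zeros((H, W, 3), np.uint8)   # color information per pixel: R, G, B
xyz = np.full((H, W, 3), np.nan)      # 3D information per pixel: x, y, z [m]

def pixel_record(X: int, Y: int):
    """Return the color and 3D information associated with pixel (X, Y)."""
    r, g, b = rgb[Y, X]
    x, y, z = xyz[Y, X]
    return {"color": (int(r), int(g), int(b)), "point": (x, y, z)}
```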
The image processing device 130 shown in FIGS. 1(a) and 1(b) outputs a control command for controlling the robot 110 based on the notified RGB image and three-dimensional information.
Specifically, based on the three-dimensional information, the image processing device 130 identifies, in the RGB image, the region corresponding to the top surface of the cardboard box with the greatest height.
The image processing device 130 then performs mask processing on the color information associated with the pixels of the RGB image other than the pixels included in the identified region. Hereinafter, the masked RGB image generated by performing mask processing on the color information associated with the pixels other than the pixels included in the identified region is referred to as a "mask image".
The image processing device 130 also performs object recognition processing using the mask image. Furthermore, based on the result of the object recognition processing, the image processing device 130 generates a control command for the robot 110 to depalletize the cardboard box identified as the depalletizing target, and transmits the control command to the robot 110.
FIG. 1(a) shows the robot 110 picking up one of the tallest cardboard boxes based on the control command transmitted from the image processing device 130 and then rotating in the direction of the thick arrow 180.
FIG. 1(b) shows the state in which the robot 110 has completed its rotation in the direction of the thick arrow 180 and is lowering the picked-up cardboard box onto the conveying unit 160.
In this way, the robot control system 100 according to the first embodiment performs mask processing on the RGB image based on the three-dimensional information, and performs object recognition processing using the mask image. As a result, erroneous recognition can be suppressed and recognition accuracy can be improved compared to, for example, the case where object recognition processing is performed on an RGB image that has not undergone mask processing.
<System configuration of the robot control system>
Next, the system configuration of the robot control system 100 will be described. FIG. 2 is a diagram showing an example of the system configuration of the robot control system.
As described above, the image processing device 130 performs object recognition processing using the mask image. The object recognition unit used when the image processing device 130 performs the object recognition processing (instance segmentation) is configured by, for example, a DNN (Deep Neural Network). The object recognition unit is trained using mask images as training data.
That is, as shown in FIG. 2, the system configuration of the robot control system 100 can be divided into the system configuration in the "training phase" (FIG. 2(a)) and the system configuration in the "depalletizing phase" (FIG. 2(b)).
As shown in FIG. 2(a), in the training phase, the robot control system 100 includes, for example:
・the RGB camera 121,
・the depth camera 122,
・a CG (Computer Graphics) simulator 210,
・an acquired data storage unit 220, and
・an image processing device 230.
In the training phase, the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 may be stored in the acquired data storage unit 220. Also in the training phase, the CG simulator 210 may reproduce the environment in which the robot 110 is arranged, thereby generating a virtual RGB image and virtual three-dimensional information, which may be stored in the acquired data storage unit 220.
Also in the training phase, the RGB image and the three-dimensional information stored in the acquired data storage unit 220 are read by the image processing device 230, which performs mask processing on them to generate training data. Furthermore, in the training phase, the image processing device 230 trains an object recognition unit (described in detail later) using the generated training data, thereby generating a trained object recognition unit.
On the other hand, as shown in FIG. 2(b), in the depalletizing phase, the robot control system 100 includes:
・the RGB camera 121,
・the depth camera 122,
・the image processing device 130 (including the trained object recognition unit), and
・the robot 110.
In the depalletizing phase, the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122 are notified to the image processing device 130, which performs mask processing and then object recognition processing. Furthermore, in the depalletizing phase, a control command is generated based on the result of the object recognition processing, and the robot 110 is controlled accordingly. The robot 110 thereby depalletizes the cardboard boxes to be depalletized.
In the depalletizing phase, the timing of photographing by the RGB camera 121 and the depth camera 122 is arbitrary, as long as the photographing is performed before the cardboard box to be depalletized is picked up.
Also, in the depalletizing phase, the frequency of photographing by the RGB camera 121 and the depth camera 122 is arbitrary. For example, photographing may be performed every time one cardboard box is depalletized, or once every time a plurality of cardboard boxes are depalletized.
FIG. 2 shows the case where different image processing devices are used in the training phase and the depalletizing phase. However, the image processing device 230 used in the training phase and the image processing device 130 used in the depalletizing phase may be the same image processing device.
<Hardware configuration of the image processing device>
Next, the hardware configurations of the image processing devices 130 and 230 will be described. Since the image processing device 130 and the image processing device 230 have the same hardware configuration, the hardware configuration of the image processing device 130 will be described here as a representative.
FIG. 3 is a diagram showing an example of the hardware configuration of the image processing device. As shown in FIG. 3, the image processing device 130 includes, as components, a processor 301, a main storage device (memory) 302, an auxiliary storage device 303, a network interface 304, and a device interface 305. The image processing device 130 is implemented as a computer in which these components are connected via a bus 306.
In the example of FIG. 3, the image processing device 130 is shown as including one of each component, but the image processing device 130 may include a plurality of the same components. Also, although one image processing device 130 is shown in the example of FIG. 3, the image processing program may be installed in a plurality of image processing devices, and each of the plurality of image processing devices may execute the same or a different part of the processing of the image processing program. In this case, a form of distributed computing may be adopted in which the image processing devices communicate with each other via the network interface 304 or the like to execute the processing as a whole. That is, the image processing device 130 may be configured as a system in which functions are realized by one or more computers executing instructions stored in one or more storage devices. A configuration may also be adopted in which the various data transmitted from the RGB camera 121 and the depth camera 122 are processed by one or more image processing devices provided on a cloud, and the processing results are transmitted to the customer's image processing device.
The various operations of the image processing device 130 may be executed in parallel using one or more processors, or using a plurality of image processing devices connected via the communication network 310. The various operations may also be distributed to a plurality of arithmetic cores in the processor 301 and executed in parallel. Part or all of the processing, means, and the like of the present disclosure may be executed by an external device 320 (at least one of a processor and a storage device) provided on a cloud capable of communicating with the image processing device 130 via the communication network 310. In this way, the image processing device 130 may take the form of parallel computing by one or more computers.
The processor 301 may be an electronic circuit (a processing circuit or processing circuitry, such as a CPU, GPU, FPGA, or ASIC) that performs at least computer control or computation. The processor 301 may be a general-purpose processor, a dedicated processing circuit designed to execute specific operations, or a semiconductor device including both a general-purpose processor and a dedicated processing circuit. The processor 301 may also include an optical circuit, or may include an arithmetic function based on quantum computing.
The processor 301 may perform various operations based on various data and instructions input from the devices of the internal configuration of the image processing device 130, and may output operation results and control signals to those devices. The processor 301 may control the components of the image processing device 130 by executing an OS (Operating System), an application, or the like.
The processor 301 may refer to one or more electronic circuits arranged on one chip, or to one or more electronic circuits arranged on two or more chips or two or more devices. When a plurality of electronic circuits are used, the electronic circuits may communicate with each other by wire or wirelessly.
The main storage device 302 is a storage device that stores instructions executed by the processor 301, various data, and the like, and the various data stored in the main storage device 302 may be read by the processor 301. The auxiliary storage device 303 is a storage device other than the main storage device 302. These storage devices mean arbitrary electronic components capable of storing various data, and may be semiconductor memories. The semiconductor memory may be either a volatile memory or a non-volatile memory. The storage device for storing various data in the image processing device 130 may be realized by the main storage device 302 or the auxiliary storage device 303, or by an internal memory built into the processor 301.
A plurality of processors 301 may be connected (coupled) to one main storage device 302, or a single processor 301 may be connected. Alternatively, a plurality of main storage devices 302 may be connected (coupled) to one processor 301. When the image processing device 130 is composed of at least one main storage device 302 and a plurality of processors 301 connected (coupled) to the at least one main storage device 302, a configuration may be included in which at least one of the plurality of processors 301 is connected (coupled) to the at least one main storage device 302. This configuration may also be realized by main storage devices 302 and processors 301 included in a plurality of image processing devices 130. Furthermore, a configuration in which the main storage device 302 is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.
The network interface 304 is an interface for connecting to the communication network 310 wirelessly or by wire. Any appropriate interface, such as one conforming to existing communication standards, may be used as the network interface 304. Various data may be exchanged via the network interface 304 with the robot 110 and other external devices 320 connected via the communication network 310. The communication network 310 may be any one of a WAN (Wide Area Network), a LAN (Local Area Network), a PAN (Personal Area Network), or the like, or a combination thereof, as long as information is exchanged between the computer and the robot 110 or other external devices 320. Examples of a WAN include the Internet, examples of a LAN include IEEE 802.11 and Ethernet (registered trademark), and examples of a PAN include Bluetooth (registered trademark) and NFC (Near Field Communication).
The device interface 305 is an interface, such as USB, that directly connects to an external device 330.
The external device 330 is a device connected to the computer. The external device 330 may be, for example, an input device. The input device is, for example, a device such as a camera (including the RGB camera 121 and the depth camera 122 of the present embodiment), a microphone, a motion capture device, various sensors, a keyboard, a mouse, or a touch panel, and provides acquired information to the computer. The external device 330 may also be a device including an input unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
The external device 330 may also be, for example, an output device. The output device may be, for example, a display device such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) panel, or a speaker or the like that outputs sound. The external device 330 may also be a device including an output unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
The external device 330 may also be a storage device (memory). For example, the external device 330 may be a network storage or the like, or a storage such as an HDD.
The external device 330 may also be a device having some of the functions of the components of the image processing device 130. That is, the computer may transmit or receive part or all of the processing results of the external device 330.
<Details of the functional configuration of the image processing device in the training phase>
Next, the functional configuration of the image processing device 230 in the training phase will be described in detail. FIG. 4 is a first diagram showing an example of the functional configuration of the image processing device in the training phase. As described above, the image processing program is installed in the image processing device 230, and in the training phase, by executing this program, the image processing device 230 functions as the following units, as shown in FIG. 4:
・a three-dimensional information acquisition unit 410,
・a mask region identification unit 420,
・an RGB image acquisition unit 430,
・a mask unit 440,
・a training data generation unit 450, and
・a training unit 460.
The three-dimensional information acquisition unit 410 acquires the three-dimensional information read from the acquired data storage unit 220 and notifies the mask region identification unit 420 of the acquired information.
The mask region identification unit 420 receives "processing range information" and "work rule information" in advance. The processing range information is a range specified in the RGB image, for example, a range on the pallet 140 corresponding to the predetermined space in which cardboard boxes can be placed. The work rule information received by the mask region identification unit 420 is information used to identify the pixels to be included in the mask region. In the present embodiment, the mask region identification unit 420 receives "pick up the cardboard boxes one by one in order from the top" as the work rule information.
The mask region identification unit 420 specifies a range in the RGB image based on the processing range information. The mask region identification unit 420 also extracts, from the three-dimensional information notified by the three-dimensional information acquisition unit 410, the three-dimensional information corresponding to the work rule information (in the present embodiment, the three-dimensional information of the top surface of the tallest cardboard box). The mask region identification unit 420 then identifies, in the RGB image within the range specified based on the processing range information, the pixels associated with the extracted three-dimensional information as pixels to be included in the "non-mask region" (pixels included in a region satisfying a predetermined condition).
In another example of the process of identifying the non-mask region, the mask region identification unit 420 identifies, among the pixels included in the RGB image within the range specified based on the processing range information, the pixels associated with three-dimensional information satisfying a condition determined based on the work rule information, as pixels to be included in the "non-mask region". For example, in the present embodiment, the mask region identification unit 420 identifies, among the pixels included in the RGB image within the range specified based on the processing range information, the pixels associated with three-dimensional information whose height (z coordinate) is at or above the top X-th percentile (for example, X = 99).
That is, the mask region identification unit 420 identifies the non-mask region in the RGB image based on the three-dimensional information corresponding to the processing range information and the work rule information.
Furthermore, the mask region identification unit 420 identifies the region other than the non-mask region as the "mask region" and notifies the mask unit 440 of the mask region.
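As a rough illustration of the percentile-based identification described above, the following minimal sketch is given; it is not taken from the disclosure. It assumes a per-pixel height map `z` aligned with the RGB image and a boolean `in_range` map derived from the processing range information, with X = 99 by default; all names are illustrative.

```python
import numpy as np

def identify_non_mask_region(z: np.ndarray, in_range: np.ndarray, x_percentile: float = 99.0) -> np.ndarray:
    """Return a boolean map that is True for pixels in the non-mask region.

    z        : per-pixel height (z coordinate) aligned with the RGB image
    in_range : True for pixels inside the range given by the processing range information
    """
    valid = in_range & np.isfinite(z)            # ignore pixels with no 3D information
    threshold = np.percentile(z[valid], x_percentile)
    non_mask = valid & (z >= threshold)          # e.g. the top surface of the tallest boxes
    return non_mask

# The mask region is simply the complement: mask_region = ~non_mask
```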
The RGB image acquisition unit 430 acquires the RGB image read from the acquired data storage unit 220 and notifies the mask region identification unit 420 and the mask unit 440 of the acquired image.
The mask unit 440 performs mask processing on the color information associated with the pixels included in the "mask region" notified by the mask region identification unit 420, out of the RGB image notified by the RGB image acquisition unit 430, thereby generating a "mask image".
The mask processing performed by the mask unit 440 on the color information associated with the pixels included in the mask region includes, for example:
・converting the color information (R value, G value, B value) associated with the pixels of the RGB image other than the pixels included in the non-mask region (that is, the pixels included in the mask region) into predetermined single color information (for example, R, G, and B values indicating black, gray, or the like),
・processing the portion of the RGB image other than the non-mask region (the image of the mask region) into an image containing a plurality of predetermined pieces of color information (for example, a gradation image), or
・applying filtering processing such as blurring to the portion of the RGB image other than the non-mask region (the image of the mask region), so that the image of the mask region is more blurred than the image of the non-mask region.
The mask unit 440 then notifies the training data generation unit 450 of the mask image.
The training data generation unit 450 associates the mask image notified by the mask unit 440 with correct answer data and stores the pair in the training data storage unit 470 as training data. The correct answer data here refers to, for example, the pixel-by-pixel recognition result that would be obtained if object recognition processing were performed appropriately on the mask image. The pixel-by-pixel recognition result may be, for example, annotation data in which a different label is assigned, pixel by pixel, to each object (instance) to be recognized.
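For illustration only, one simple way to represent such a training pair is sketched below; the record layout, field names, and file format are assumptions, not part of the disclosure.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class TrainingSample:
    """One entry of the training data: a mask image paired with correct answer data."""
    mask_image: np.ndarray   # (H, W, 3) uint8, masked RGB image
    annotation: np.ndarray   # (H, W) int, a different label per object (instance), 0 for background

def save_sample(path: str, sample: TrainingSample) -> None:
    # Store the pair together so the training unit can read them as one record.
    np.savez_compressed(path, mask_image=sample.mask_image, annotation=sample.annotation)
```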
The training unit 460 trains the object recognition unit using the training data stored in the training data storage unit 470, thereby generating a trained object recognition unit. The training unit 460 then sets the generated trained object recognition unit in the image processing device 130 that operates in the depalletizing phase.
<Functional configuration of the training unit>
Next, the functional configuration of the training unit 460 will be described in detail. FIG. 5 is a first diagram showing an example of the functional configuration of the training unit.
As shown in FIG. 5, the training unit 460 includes an object recognition unit 510 and a comparison/change unit 520, and trains the object recognition unit 510 using training data 500.
The training data 500 includes "mask image" and "correct answer data" as information items. A mask image is stored in "mask image". In "correct answer data", the pixel-by-pixel recognition result that should be obtained by appropriately recognizing the corresponding "mask image" is stored.
The object recognition unit 510 is configured by a DNN, and outputs output data when the "mask image" of the training data 500 is input. This output data may be, for example, segmentation data in which, for each pixel, a probability is assigned to each of the labels that differ from object to object.
The comparison/change unit 520 compares the output data output by the object recognition unit 510 with the "correct answer data" of the training data 500, and updates the model parameters of the object recognition unit 510 based on the comparison result. In this way, the training unit 460 generates the trained object recognition unit 510.
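As a rough illustration only, the following sketch shows one conventional way such a comparison-and-update cycle could be written with PyTorch. The disclosure does not specify the framework, the network architecture, or the loss function, so the use of a cross-entropy loss, the Adam optimizer, and the assumed `(mask image, per-pixel labels)` loader are all assumptions.

```python
import torch
import torch.nn as nn

def train_object_recognizer(model: nn.Module, loader, num_epochs: int = 10, device: str = "cpu"):
    """Train a per-pixel segmentation model on (mask image, correct answer data) pairs.

    loader yields (mask_image, labels): mask_image is (B, 3, H, W) float,
    labels is (B, H, W) long with one class index per pixel.
    """
    model.to(device)
    criterion = nn.CrossEntropyLoss()                       # compare output with correct answer data
    optimizer = torch.optim.Adam(model.parameters(), 1e-4)  # update the model parameters

    for epoch in range(num_epochs):
        for mask_image, labels in loader:
            mask_image, labels = mask_image.to(device), labels.to(device)
            logits = model(mask_image)          # (B, num_labels, H, W), per-pixel label scores
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```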
<Details of the functional configuration of the image processing device in the depalletizing phase>
Next, the functional configuration of the image processing device 130 in the depalletizing phase will be described in detail. FIG. 6 is a first diagram showing an example of the functional configuration of the image processing device in the depalletizing phase. As described above, the image processing program is installed in the image processing device 130, and in the depalletizing phase, by executing this program, the image processing device 130 functions as the following units, as shown in FIG. 6:
・the three-dimensional information acquisition unit 410,
・the mask region identification unit 420,
・the RGB image acquisition unit 430,
・the mask unit 440,
・a trained object recognition unit 650, and
・a robot control unit 660.
Of these, the three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, and the mask unit 440 have already been described with reference to FIG. 4, so their description is omitted here.
The trained object recognition unit 650 is an object recognition unit trained by the training unit 460 using the training data 500. When the mask image notified by the mask unit 440 is input, the trained object recognition unit 650 performs object recognition processing and outputs a recognition result.
The robot control unit 660 generates a control command for controlling the operation of the robot 110 based on the recognition result output by the trained object recognition unit 650, and transmits the control command to the robot 110. When generating the control command based on the recognition result, the robot control unit 660 refers to the work rule information and generates a control command in accordance with the work rule information.
<Specific example of object recognition processing>
Next, a specific example of object recognition processing by the image processing device 130 in the depalletizing phase will be described. FIG. 7 is a diagram showing a specific example of object recognition processing.
In FIG. 7, reference numeral 710 indicates the RGB image acquired by the RGB image acquisition unit 430. Reference numeral 711 indicates the processing range information received in advance by the mask region identification unit 420, shown superimposed on the RGB image indicated by reference numeral 710. As indicated by reference numeral 711, the processing range information indicates the region on the pallet 140 where the cardboard boxes are located.
In FIG. 7, reference numeral 720 indicates the mask image generated by performing mask processing on the RGB image indicated by reference numeral 710. In the mask image indicated by reference numeral 720, reference numeral 721 denotes the non-mask region identified by the mask region identification unit 420, and reference numeral 722 denotes the mask region identified by the mask region identification unit 420.
The non-mask region indicated by reference numeral 721 is the region of the top surface of the cardboard boxes with the greatest height. In the example of reference numeral 721, two cardboard boxes have approximately the same height, so the top surface regions of these two cardboard boxes are identified as the non-mask region.
The mask region indicated by reference numeral 722 shows that, among the pixels of the RGB image indicated by reference numeral 710, the color information associated with the pixels other than the pixels included in the non-mask region (that is, the pixels included in the mask region) has been converted into color information indicating black.
In FIG. 7, reference numeral 730 shows that, by performing object recognition processing using the mask image indicated by reference numeral 720, the cardboard box indicated by reference numeral 731 and the cardboard box indicated by reference numeral 732 are each recognized as separate objects (separate instances).
<Processing flow of the entire robot control system>
Next, the flow of processing of the entire robot control system 100 will be described. FIG. 8 is a first flowchart showing the flow of processing of the entire robot control system.
In step S801, the robot control system 100 shifts to the training phase and executes training processing. Details of the flowchart of the training processing will be described later.
In step S802, the robot control system 100 shifts to the depalletizing phase and executes depalletizing processing. Details of the flowchart of the depalletizing processing will be described later.
<Flow of the training processing>
Next, the flowchart of the training processing (step S801 in FIG. 8) will be described in detail. FIG. 9 is a first flowchart showing the flow of the training processing.
In step S901, the image processing device 230 acquires, from the acquired data storage unit 220, the RGB image captured by the RGB camera 121 or the virtual RGB image generated by the CG simulator 210. The image processing device 230 also acquires, from the acquired data storage unit 220, the three-dimensional information captured by the depth camera 122 or the virtual three-dimensional information generated by the CG simulator 210.
In step S902, the image processing device 230 identifies the non-mask region in the RGB image based on the three-dimensional information, using the processing range information and the work rule information.
In step S903, the image processing device 230 performs mask processing on the color information associated with the pixels of the RGB image other than the pixels included in the non-mask region, thereby generating a mask image.
In step S904, the image processing device 230 acquires the correct answer data.
In step S905, the image processing device 230 associates the mask image with the correct answer data to generate training data.
In step S906, the image processing device 230 trains the object recognition unit using the training data.
In step S907, the image processing device 230 determines whether or not a training end condition is satisfied. If it is determined in step S907 that the training end condition is not satisfied (NO in step S907), the processing returns to step S901.
On the other hand, if it is determined in step S907 that the training end condition is satisfied (YES in step S907), the processing proceeds to step S908.
In step S908, the image processing device 230 sets the trained object recognition unit and ends the training processing.
<Flow of the depalletizing processing>
Next, the flowchart of the depalletizing processing (step S802 in FIG. 8) will be described in detail. FIG. 10 is a first flowchart showing the flow of the depalletizing processing.
In step S1001, the image processing device 130 acquires the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122.
In step S1002, the image processing device 130 identifies the non-mask region in the RGB image within the range specified by the processing range information, based on the three-dimensional information corresponding to the work rule information.
In step S1003, the image processing device 130 performs mask processing on the color information associated with the pixels of the RGB image other than the pixels included in the non-mask region, thereby generating a mask image.
In step S1004, the image processing device 130 performs object recognition processing by inputting the generated mask image into the trained object recognition unit.
In step S1005, the image processing device 130 identifies the cardboard boxes to be depalletized based on the result of the object recognition processing. For example, when a plurality of cardboard boxes are recognized as a result of the object recognition processing, the image processing device 130 identifies the plurality of cardboard boxes as depalletizing targets.
In step S1006, the image processing device 130 generates a control command for depalletizing the cardboard boxes identified in the mask image as depalletizing targets, in accordance with the three-dimensional information about the identified cardboard boxes and the work rule information, and outputs the control command to the robot 110. For example, when a plurality of cardboard boxes are identified as depalletizing targets, the image processing device 130 determines the depalletizing order in accordance with the work rule information and generates a control command corresponding to the three-dimensional information of the cardboard boxes to be depalletized.
In step S1007, the image processing device 130 determines whether or not there is a next cardboard box to be depalletized. Specifically, the image processing device 130 determines whether or not a next cardboard box to be depalletized remains on the pallet 140 after all of the cardboard boxes identified as depalletizing targets in step S1005 have been depalletized.
If it is determined in step S1007 that there is a next cardboard box to be depalletized (YES in step S1007), the processing proceeds to step S1008.
In step S1008, the image processing device 130 determines whether or not it is time to perform object recognition processing. If it is determined in step S1008 that it is not yet time to perform object recognition processing (NO in step S1008), the image processing device 130 waits until it is time to perform object recognition processing.
On the other hand, if it is determined in step S1008 that it is time to perform object recognition processing (YES in step S1008), the processing returns to step S1001.
On the other hand, if it is determined in step S1007 that there is no next cardboard box to be depalletized (NO in step S1007), the depalletizing processing ends.
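Read as a whole, steps S1001 to S1008 form a capture–mask–recognize–pick loop. The following is a minimal sketch of that loop under the same assumptions as the earlier sketches; the callables passed in (capture, find_non_mask, apply_mask, recognize, select_targets, send_pick_command) are illustrative placeholders for the corresponding units, not APIs defined in the disclosure.

```python
from typing import Callable, Sequence

def depalletize_loop(
    capture: Callable[[], tuple],    # S1001: returns (rgb, xyz)
    find_non_mask: Callable,         # S1002: e.g. identify_non_mask_region above
    apply_mask: Callable,            # S1003: e.g. mask_fill above
    recognize: Callable,             # S1004: trained object recognition unit
    select_targets: Callable,        # S1005: choose depalletizing targets per the work rule
    send_pick_command: Callable,     # S1006: issue a control command to the robot
) -> None:
    """Capture, mask, recognize, and pick until no depalletizing target remains."""
    while True:
        rgb, xyz = capture()                          # S1001
        non_mask = find_non_mask(xyz[..., 2])         # S1002
        masked = apply_mask(rgb, non_mask)            # S1003
        instances = recognize(masked)                 # S1004
        targets: Sequence = select_targets(instances, xyz)  # S1005
        if not targets:                               # S1007: no next target, finish
            break
        for box in targets:                           # S1006: already ordered by the work rule
            send_pick_command(box)
        # S1008: in a real system, wait here until the next recognition timing
```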
<Specific example of the depalletizing processing>
Next, a specific example of the depalletizing processing by the robot control system 100 will be described. FIG. 11 is a diagram showing a specific example of the depalletizing processing. As shown in FIG. 11(a), when four cardboard boxes are stacked on the pallet 140, the robot control system 100 recognizes cardboard box 1101 as the tallest cardboard box. FIG. 11(b) shows the recognized cardboard box 1101 being picked up.
Subsequently, as shown in FIG. 11(c), three cardboard boxes remain stacked on the pallet 140 after cardboard box 1101 has been picked up, and the robot control system 100 recognizes cardboard box 1102 as the tallest cardboard box. FIG. 11(d) shows the recognized cardboard box 1102 being picked up.
Subsequently, as shown in FIG. 11(e), two cardboard boxes remain on the pallet 140 after cardboard box 1102 has been picked up, and the robot control system 100 recognizes cardboard box 1103 as the tallest cardboard box. FIG. 11(f) shows the recognized cardboard box 1103 being picked up.
Subsequently, as shown in FIG. 11(g), one cardboard box remains on the pallet 140 after cardboard box 1103 has been picked up, and the robot control system 100 recognizes cardboard box 1104 as the tallest cardboard box. FIG. 11(h) shows the recognized cardboard box 1104 being picked up.
In this way, according to the robot control system 100, depalletizing can be realized in accordance with the work rule "pick up the cardboard boxes one by one in order from the top".
<Summary>
As is clear from the above description, the robot control system 100 according to the first embodiment:
・acquires, for a predetermined space in which one or more cardboard boxes are stacked on a pallet, an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes;
・performs mask processing on part of the color information of the RGB image based on the acquired three-dimensional information; and
・performs object recognition processing using the mask image obtained by the mask processing.
Thus, according to the robot control system 100 of the first embodiment, the recognition accuracy in the object recognition processing can be improved.
[Second embodiment]
In the first embodiment described above, the case where the mask unit 440 performs mask processing on the color information associated with the pixels included in the mask region was described. However, the target of the mask processing performed by the mask unit 440 is not limited to the color information, and the mask processing may be performed on both the color information and the three-dimensional information. The second embodiment will be described below, focusing on the differences from the first embodiment.
<Details of the functional configuration of the image processing device in the training phase>
First, the functional configuration of the image processing device 230 in the training phase will be described in detail. FIG. 12 is a second diagram showing an example of the functional configuration of the image processing device in the training phase. As in the first embodiment, the image processing program is installed in the image processing device 230, and by executing this program, the same functions as in the first embodiment are realized in the training phase.
The difference from FIG. 4 described in the first embodiment is that, in FIG. 12, the three-dimensional information acquisition unit 410 notifies both the mask region identification unit 420 and the mask unit 440 of the three-dimensional information.
Another difference from FIG. 4 described in the first embodiment is that the mask unit 440 performs mask processing on both the color information and the three-dimensional information associated with the pixels included in the mask region, thereby generating a mask image and mask three-dimensional information. The mask processing performed on the three-dimensional information includes, for example, deleting the three-dimensional information (x coordinate, y coordinate, z coordinate) associated with the pixels included in the mask region.
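As a rough sketch only, and assuming the same array layout as the earlier sketches, masking the three-dimensional information could be done by invalidating the (x, y, z) values of the masked pixels; using NaN as the "deleted" value is an assumption, not something specified in the disclosure.

```python
import numpy as np

def mask_color_and_3d(rgb: np.ndarray, xyz: np.ndarray, non_mask: np.ndarray):
    """Mask both the color information and the 3D information of the mask region.

    Returns (mask_image, mask_xyz): masked pixels are filled with black in the
    image, and their (x, y, z) values are deleted (set to NaN) in the 3D array.
    """
    mask_image = rgb.copy()
    mask_image[~non_mask] = (0, 0, 0)

    mask_xyz = xyz.astype(float)
    mask_xyz[~non_mask] = np.nan
    return mask_image, mask_xyz
```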
A further difference from FIG. 4 described in the first embodiment is that the training data generation unit 450 associates the mask image and the mask three-dimensional information notified by the mask unit 440 with the correct answer data to generate training data, and stores the training data in the training data storage unit 470.
<Functional configuration of the training unit>
Next, the functional configuration of the training unit 460 in the second embodiment will be described in detail. FIG. 13 is a second diagram showing an example of the functional configuration of the training unit. The difference from FIG. 5 described in the first embodiment is that the training data 1300 includes "mask three-dimensional information" as an information item, in which the mask three-dimensional information is stored. Another difference from FIG. 5 described in the first embodiment is that both the "mask image" and the "mask three-dimensional information" of the training data 1300 are input to the object recognition unit 510.
<Details of the functional configuration of the image processing device in the depalletizing phase>
Next, the functional configuration of the image processing device 130 in the depalletizing phase will be described in detail. FIG. 14 is a second diagram showing an example of the functional configuration of the image processing device in the depalletizing phase. As in the first embodiment, the image processing program is installed in the image processing device 130, and by executing this program, the same functions as in the first embodiment are realized in the depalletizing phase.
The difference from FIG. 6 described in the first embodiment is that, in FIG. 14, the three-dimensional information acquisition unit 410 notifies both the mask region identification unit 420 and the mask unit 440 of the three-dimensional information.
Another difference from FIG. 6 described in the first embodiment is that the mask unit 440 performs mask processing on both the color information and the three-dimensional information associated with the pixels included in the mask region, thereby generating a mask image and mask three-dimensional information.
 また、上記第1の実施形態において説明した図6との相違点は、訓練済み物体認識部650が、マスク部440より通知されたマスク画像及びマスク3次元情報に基づいて物体認識処理を行い、認識結果を出力する点である。 6 described in the first embodiment is that the trained object recognition unit 650 performs object recognition processing based on the mask image and mask three-dimensional information notified by the mask unit 440, This is the point of outputting the recognition result.
 <Flow of the Training Process>
 Next, the details of the flowchart of the training process in the second embodiment will be described. FIG. 15 is a second flowchart showing the flow of the training process. The differences from the first flowchart described with reference to FIG. 9 are steps S1501 and S1502.
 In step S1501, the image processing device 230 performs mask processing on the color information and the three-dimensional information associated with pixels other than the pixels included in the non-mask region, and generates a mask image and mask three-dimensional information.
 In step S1502, the image processing device 230 generates training data by associating the mask image and the mask three-dimensional information with the correct data.
 <Flow of the Depalletizing Process>
 Next, the details of the flowchart of the depalletizing process in the second embodiment will be described. FIG. 16 is a second flowchart showing the flow of the depalletizing process. The differences from the first flowchart described with reference to FIG. 10 are steps S1601 and S1602.
 In step S1601, the image processing device 130 performs mask processing on the color information and the three-dimensional information associated with pixels other than the pixels included in the non-mask region, and generates a mask image and mask three-dimensional information.
 In step S1602, the image processing device 130 performs object recognition processing by inputting the generated mask image and mask three-dimensional information to the trained object recognition unit.
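 A minimal sketch of step S1602 is shown below; the recognize_objects helper, the model interface, and the argmax read-out of the recognition result are illustrative assumptions, not the disclosed behavior of the trained object recognition unit 650.

```python
import torch

def recognize_objects(model, mask_image, mask_xyz):
    """Run a trained object recognition network on masked inputs (cf. step S1602)."""
    model.eval()
    with torch.no_grad():
        logits = model(mask_image, mask_xyz)   # masked inputs from step S1601
    return logits.argmax(dim=1)                # assumed read-out of the recognition result
```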
 <Summary>
 As is clear from the above description, the robot control system 100 according to the second embodiment:
- For a predetermined space in which one or more cardboard boxes are stacked on a pallet, acquires an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes.
- Performs mask processing, based on the acquired three-dimensional information, on part of the color information of the RGB image and on the three-dimensional information.
- Performs object recognition processing using the mask image and the mask three-dimensional information obtained by the mask processing.
 As a result, according to the robot control system 100 of the second embodiment, the recognition accuracy in the object recognition processing can be improved.
 [Third Embodiment]
 The first embodiment described above did not mention the shooting conditions (for example, white balance, exposure, focus, and the like) of the RGB camera 121 when performing the training process (step S801 in FIG. 8), and the description assumed that they were appropriately adjusted.
 In contrast, in the third embodiment, a case will be described in which a shooting condition adjustment phase is provided before the training process and the shooting conditions of the RGB camera 121 are adjusted. The description will focus on the differences from the first embodiment.
 <Functional Configuration of the Image Processing Device in the Shooting Condition Adjustment Phase>
 First, the details of the functional configuration of the image processing device 130 in the shooting condition adjustment phase will be described. FIG. 17 is a diagram illustrating an example of the functional configuration of the image processing device in the shooting condition adjustment phase. In the shooting condition adjustment phase, by executing the image processing program, the image processing device 130 functions, as shown in FIG. 17, as:
- a three-dimensional information acquisition unit 410,
- a mask region identification unit 420,
- an RGB image acquisition unit 430,
- a mask unit 440, and
- a shooting condition adjustment unit 1750.
 Of these, the functions of the three-dimensional information acquisition unit 410, the mask region identification unit 420, the RGB image acquisition unit 430, and the mask unit 440 have already been described, so their description is omitted here. Note that the mask unit 440 performs mask processing on the color information associated with the pixels included in the mask region of the RGB image, and notifies the shooting condition adjustment unit 1750 of the generated mask image.
 The shooting condition adjustment unit 1750 adjusts the shooting conditions based on the mask image notified by the mask unit 440. The shooting conditions adjusted by the shooting condition adjustment unit 1750 include the white balance, exposure, and focus of the RGB camera 121.
 The shooting condition adjustment unit 1750 also transmits the adjusted shooting conditions to the RGB camera 121 and sets them in the camera.
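 As one hedged illustration of such an adjustment, the exposure could be scaled from the mean luminance of the non-masked pixels, as sketched below; the target luminance, the clipping range, and the adjust_exposure helper are assumptions, and the disclosure does not prescribe a specific adjustment formula.

```python
import numpy as np

def adjust_exposure(masked_rgb, keep_mask, current_exposure_ms, target_luma=128.0):
    """Scale the exposure time so the non-masked region approaches a target luminance."""
    luma = masked_rgb.mean(axis=2)          # rough per-pixel luminance of the mask image
    mean_luma = luma[keep_mask].mean()      # evaluate only the non-mask region
    scale = np.clip(target_luma / max(mean_luma, 1e-3), 0.5, 2.0)
    return current_exposure_ms * scale
```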
 In this way, the robot control system 100 according to the third embodiment adjusts the shooting conditions of the RGB camera 121 based on the mask image. This makes it possible to set shooting conditions suitable for object recognition processing. As a result, according to the robot control system 100 of the third embodiment, the recognition accuracy in the object recognition processing can be improved.
 <Processing Flow of the Entire Robot Control System>
 Next, the overall processing flow of the robot control system 100 according to the third embodiment will be described. FIG. 18 is a second flowchart showing the flow of processing of the entire robot control system. The difference from the first flowchart shown in FIG. 8 is that a shooting condition adjustment process (step S1801) is included before the training process (step S801).
 In step S1801, the robot control system 100 shifts to the shooting condition adjustment phase and executes the shooting condition adjustment process. The details of the flowchart of the shooting condition adjustment process will be described with reference to FIG. 19.
 <Flow of the Shooting Condition Adjustment Process>
 FIG. 19 is a flowchart showing the flow of the shooting condition adjustment process.
 In step S1901, the image processing device 130 acquires the RGB image captured by the RGB camera 121 and the three-dimensional information captured by the depth camera 122.
 In step S1902, the image processing device 130 identifies the non-mask region in the RGB image based on the three-dimensional information, using the processing range information and the work rule information.
 In step S1903, the image processing device 130 performs mask processing on the color information associated with pixels other than the pixels included in the non-mask region of the RGB image, and generates a mask image.
 In step S1904, the image processing device 130 adjusts the shooting conditions of the RGB camera 121 based on the mask image.
 In step S1905, the image processing device 130 acquires an RGB image captured by the RGB camera 121 under the shooting conditions adjusted in step S1904.
 In step S1906, in the RGB image acquired in step S1905, the image processing device 130 performs mask processing on the color information associated with pixels other than the pixels included in the non-mask region identified in step S1902. The image processing device 130 thereby generates a mask image.
 In step S1907, the image processing device 130 evaluates the mask image generated in step S1906.
 In step S1908, the image processing device 130 determines, based on the evaluation result of the mask image, whether the shooting conditions have been optimized. If it determines that they have not been optimized (NO in step S1908), the process returns to step S1904. At this time, if the arrangement of the cardboard boxes stacked on the pallet 140 is to be changed, the process returns to step S1901.
 On the other hand, if it is determined in step S1908 that the shooting conditions have been optimized (YES in step S1908), the process proceeds to step S1909.
 In step S1909, the image processing device 130 transmits the optimized shooting conditions to the RGB camera 121, sets them, and ends the shooting condition adjustment process. As a result, in the training phase and the depalletizing phase, RGB images captured by the RGB camera 121 under the optimized shooting conditions can be acquired.
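 The adjustment loop of FIG. 19 could be organized along the following lines. This is a sketch only: camera.capture_xyz, camera.capture_rgb, camera.set_conditions, compute_keep_mask, apply_mask, evaluate_mask_image, and propose_next_conditions are hypothetical helpers standing in for the camera I/O, the masking of steps S1902/S1906, the evaluation of step S1907, and the condition update; none of them are named in the disclosure.

```python
def tune_shooting_conditions(camera, conditions, max_iterations=10, threshold=0.9):
    """Iteratively adjust shooting conditions until the mask image is judged good enough."""
    xyz = camera.capture_xyz()                    # step S1901: 3D information
    keep_mask = compute_keep_mask(xyz)            # step S1902: non-mask region
    for _ in range(max_iterations):
        camera.set_conditions(conditions)         # step S1904: apply candidate conditions
        rgb = camera.capture_rgb()                # step S1905: RGB image under those conditions
        mask_image = apply_mask(rgb, keep_mask)   # step S1906: mask image
        score = evaluate_mask_image(mask_image)   # step S1907: evaluation
        if score >= threshold:                    # step S1908: optimized?
            break
        conditions = propose_next_conditions(conditions, mask_image)
    camera.set_conditions(conditions)             # step S1909: fix the final conditions
    return conditions
```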
 <Summary>
 As is clear from the above description, the robot control system 100 according to the third embodiment:
- For a predetermined space in which one or more cardboard boxes are stacked on a pallet, acquires an RGB image of the predetermined space and three-dimensional information of the predetermined space by photographing the one or more cardboard boxes.
- Performs mask processing on part of the color information of the RGB image based on the acquired three-dimensional information.
- Adjusts the shooting conditions of the RGB camera using the mask image obtained by the mask processing.
 As a result, according to the robot control system 100 of the third embodiment, shooting conditions suitable for object recognition processing can be set, and the recognition accuracy in the object recognition processing can be improved.
 [Fourth Embodiment]
 In each of the above embodiments, the RGB camera 121 and the depth camera 122 were described as being fixedly attached to the ceiling of the space in which the robot 110 is arranged, but the mounting positions of the RGB camera 121 and the depth camera 122 are not limited to the ceiling of the space. Nor are the mounting positions of the RGB camera 121 and the depth camera 122 limited to directly above the pallet 140. Furthermore, the object to which the RGB camera 121 and the depth camera 122 are attached is not limited to a non-movable object and may be a movable object (for example, the robot 110).
 In each of the above embodiments, the three-dimensional information acquisition unit 410 was described as acquiring, as the three-dimensional information, the three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate) captured by the depth camera 122. However, the three-dimensional information acquisition unit 410 may acquire three-dimensional information other than three-dimensional coordinates (x-coordinate, y-coordinate, z-coordinate). Three-dimensional information other than three-dimensional coordinates includes, for example, point cloud information and mesh information.
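 For instance, point cloud information of the kind mentioned above could be derived from a depth map under assumed pinhole-camera intrinsics, as in the following sketch; the intrinsic parameters fx, fy, cx, cy are hypothetical values introduced for illustration, not values from the disclosure.

```python
import numpy as np

def depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```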
 In the first and second embodiments described above, the image processing device 130 was described as having the robot control unit 660 in the depalletizing phase. However, in the depalletizing phase, the robot control unit 660 may be realized within the robot 110.
 In the first and second embodiments described above, the image processing device 130 was described as having the trained object recognition unit 650 in the depalletizing phase. However, in the depalletizing phase, the trained object recognition unit 650 may be realized in a device other than the image processing device 130.
 In the third embodiment described above, the image processing device 130 was described as having the shooting condition adjustment unit 1750 in the shooting condition adjustment phase. However, in the shooting condition adjustment phase, the shooting condition adjustment unit 1750 may be realized in a device other than the image processing device 130.
 In the first and second embodiments described above, the object recognition unit was trained using masked data (mask images and mask three-dimensional information) as training data, and masked data (mask images and mask three-dimensional information) were input to the trained object recognition unit. However, the object recognition unit may instead be trained using unmasked data (RGB images and three-dimensional information) as training data, and masked data (mask images and mask three-dimensional information) may then be input to the trained object recognition unit to perform the object recognition processing.
 [Other Embodiments]
 In this specification (including the claims), when the expression "at least one of a, b, and c" or "at least one of a, b, or c" (including similar expressions) is used, it includes any of a, b, c, a-b, a-c, b-c, or a-b-c. It may also include multiple instances of any element, such as a-a, a-b-b, or a-a-b-b-c-c. It further includes adding elements other than the listed elements (a, b, and c), such as including d as in a-b-c-d.
 In this specification (including the claims), when expressions such as "with data as input," "based on data," "according to data," or "in response to data" (including similar expressions) are used, unless otherwise specified, they include cases where the data themselves are used as input and cases where data obtained by applying some processing to the data (for example, data with added noise, normalized data, intermediate representations of the data, etc.) are used as input. When it is stated that some result is obtained "based on," "according to," or "in response to" data, this includes cases where the result is obtained based only on the data, as well as cases where the result is obtained under the influence of other data, causes, conditions, and/or states in addition to the data. When it is stated that "data are output," unless otherwise specified, this includes cases where the data themselves are used as output and cases where data obtained by applying some processing to the data (for example, data with added noise, normalized data, intermediate representations of the data, etc.) are used as output.
 In this specification (including the claims), the terms "connected" and "coupled" are intended as non-limiting terms that include any of direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like. The terms should be interpreted appropriately according to the context in which they are used, and any form of connection/coupling that is not intentionally or naturally excluded should be interpreted as included in the terms in a non-limiting manner.
 In this specification (including the claims), when the expression "A configured to B" is used, it may include that the physical structure of element A has a configuration capable of executing operation B and that a permanent or temporary setting/configuration of element A is configured/set to actually execute operation B. For example, when element A is a general-purpose processor, it is sufficient that the processor has a hardware configuration capable of executing operation B and is configured to actually execute operation B by a permanent or temporary program (instructions). When element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it is sufficient that the circuit structure of the processor is implemented so as to actually execute operation B, regardless of whether control instructions and data are actually attached.
 In this specification (including the claims), when terms meaning inclusion or possession (for example, "comprising/including," "having," etc.) are used, they are intended as open-ended terms that include cases where something other than the object indicated by the object of the term is included or possessed. When the object of these terms meaning inclusion or possession is an expression that does not specify a quantity or that suggests a singular number (an expression using the article a or an), the expression should be interpreted as not being limited to a specific number.
 In this specification (including the claims), even if an expression such as "one or more" or "at least one" is used in one place and an expression that does not specify a quantity or that suggests a singular number (an expression using the article a or an) is used in another place, the latter expression is not intended to mean "one." In general, an expression that does not specify a quantity or that suggests a singular number (an expression using the article a or an) should be interpreted as not necessarily being limited to a specific number.
 In this specification, when it is stated that a particular advantage/result is obtained for a particular configuration of an embodiment, unless there is a specific reason to the contrary, it should be understood that the advantage/result is also obtained for one or more other embodiments having that configuration. However, it should be understood that the presence or absence of the advantage/result generally depends on various causes, conditions, and/or states, and that the configuration does not always provide the advantage/result. The advantage/result is merely obtained by the configuration described in the embodiments when various causes, conditions, and/or states are satisfied, and it is not necessarily obtained in the claimed invention that defines the configuration or a similar configuration.
 In this specification (including the claims), terms such as "optimize/optimization" include finding a global optimum value, finding an approximation of a global optimum value, finding a local optimum value, and finding an approximation of a local optimum value, and should be interpreted appropriately according to the context in which they are used. They also include finding approximations of these optimum values stochastically or heuristically.
 In this specification (including the claims), when a plurality of pieces of hardware perform a predetermined process, the pieces of hardware may cooperate to perform the predetermined process, or some of the hardware may perform all of the predetermined process. Alternatively, some of the hardware may perform part of the predetermined process, and other hardware may perform the rest of the predetermined process. In this specification (including the claims), when an expression such as "one or more pieces of hardware perform a first process, and the one or more pieces of hardware perform a second process" is used, the hardware that performs the first process and the hardware that performs the second process may be the same or different. In other words, it is sufficient that the hardware that performs the first process and the hardware that performs the second process are included in the one or more pieces of hardware. The hardware may include an electronic circuit, a device including an electronic circuit, and the like.
 In this specification (including the claims), when a plurality of storage devices (memories) store data, an individual storage device (memory) among the plurality of storage devices (memories) may store only part of the data or may store the whole of the data.
 Although the embodiments of the present disclosure have been described in detail above, the present disclosure is not limited to the individual embodiments described above. Various additions, changes, replacements, partial deletions, and the like are possible without departing from the conceptual idea and spirit of the present invention derived from the contents defined in the claims and their equivalents. For example, in all of the embodiments described above, the numerical values and formulas used in the description are shown merely as examples and are not limiting. The order of the operations in the embodiments is likewise shown as an example and is not limiting.
 This application claims priority based on Japanese Patent Application No. 2022-009614 filed on January 25, 2022, the entire contents of which are incorporated herein by reference.

Claims (16)

  1. An image processing device comprising:
     one or more memories; and
     one or more processors,
     wherein the one or more processors execute:
     acquiring, for a predetermined space containing one or more objects, a gradation image of the predetermined space and three-dimensional information of the predetermined space;
     masking a part of the gradation image based on the three-dimensional information; and
     performing a predetermined process using the masked gradation image.
  2. The image processing device according to claim 1, wherein masking the part of the gradation image includes:
     a process of identifying the part of the gradation image based on the three-dimensional information; and
     a process of masking the part of the gradation image identified by the identifying process.
  3. The image processing device according to claim 2, wherein the identifying process identifies, as the part of the gradation image, two-dimensional coordinates of the gradation image corresponding to specific three-dimensional coordinates in the predetermined space.
  4. The image processing device according to claim 2, wherein the masking process is to mask gradation information of pixels in the part of the gradation image.
  5. The image processing device according to claim 1, wherein masking the part of the gradation image includes:
     a process of identifying, based on the three-dimensional information, a region of an object that satisfies a predetermined condition from among the one or more objects; and
     a process of masking, among the gradation information associated with the pixels of the gradation image, the gradation information associated with pixels other than the pixels included in the identified region of the object.
  6. The image processing device according to claim 5, wherein the one or more processors further execute masking a part of the three-dimensional information based on the three-dimensional information, and
     the predetermined process includes performing a predetermined process using the masked gradation image and the masked three-dimensional information.
  7. The image processing device according to claim 6, wherein masking the part of the three-dimensional information includes a process of masking, among the three-dimensional information associated with the pixels of the gradation image, the three-dimensional information associated with pixels other than the pixels included in the region of the object that satisfies the predetermined condition.
  8. The image processing device according to claim 6, wherein the region of the object that satisfies the predetermined condition is a region of an object having predetermined three-dimensional information.
  9. The image processing device according to claim 8, wherein the region of the object having the predetermined three-dimensional information is, among the one or more objects, a region of an object whose top surface height has the predetermined three-dimensional information.
  10. The image processing device according to claim 8, wherein the predetermined process includes object recognition processing or adjustment processing for adjusting shooting conditions at the time of capturing the gradation image.
  11. The image processing device according to claim 10, wherein, when the predetermined process is the object recognition processing, the region of the object that satisfies the predetermined condition is identified within a range in which the one or more objects can be placed.
  12. The image processing device according to claim 10, wherein the object recognition processing is performed using a deep neural network.
  13. The image processing device according to claim 12, wherein the deep neural network is trained using masked gradation images as training data.
  14. The image processing device according to claim 12, wherein the deep neural network is trained using masked gradation images and masked three-dimensional information as training data.
  15. The image processing device according to claim 10, wherein the shooting conditions adjusted in the adjustment processing include white balance, exposure, and focus of an image generating device that captures the gradation image.
  16. A robot control system comprising:
     the image processing device according to claim 1;
     an image generating device that generates the gradation image;
     a three-dimensional information generating device that generates the three-dimensional information; and
     a robot that depalletizes the one or more objects.
PCT/JP2023/001517 2022-01-25 2023-01-19 Image processing device, image processing method, image processing program, and robot control system WO2023145599A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-009614 2022-01-25
JP2022009614 2022-01-25

Publications (1)

Publication Number Publication Date
WO2023145599A1 true WO2023145599A1 (en) 2023-08-03

Family

ID=87471833

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/001517 WO2023145599A1 (en) 2022-01-25 2023-01-19 Image processing device, image processing method, image processing program, and robot control system

Country Status (1)

Country Link
WO (1) WO2023145599A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004307110A (en) * 2003-04-04 2004-11-04 Daifuku Co Ltd Object stacking method and facility for it
JP2007274234A (en) * 2006-03-30 2007-10-18 National Institute Of Advanced Industrial & Technology White cane user detection system using stereo camera
JP2009031939A (en) * 2007-07-25 2009-02-12 Advanced Telecommunication Research Institute International Image processing apparatus, method and program
JP2019028926A (en) * 2017-08-03 2019-02-21 日本電信電話株式会社 Image processing apparatus, image processing method, and image processing program
US20190246041A1 (en) * 2018-02-07 2019-08-08 Robert Bosch Gmbh Method and device for object identification
JP2020035443A (en) * 2018-08-24 2020-03-05 株式会社豊田中央研究所 Sensing device
US20200311969A1 (en) * 2019-03-26 2020-10-01 Samsung Electronics Co., Ltd. Method and apparatus for estimating tool trajectories

Similar Documents

Publication Publication Date Title
KR102472592B1 (en) Updating of local feature models based on robot behavior calibration
CN108198141B (en) Image processing method and device for realizing face thinning special effect and computing equipment
JP6415026B2 (en) Interference determination apparatus, interference determination method, and computer program
US11667036B2 (en) Workpiece picking device and workpiece picking method
US9691132B2 (en) Method and apparatus for inferring facial composite
JP2020064335A (en) Shape information generation device, control device, unloading device, distribution system, program, and control method
CN112085755A (en) Object contour detection method, device and equipment and storage medium
US11538238B2 (en) Method and system for performing image classification for object recognition
CN110232315A (en) Object detection method and device
JP2013067499A (en) Article inspection device, article inspection system, article inspection method, and program
JP2018126862A (en) Interference determination apparatus, interference determination method, and computer program
WO2023145599A1 (en) Image processing device, image processing method, image processing program, and robot control system
CN113255664B (en) Image processing method, related device and computer program product
WO2011010693A1 (en) Marker generation device, marker generation detection system, marker generation detection device, marker, marker generation method, and program therefor
JP2010210511A (en) Recognition device of three-dimensional positions and attitudes of objects, and method for the same
EP4281937A1 (en) Enhancing three-dimensional models using multi-view refinement
CN114092428A (en) Image data processing method, image data processing device, electronic equipment and storage medium
CN110533717B (en) Target grabbing method and device based on binocular vision
WO2019003687A1 (en) Projection instruction device, baggage sorting system, and projection instruction method
EP4275176A1 (en) Three-dimensional scan registration with deformable models
CN113688704A (en) Item sorting method, item sorting device, electronic device, and computer-readable medium
CN113780269A (en) Image recognition method, device, computer system and readable storage medium
WO2022137509A1 (en) Object recognition device, object recognition method, non-transitory computer-readable medium, and object recognition system
CN116228854B (en) Automatic parcel sorting method based on deep learning
US20230169324A1 (en) Use synthetic dataset to train robotic depalletizing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23746812

Country of ref document: EP

Kind code of ref document: A1