WO2021149251A1 - Object recognition device and object recognition method - Google Patents

Object recognition device and object recognition method

Info

Publication number
WO2021149251A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
unit
recognition
image conversion
target
Prior art date
Application number
PCT/JP2020/002577
Other languages
French (fr)
Japanese (ja)
Inventor
彩佳里 大島
亮輔 川西
Original Assignee
三菱電機株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 三菱電機株式会社 filed Critical 三菱電機株式会社
Priority to JP2021572241A priority Critical patent/JP7361800B2/en
Priority to CN202080092120.2A priority patent/CN114981837A/en
Priority to PCT/JP2020/002577 priority patent/WO2021149251A1/en
Publication of WO2021149251A1 publication Critical patent/WO2021149251A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • The present disclosure relates to an object recognition device and an object recognition method that recognize a target object based on a captured image of the target object.
  • Patent Document 1 discloses a technique for recognizing the state of an object based on an image of the target object in a gripping system that grips the target object.
  • However, in the technique of Patent Document 1, there is a problem that recognition performance may deteriorate when the environment at the time the recognition process is executed, for example, the surrounding environment of the target object or the measurement conditions, changes.
  • The present disclosure has been made in view of the above, and an object of the present disclosure is to obtain an object recognition device capable of improving recognition performance even when the environment at the time of executing the recognition process changes.
  • The object recognition device of the present disclosure is characterized by including an image acquisition unit that acquires an image of a target object, an image conversion unit that converts the image acquired by the image acquisition unit into a converted image using an image conversion parameter and outputs the converted image, a recognition unit that recognizes the state of the target object based on the converted image, an evaluation unit that evaluates, based on the recognition result of the recognition unit, the image conversion parameter used to generate the converted image, and an output unit that outputs the recognition result and the evaluation result of the evaluation unit.
  • A diagram showing an example of the display screen displayed by the output unit shown in FIG. 1 (FIG. 2), and a diagram showing an example of the detailed configuration of the first learning unit shown in FIG. 1 (FIG. 3).
  • A flowchart for explaining an operation example of the first learning unit shown in FIG. 1 (FIG. 4), and a diagram for explaining an operation example when the first learning unit shown in FIG. 1 uses CycleGAN (FIG. 5).
  • A flowchart for explaining the processing performed by the object recognition device shown in FIG. 8 before the start of operation (FIG. 9).
  • A flowchart for explaining the operation of the simulation unit shown in FIG. 11, and a flowchart for explaining the processing performed by the object recognition device shown in FIG. 11 before the start of operation.
  • A diagram showing the functional configuration of the object recognition device according to Embodiment 4 (FIG. 13), and a flowchart for explaining the processing performed by the object recognition device shown in FIG. 13 before the start of operation.
  • FIG. 1 is a diagram showing a functional configuration of the object recognition device 10 according to the first embodiment.
  • The object recognition device 10 has an image acquisition unit 101, an image conversion unit 102, a recognition unit 103, an output unit 104, a first learning unit 105, a storage unit 106, an image conversion parameter determination unit 107, an evaluation unit 108, and an input receiving unit 109.
  • the object recognition device 10 has a function of recognizing a state such as the position and orientation of the target object based on a photographed image of the target object.
  • the image acquisition unit 101 acquires an image of the target object.
  • the image acquisition unit 101 may be an imaging device having an image sensor, or may be an interface for acquiring an image captured by a photographing device connected to the object recognition device 10.
  • the image acquired by the image acquisition unit 101 is referred to as a sensor image.
  • the image acquisition unit 101 outputs the acquired sensor image to each of the image conversion unit 102 and the first learning unit 105.
  • the sensor image may be a monochrome image or an RGB image.
  • the sensor image may be a distance image in which the distance is expressed by the brightness and darkness. The distance image may be generated based on the set data of points having three-dimensional position information.
  • When the sensor image is a distance image, the image acquisition unit 101 acquires, together with the distance image, the minimum information needed to reconstruct the set of points having three-dimensional position information from the distance image.
  • The minimum information for reconstructing the set of points is, for example, the focal length and the scale.
  • the image acquisition unit 101 may be able to acquire a plurality of types of images.
  • the image acquisition unit 101 may be able to acquire both a monochrome image and a distance image of the target object.
  • In this case, the image acquisition unit 101 may be a single photographing device capable of capturing both a monochrome image and a distance image, or may be composed of a photographing device for capturing monochrome images and a separate photographing device for capturing distance images.
  • When the monochrome image and the distance image are captured by different photographing devices, it is preferable to grasp the positional relationship between the two photographing devices in advance.
  • the image conversion unit 102 converts the sensor image acquired by the image acquisition unit 101 into an image using the image conversion parameter, and outputs the converted image to the recognition unit 103.
  • The image conversion unit 102 performs image conversion so that the sensor image comes to have the predetermined features of each target image group, using the image conversion parameters that are the learning results of the first learning unit 105 and are stored in the storage unit 106.
  • an image having predetermined features is referred to as a target image
  • a set of target images is referred to as a target image group.
  • Common features are, for example, the shape of the target object, the surface characteristics of the target object, the measurement distance, the depth, and the like.
  • The common features may also be the position and orientation of objects other than the target object to be recognized, the type and intensity of ambient light, the type of measurement sensor, the parameters of the measurement sensor, the arrangement state of the target objects, the image style, and the quantity of target objects.
  • the parameters of the measurement sensor are parameters such as focus and aperture.
  • the arrangement state of the target object is an alignment state, a bulk state, or the like.
  • a plurality of target images included in the same target image group may have one common feature or may have a plurality of common features.
  • “having a common feature” includes not only the case where the above-mentioned features are the same but also the case where they are similar.
  • For example, when a reference shape such as a rectangular parallelepiped, a cylinder, or a hexagonal column is defined, the target images can be regarded as images having a common feature if the shapes of the target objects in the target images are close enough to be approximated by the same reference shape.
  • Similarly, when standard colors such as black, white, and gray are defined for the surface characteristics of the target object, the target images can be regarded as images having a common feature if the apparent hues of the target objects in the target images are close enough to be classified into the same standard color.
  • At least one target object is shown in the target image.
  • The target object shown in the target image does not necessarily have to be shown in its entirety. For example, there is no problem even if part of the target object shown in the target image is missing because part of the target object is out of the measurement range or the target object is partially hidden by another object.
  • the arrangement state of the plurality of target objects may be an aligned state or a bulk state.
  • the target image is preferably an image that makes it easy to recognize the target object.
  • An image in which the target object can be easily recognized is, for example, an image in which the target object has a simple, uncomplicated shape such as a rectangular parallelepiped or a cube and which contains little noise.
  • the number and types of image conversion parameters used by the image conversion unit 102 differ depending on the image conversion method. It is desirable that the image conversion unit 102 use an image conversion method such that the state such as the position and orientation of the target object in the converted image is not significantly different from the state of the target object in the sensor image.
  • the image conversion unit 102 can use, for example, an image conversion method using a neural network. When an image conversion method using a neural network is used, the image conversion parameters include a weighting coefficient between each unit constituting the network.
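  • As an illustration of this type of conversion, a minimal sketch is shown below; it is not taken from the patent, and the class, parameter, and variable names are hypothetical. It shows a small convolutional network whose weight coefficients play the role of the image conversion parameters stored per target image group.

```python
# Minimal sketch (assumption, not the patent's implementation): an image conversion
# unit realized as a small convolutional network. The learned network weights act
# as the "image conversion parameters"; one set would be learned per target image group.
import torch
import torch.nn as nn

class ImageConversionUnit(nn.Module):
    def __init__(self):
        super().__init__()
        # Sensor image in (1 channel), converted image out (1 channel).
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, sensor_image: torch.Tensor) -> torch.Tensor:
        return self.net(sensor_image)

converter = ImageConversionUnit()
# The "image conversion parameters" correspond to the network weights.
image_conversion_parameters = converter.state_dict()
converted = converter(torch.randn(1, 1, 64, 64))  # dummy monochrome sensor image
```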
  • the recognition unit 103 recognizes a state such as the position and orientation of the target object based on the converted image output by the image conversion unit 102.
  • the recognition method used by the recognition unit 103 is not particularly limited.
  • The recognition unit 103 may use a machine learning-based recognition method trained in advance so that the state of the target object can be output from the image, or may use model matching that estimates the state of the target object by collating CAD (Computer-Aided Design) data of the target object with the three-dimensional measurement data.
  • the recognition unit 103 may perform the recognition process using one type of recognition method, or may perform the recognition process using a combination of a plurality of types of recognition methods.
  • the recognition unit 103 outputs the recognition result to each of the output unit 104 and the evaluation unit 108.
  • the recognition result includes, for example, at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103.
  • the output unit 104 has a function of outputting the recognition result and the evaluation result of the evaluation unit 108, which will be described in detail later.
  • the method of outputting the recognition result and the evaluation result by the output unit 104 is not particularly limited.
  • For example, the output unit 104 may include a display device and display the recognition result and the evaluation result on the screen of the display device. Alternatively, the output unit 104 may include an interface to an external device and transmit the recognition result and the evaluation result to the external device.
  • FIG. 2 is a diagram showing an example of a display screen displayed by the output unit 104 shown in FIG.
  • “Input” in FIG. 2 indicates an area for displaying the sensor image, “parameter” indicates an area for displaying the image conversion parameters and the evaluation values that are the evaluation results, “conversion” indicates an area for displaying the converted image, and “recognition” indicates an area for displaying the recognition result.
  • For example, when the user performs an operation of selecting one of the plurality of image conversion parameters displayed in the “parameter” area, the name of the selected image conversion parameter is displayed in the “Name” field of the display screen.
  • the first learning unit 105 learns image conversion parameters for image conversion of the sensor image so as to have the characteristics of the target image group.
  • the first learning unit 105 learns the image conversion parameters used by the image conversion unit 102 for each target image group.
  • FIG. 3 is a diagram showing an example of a detailed configuration of the first learning unit 105 shown in FIG.
  • the first learning unit 105 has a state observation unit 11 and a machine learning unit 12.
  • By using machine learning, the first learning unit 105 is highly likely to be able to obtain image conversion parameters capable of performing image conversion that reproduces the features of the target image group.
  • In some cases, however, the learning of the image conversion parameters by the first learning unit 105 is difficult to converge.
  • the state observation unit 11 observes the image conversion parameters, the target image group, and the similarity between the converted image and the features of the target image group as state variables.
  • the machine learning unit 12 learns the image conversion parameters for each target image group according to the training data set created based on the image conversion parameters, the target image group, and the state variables of the similarity.
  • Any learning algorithm may be used by the machine learning unit 12. As an example, a case where the machine learning unit 12 uses reinforcement learning will be described. Reinforcement learning is a learning algorithm in which an agent, the subject of action in a certain environment, observes the current state and decides the action to take. The agent is rewarded by the environment for the chosen action and learns the way of acting that yields the most reward through a series of actions. Q-learning and TD-learning are known as typical reinforcement learning methods. For example, in the case of Q-learning, the general update equation of the action value function Q(s_t, a_t) is expressed by the following Equation (1).
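  • Equation (1), reconstructed here in the standard Q-learning form consistent with the symbol definitions that follow:

$$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \tag{1} $$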
  • In Equation (1), s_t represents the environment (state) at time t, and a_t represents the action at time t.
  • By the action a_t, the environment changes to s_{t+1}.
  • r_{t+1} denotes the reward given according to the change of the environment resulting from the action a_t, γ represents the discount rate, and α represents the learning coefficient.
  • The update formula represented by Equation (1) increases the action value Q if the action value Q of the best action a at time t+1 is larger than the action value Q of the action a executed at time t, and decreases the action value Q otherwise. In other words, the action value function Q(s_t, a_t) is updated so that the action value Q of the action a at time t approaches the best action value at time t+1. By repeating such updates, the best action value in a certain environment is sequentially propagated to the action values in the preceding environments.
  • the machine learning unit 12 has a reward calculation unit 121 and a function update unit 122.
  • the reward calculation unit 121 calculates the reward based on the state variable.
  • the reward calculation unit 121 calculates the reward r based on the similarity included in the state variable.
  • the degree of similarity increases as the converted image reproduces the characteristics of the target image group. For example, if the similarity is higher than a predetermined threshold, the reward calculation unit 121 increases the reward r.
  • the reward calculation unit 121 can increase the reward r by giving a reward of "1", for example.
  • On the other hand, when the similarity is lower than the threshold value, the reward calculation unit 121 reduces the reward r.
  • the reward calculation unit 121 can, for example, give a reward of "-1" to reduce the reward r.
  • the similarity is calculated according to a known method according to the type of features of the target image group.
  • the function update unit 122 updates the function for determining the image conversion parameter according to the reward r calculated by the reward calculation unit 121.
  • For example, in the case of Q-learning, the action value function Q(s_t, a_t) represented by Equation (1) is used as the function for determining the image conversion parameters.
  • FIG. 4 is a flowchart for explaining an operation example of the first learning unit 105 shown in FIG.
  • the operation shown in FIG. 4 is performed before the operation of the object recognition device 10 is started.
  • the state observation unit 11 of the first learning unit 105 acquires the sensor image group using the image acquisition unit 101 (step S101).
  • the state observation unit 11 selects one target image group from a plurality of predetermined target image groups (step S102).
  • the first learning unit 105 sets the image conversion parameters for the selected target image group (step S103).
  • the first learning unit 105 causes the image conversion unit 102 to perform image conversion of the sensor image using the set image conversion parameters (step S104).
  • the state observation unit 11 of the first learning unit 105 acquires the image conversion parameter, which is a state variable, the target image group, and the similarity between the converted image and the features of the target image group (step S105).
  • the state observation unit 11 outputs the acquired state variables to the machine learning unit 12.
  • the reward calculation unit 121 of the machine learning unit 12 determines whether or not the similarity is higher than the threshold value (step S106).
  • When the similarity is higher than the threshold value (step S106: Yes), the reward calculation unit 121 increases the reward r (step S107). When the similarity is lower than the threshold value (step S106: No), the reward calculation unit 121 reduces the reward r (step S108). The reward calculation unit 121 outputs the calculated reward r to the function update unit 122. The function update unit 122 then updates the function for determining the image conversion parameters according to the reward r (step S109).
  • Next, the first learning unit 105 determines whether or not a predetermined learning end condition is satisfied (step S110). It is desirable that the learning end condition be a condition for determining that the learning accuracy of the image conversion parameters is equal to or higher than a standard. Examples of the learning end condition are that the number of times the processing of steps S103 to S109 has been repeated exceeds a predetermined number, and that the elapsed time from the start of learning the image conversion parameters for the same target image group exceeds a predetermined time.
  • When the learning end condition is not satisfied (step S110: No), the first learning unit 105 repeats the process from step S103. When the learning end condition is satisfied (step S110: Yes), the first learning unit 105 outputs the learning result of the image conversion parameters for the target image group (step S111).
  • the first learning unit 105 determines whether or not the learning for all the target image groups has been completed (step S112). When the learning for all the target image groups is not completed, that is, when there is a target image group for which the learning has not been completed (step S112: No), the first learning unit 105 repeats the process from step S102. When the learning for all the target image groups is completed (step S112: Yes), the first learning unit 105 ends the image conversion parameter learning process.
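  • A minimal sketch of the learning flow of FIG. 4 is shown below; the function names and the toy conversion and similarity computations are illustrative stand-ins (not from the patent), and simple best-parameter bookkeeping replaces the actual Q-function update of step S109.

```python
# Hedged sketch of the FIG. 4 flow (steps S101-S112) using random stand-ins.
import random

def acquire_sensor_images(n=4):                      # step S101 (dummy images)
    return [[random.random() for _ in range(16)] for _ in range(n)]

def convert(image, params):                          # step S104 (toy "conversion")
    return [p * params["gain"] for p in image]

def similarity(converted_images, target_group):      # step S105 (toy similarity)
    return random.random()

def learn_parameters(target_groups, threshold=0.8, iterations=50):
    sensor_images = acquire_sensor_images()
    learned = {}
    for group in target_groups:                      # steps S102 / S112
        best_params, best_sim = None, -1.0
        for _ in range(iterations):                  # learning end condition (S110)
            params = {"gain": random.uniform(0.5, 2.0)}                  # step S103
            converted = [convert(img, params) for img in sensor_images]  # step S104
            s = similarity(converted, group)         # step S105
            reward = 1 if s > threshold else -1      # steps S106-S108
            # The actual method feeds this reward into the Q-function update
            # (step S109); here we simply keep the best-scoring parameters instead.
            if s > best_sim:
                best_params, best_sim = params, s
        learned[group] = best_params                 # step S111 (one result per group)
    return learned

print(learn_parameters(["target_group_A", "target_group_B"]))
```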
  • The case where the first learning unit 105 performs machine learning using reinforcement learning has been described above, but the first learning unit 105 may perform machine learning according to other known methods, for example, neural networks, genetic programming, functional logic programming, or support vector machines.
  • FIG. 5 is a diagram for explaining an operation example when the first learning unit 105 shown in FIG. 1 uses CycleGAN (Cycle-Consistent Generative Adversarial Networks).
  • the first learning unit 105 learns the image conversion parameters using CycleGAN.
  • As shown in FIG. 5, the first learning unit 105 learns the image conversion parameters using a first generator G, a second generator F, a first discriminator D_X, and a second discriminator D_Y.
  • the first learning unit 105 learns the image conversion parameters between the image groups X and Y using the training data of the two types of image groups X and Y.
  • the image included in the training data of the image group X is referred to as an image x
  • the image included in the training data of the image group Y is referred to as an image y.
  • the first generator G generates an image having the characteristics of the image group Y from the image x.
  • Let G(x) be the output when the image x is input to the first generator G.
  • The second generator F generates an image having the characteristics of the image group X from the image y.
  • Let F(y) be the output when the image y is input to the second generator F.
  • The first discriminator D_X distinguishes between x and F(y).
  • The second discriminator D_Y distinguishes between y and G(x).
  • Based on the losses, the first learning unit 105 learns so that the image conversion accuracy of the first generator G and the second generator F increases and the discrimination accuracy of the first discriminator D_X and the second discriminator D_Y increases. Specifically, the first learning unit 105 performs learning so that the total loss L(G, F, D_X, D_Y) shown in the following Equation (2) satisfies the objective function represented by the following Equation (3).
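  • Equations (2) and (3), reconstructed here in the standard CycleGAN form consistent with the loss terms described below (the cycle-consistency weight λ is an assumption):

$$ L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda \, L_{cyc}(G, F) \tag{2} $$

$$ G^{*}, F^{*} = \arg \min_{G, F} \max_{D_X, D_Y} L(G, F, D_X, D_Y) \tag{3} $$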
  • The first loss L_GAN(G, D_Y, X, Y) included in Equation (2) is the loss that occurs when the first generator G generates the image G(x) having the characteristics of the image group Y from the image x.
  • The second loss L_GAN(F, D_X, Y, X) included in Equation (2) is the loss that occurs when the second generator F generates the image F(y) having the characteristics of the image group X from the image y.
  • The third loss L_cyc(G, F) included in Equation (2) is the cycle-consistency loss evaluated by inputting the image x into the first generator G to generate the image G(x), inputting the generated image G(x) into the second generator F, and comparing the resulting image F(G(x)) with the original image x (and likewise comparing G(F(y)) with the original image y).
  • Based on the following four assumptions, the first learning unit 105 learns the first generator G and the second generator F so that the total loss L(G, F, D_X, D_Y) becomes smaller, and learns the first discriminator D_X and the second discriminator D_Y so that the total loss L(G, F, D_X, D_Y) becomes larger.
  • 1. The image G(x) obtained by inputting the image x into the first generator G should be similar to the image group Y.
  • 2. The image F(y) obtained by inputting the image y into the second generator F should be similar to the image group X.
  • 3. The image F(G(x)) obtained by inputting the image G(x) into the second generator F should be similar to the image group X.
  • 4. The image G(F(y)) obtained by inputting the image F(y) into the first generator G should be similar to the image group Y.
  • The first learning unit 105 performs the above learning with the sensor image group as the image group X and the target image group as the image group Y, learns the image conversion parameters used in the first generator G that generates images of the target image group from the sensor image group, and outputs the learning result to the storage unit 106.
  • the first learning unit 105 performs the above learning for each of the plurality of types of target image groups, and learns the image conversion parameters for each target image group.
  • the storage unit 106 stores the image conversion parameters for each target image group, which is the learning result of the first learning unit 105.
  • Before the start of operation, the image conversion parameter determination unit 107 determines, from among the plurality of image conversion parameters, the image conversion parameter to be used by the image conversion unit 102 during operation, based on the evaluation result of the evaluation unit 108 described later.
  • the image conversion parameter determination unit 107 notifies the image conversion unit 102 of the determined image conversion parameter.
  • For example, the image conversion parameter determination unit 107 may use the image conversion parameter having the maximum evaluation value E_c as the image conversion parameter to be used by the image conversion unit 102, or the evaluation unit 108 may cause the output unit 104 to output the evaluation results and the image conversion parameter selected by the user after confirming the output evaluation results may be used as the image conversion parameter to be used by the image conversion unit 102.
  • It is also conceivable that the output unit 104 outputs, in addition to the evaluation result, the converted image obtained when each image conversion parameter is used. In this case, the user can check the converted images and select, for example, an image conversion parameter capable of performing conversion that suppresses light reflection.
  • the output unit 104 may output the evaluation value of the image conversion parameter whose evaluation value is equal to or more than the threshold value and the converted image, and may not output the image conversion parameter whose evaluation value is less than the threshold value.
  • The evaluation unit 108 evaluates each of the plurality of image conversion parameters based on the recognition result of the recognition unit 103 obtained when each of the plurality of image conversion parameters is used. Specifically, the evaluation unit 108 calculates the evaluation value E_c and outputs the calculated evaluation value E_c, which is the evaluation result, to each of the image conversion parameter determination unit 107 and the output unit 104.
  • The evaluation value E_c calculated by the evaluation unit 108 is represented by, for example, the following Equation (4).
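  • Equation (4), reconstructed from the description in the next sentence:

$$ E_c = w_{pr} \, p_r + w_{tr} \, \frac{1}{t_r} \tag{4} $$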
  • The evaluation value E_c of Equation (4) is the sum of the value obtained by multiplying the recognition accuracy p_r by the weighting coefficient w_pr and the value obtained by multiplying the inverse of the recognition processing time t_r by the weighting coefficient w_tr.
  • The values of the weighting coefficients w_pr and w_tr may be determined depending on what the user attaches importance to. For example, if it is desired to emphasize the speed of the recognition process even at the cost of slightly lower recognition accuracy, the value of the weighting coefficient w_pr may be reduced and the value of the weighting coefficient w_tr may be increased. Conversely, when recognition accuracy is emphasized even if it takes time, the value of the weighting coefficient w_pr may be increased and the value of the weighting coefficient w_tr may be decreased.
  • The recognition accuracy p_r is the degree to which the target objects in the sensor image can be recognized, or the error in the state of the target object, specifically, the error in position and orientation.
  • When the recognition accuracy p_r is the degree to which the target objects can be recognized, it is expressed by the following Equation (5).
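  • Equation (5), reconstructed from the description below:

$$ p_r = \frac{n_r}{N_w} \tag{5} $$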
  • In Equation (5), n_r indicates the number of recognized target objects, and N_w indicates the number of target objects in the sensor image.
  • The recognition accuracy p_r represented by Equation (5) is the number n_r of recognized target objects divided by the number N_w of target objects in the sensor image. It may be determined that recognition has succeeded if the error between the position and orientation of the target object in the sensor image and the recognized position and orientation is within a threshold value, or the user may visually determine whether or not recognition has succeeded.
  • When the recognition accuracy p_r is the error of the recognized position and orientation, it is expressed by the following Equation (6).
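  • Equation (6), reconstructed from the description below:

$$ p_r = \frac{1}{\left| x_w - x_r \right| + 1} \tag{6} $$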
  • In Equation (6), x_w indicates the actual position and orientation of the target object, and x_r indicates the recognized position and orientation.
  • The recognition accuracy p_r represented by Equation (6) is the inverse of the value obtained by adding 1 to the absolute value of the difference between the actual position and orientation x_w of the target object and the recognized position and orientation x_r.
  • the actual position / orientation and the recognized position / orientation of the target object may be the position / orientation in the image space or the position / orientation in the real space.
  • The method of calculating the recognition accuracy p_r is not limited to the above examples.
  • Further, the above examples may be combined.
  • The evaluation value E_c may be calculated using the following Equation (7).
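  • Equation (7), reconstructed from the description below:

$$ E_c = \begin{cases} w_{pr}\, p_r & (t_r \le T_r) \\ 0 & (t_r > T_r) \end{cases} \tag{7} $$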
  • In Equation (7), T_r indicates a recognition processing time threshold. That is, when Equation (7) is used, if the recognition process is completed within the recognition processing time threshold T_r, the evaluation value E_c is the value obtained by multiplying the recognition accuracy p_r by the weighting coefficient w_pr, and if the recognition process is not completed within the threshold T_r, the evaluation value E_c is 0. By setting the evaluation value E_c of an image conversion parameter for which the recognition process is not completed within the recognition processing time threshold T_r to 0, it becomes possible to confirm and select an image conversion parameter that can complete the recognition process within the time required by the user.
  • The method for calculating the evaluation value E_c is not limited to the above.
  • the input receiving unit 109 receives the input of the evaluation parameter, which is a parameter used by the evaluation unit 108 to evaluate the image conversion parameter.
  • the input receiving unit 109 may accept evaluation parameters input by the user using an input device or the like, may receive evaluation parameters from a functional unit in the object recognition device 10, or may receive evaluation parameters from an external device of the object recognition device 10. Evaluation parameters may be accepted from.
  • The evaluation parameters received by the input receiving unit 109 are, for example, the weighting coefficients w_pr and w_tr included in Equation (4), that is, weighting coefficients for changing the influence that each of a plurality of elements affecting the magnitude of the evaluation value has on the evaluation value.
  • FIG. 6 is a flowchart for explaining the process performed by the object recognition device 10 shown in FIG. 1 before the start of operation.
  • the first learning unit 105 of the object recognition device 10 performs the image conversion parameter learning process (step S121). Since the image conversion parameter learning process shown in step S121 is the process described with reference to FIG. 4 or the process described with reference to FIG. 5, detailed description thereof will be omitted here.
  • the input receiving unit 109 acquires the evaluation parameters and outputs the acquired evaluation parameters to the evaluation unit 108 (step S122).
  • the image acquisition unit 101 acquires a sensor image and outputs the acquired sensor image to the image conversion unit 102 (step S123).
  • the image conversion unit 102 selects one image conversion parameter for which the evaluation value has not yet been calculated from the plurality of learned image conversion parameters stored in the storage unit 106 (step S124).
  • the image conversion unit 102 performs an image conversion process of converting the sensor image acquired by the image acquisition unit 101 into an image after conversion using the selected image conversion parameter (step S125).
  • the image conversion unit 102 outputs the converted image to the recognition unit 103.
  • the recognition unit 103 performs recognition processing using the converted image and outputs the recognition result to the evaluation unit 108 (step S126). When outputting the recognition result, the recognition unit 103 may output the recognition result to the output unit 104.
  • The evaluation unit 108 calculates the evaluation value E_c based on the recognition result and outputs the calculated evaluation value E_c to the image conversion parameter determination unit 107 (step S127).
  • The image conversion unit 102 determines whether or not the evaluation values E_c of all the image conversion parameters have been calculated (step S128).
  • When the evaluation values E_c of all the image conversion parameters have not been calculated (step S128: No), the image conversion unit 102 repeats the process from step S124.
  • When the evaluation values E_c of all the image conversion parameters have been calculated (step S128: Yes), the image conversion parameter determination unit 107 determines, from among the plurality of image conversion parameters, the image conversion parameter to be used by the image conversion unit 102 during operation, based on the evaluation values that are the evaluation results of the evaluation unit 108 (step S129).
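  • A minimal sketch of this pre-operation flow (steps S123 to S129) is shown below; all function bodies are toy stand-ins rather than the patent's algorithms, and the evaluation value follows Equation (4).

```python
# Hedged sketch of the FIG. 6 flow: each learned image conversion parameter is tried,
# scored with the evaluation value E_c of Equation (4), and the best one is selected
# for operation. Function names and bodies are illustrative assumptions.
import random

def recognize(converted_image):
    # Returns (recognition accuracy p_r, recognition processing time t_r); dummy values.
    return random.random(), random.uniform(0.01, 0.5)

def evaluation_value(p_r, t_r, w_pr=1.0, w_tr=0.1):
    return w_pr * p_r + w_tr / t_r                            # Equation (4)

def select_parameter(sensor_image, learned_parameters):
    scores = {}
    for name, params in learned_parameters.items():           # steps S124-S128
        converted = [v * params["gain"] for v in sensor_image]  # step S125 (toy conversion)
        p_r, t_r = recognize(converted)                        # step S126
        scores[name] = evaluation_value(p_r, t_r)              # step S127
    return max(scores, key=scores.get), scores                 # step S129

sensor_image = [random.random() for _ in range(16)]
learned = {"group_A": {"gain": 1.2}, "group_B": {"gain": 0.8}}
print(select_parameter(sensor_image, learned))
```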
  • FIG. 7 is a flowchart for explaining the operation of the object recognition device 10 shown in FIG. 1 during operation. Before operation, the processing shown in FIG. 6 is performed, so that the image conversion parameters have been learned for each target image group and the image conversion parameter to be used by the image conversion unit 102 has been selected from the learned image conversion parameters.
  • the image acquisition unit 101 acquires a sensor image and outputs the acquired sensor image to the image conversion unit 102 (step S131).
  • the image conversion unit 102 acquires the selected image conversion parameter (step S132).
  • the image conversion unit 102 performs an image conversion process for converting the sensor image into a converted image using the acquired image conversion parameters, and outputs the converted image to the recognition unit 103 (step S133).
  • the recognition unit 103 uses the converted image to perform a recognition process for recognizing the state of the target object included in the converted image, and outputs the recognition result to the output unit 104 (step S134).
  • the output unit 104 determines whether or not the target object exists based on the recognition result (step S135). When the target object exists (step S135: Yes), the output unit 104 outputs the recognition result (step S136). After outputting the recognition result, the image acquisition unit 101 repeats the process from step S131. When the target object does not exist (step S135: No), the object recognition device 10 ends the process.
  • the image conversion unit 102 converts the sensor image into a converted image by a one-step image conversion process, but the present embodiment is not limited to such an example.
  • the image conversion unit 102 may perform image conversion in a plurality of stages to convert the sensor image into an image after conversion. For example, when two-step image conversion is performed, the image conversion unit 102 converts the sensor image into a first intermediate image and converts the first intermediate image into a converted image. When three-step image conversion is performed, the image conversion unit 102 converts the sensor image into a first intermediate image, converts the first intermediate image into a second intermediate image, and converts the second intermediate image. Convert to a later image.
  • In this case, the first learning unit 105 learns each of the plurality of types of image conversion parameters used in the respective stages of the image conversion. Specifically, the first learning unit 105 learns a first image conversion parameter for converting the sensor image into an intermediate image and a second image conversion parameter for converting the intermediate image into a converted image. Further, when three or more steps of image conversion are performed, the first learning unit 105 also learns a third image conversion parameter for converting an intermediate image into another intermediate image. For example, when two-step image conversion is performed, the first learning unit 105 learns the first image conversion parameter for converting the sensor image into the first intermediate image and the second image conversion parameter for converting the first intermediate image into the converted image.
  • When three-step image conversion is performed, the first learning unit 105 learns the first image conversion parameter for converting the sensor image into the first intermediate image, a third image conversion parameter for converting the first intermediate image into the second intermediate image, and the second image conversion parameter for converting the second intermediate image into the converted image.
  • the intermediate image is an image that is different from both the sensor image and the converted image.
  • For example, the converted image can be a distance image generated using CG (Computer Graphics) without noise or missing portions, and the intermediate image can be a reproduced image in which noise, measurement errors, missing portions due to sensor blind spots, and the like are simulated.
  • In this case, the first learning unit 105 learns the first image conversion parameter for converting the sensor image into the intermediate image, which is a reproduced image, and the second image conversion parameter for converting the intermediate image into the converted image, which is a distance image. By performing the image conversion step by step, the convergence of learning can be improved, and the recognition performance can be improved.
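  • A minimal sketch of such staged conversion is shown below; the linear conversions are placeholders for the learned converters, and the parameter names are hypothetical.

```python
# Hedged sketch of two-stage image conversion: a first parameter set converts the
# sensor image into an intermediate (reproduced) image, and a second parameter set
# converts the intermediate image into the final converted image.
import numpy as np

def convert(image, params):
    # Placeholder "conversion" standing in for a learned converter.
    return params["gain"] * image + params["offset"]

first_stage_params = {"gain": 1.1, "offset": 0.0}    # sensor image -> intermediate image
second_stage_params = {"gain": 0.9, "offset": 0.05}  # intermediate image -> converted image

sensor_image = np.random.rand(64, 64)
intermediate_image = convert(sensor_image, first_stage_params)
converted_image = convert(intermediate_image, second_stage_params)
```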
  • Alternatively, the converted image may be divided into a plurality of types of component images, and the converted image may be obtained by converting the sensor image into the plurality of component images and then synthesizing them.
  • In this case, the first learning unit 105 learns a plurality of types of image conversion parameters for converting the sensor image into each component image. For example, it is conceivable that a texture image, which is a component image having the features of the texture component of the converted image, and a color image, which is a component image having the features of the global color component of the converted image, are generated from one sensor image, and the texture image and the color image are combined to obtain the converted image.
  • the first learning unit 105 learns an image conversion parameter for converting the sensor image into a texture image and an image conversion parameter for converting the sensor image into a color image.
  • a converted image can also be obtained by using three or more component images.
  • By dividing the conversion into component images, the problem to be solved by each conversion is simplified, so that the convergence of learning can be improved and the recognition performance can be improved.
  • Further, by synthesizing a plurality of component images to obtain the converted image, it becomes possible to obtain a converted image having features closer to those of the target image group than when the converted image is obtained from the sensor image using one type of image conversion parameter.
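  • A minimal sketch of the component-image idea is shown below; the texture and color converters and the additive synthesis are illustrative assumptions standing in for the learned conversions.

```python
# Hedged sketch: the sensor image is converted into a texture component and a global
# color component by two separately learned converters, then combined.
import numpy as np

def texture_converter(sensor_image):
    # Stand-in for a learned conversion that keeps high-frequency (texture) content.
    blurred = (np.roll(sensor_image, 1, axis=0) + np.roll(sensor_image, -1, axis=0)) / 2.0
    return sensor_image - blurred

def color_converter(sensor_image):
    # Stand-in for a learned conversion that keeps the global color/intensity trend.
    return np.full_like(sensor_image, sensor_image.mean())

def synthesize(texture_image, color_image):
    # Additive synthesis is an assumption; the patent does not specify the method.
    return texture_image + color_image

sensor_image = np.random.rand(64, 64)
converted = synthesize(texture_converter(sensor_image), color_converter(sensor_image))
```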
  • Each image processing operation to be performed has features, properties, and the like that the image to be processed should have. Therefore, instead of converting the image used for recognition only once, an image conversion that facilitates each image processing operation in the recognition process may be executed each time as preprocessing for that image processing operation.
  • In this case, the first learning unit 105 only needs to learn image conversion parameters for the number of image processing operations for which preprocessing is desired, and an ideal processing result image group obtained when each image processing operation is executed can be used as the target image group.
  • As described above, the image conversion parameters can be evaluated based on the recognition processing result and the evaluation results can be obtained, so that the influence of the image conversion parameters on the recognition process can be confirmed. Therefore, it is possible to select an image conversion parameter according to the environment at the time the recognition process is executed, and it is possible to improve the recognition performance even when that environment changes.
  • the image conversion parameter is a parameter for image conversion of the sensor image into an image having predetermined features.
  • The object recognition device 10 also has the first learning unit 105, which learns an image conversion parameter for each predetermined feature, and the image conversion unit 102 converts the sensor image using the image conversion parameters that are the learning results of the first learning unit 105.
  • Therefore, the evaluation result of the image conversion parameter that is the learning result for each predetermined feature can be obtained from the output unit 104, and it is possible to grasp what kind of features an image should be converted to have so that the recognition performance can be improved.
  • Further, the image conversion unit 102 may perform image conversion in a plurality of stages to convert the sensor image into the converted image, in which case the first learning unit 105 learns each of the plurality of types of image conversion parameters used in the respective stages of the image conversion. By performing the image conversion step by step, the convergence of learning can be improved, and the recognition performance can be improved.
  • the image conversion unit 102 can convert the sensor image into a plurality of component images and then synthesize the plurality of component images to acquire the converted image.
  • the first learning unit 105 learns a plurality of types of image conversion parameters for converting the sensor image into each of the plurality of component images.
  • The object recognition device 10 has the image conversion parameter determination unit 107, which determines the image conversion parameter to be used by the image conversion unit 102 based on the evaluation results of the evaluation unit 108 obtained when each of the plurality of image conversion parameters is used.
  • the object recognition device 10 has an input receiving unit 109 that receives input of evaluation parameters, which are parameters used by the evaluation unit 108 to evaluate image conversion parameters.
  • the evaluation unit 108 evaluates the image conversion parameter using the evaluation parameter received by the input reception unit 109.
  • the evaluation parameter is, for example, a weighting coefficient for changing the influence of each of the plurality of elements affecting the magnitude of the evaluation value on the evaluation value.
  • the recognition result output by the recognition unit 103 of the object recognition device 10 includes at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103.
  • Therefore, the evaluation unit 108 can calculate the evaluation value of the image conversion parameter based on at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103.
  • For example, the recognition accuracy p_r can be calculated using the number n_r of target objects recognized by the recognition unit 103 and the number N_w of actual target objects. Therefore, the object recognition device 10 can evaluate the image conversion parameters in consideration of the recognition processing time, the recognition accuracy p_r, and the like.
  • FIG. 8 is a diagram showing a functional configuration of the object recognition device 20 according to the second embodiment.
  • The object recognition device 20 has an image acquisition unit 101, an image conversion unit 102, a recognition unit 103, an output unit 104, a first learning unit 105, a storage unit 106, an image conversion parameter determination unit 107, an evaluation unit 108, an input receiving unit 109, and a robot 110. Since the object recognition device 20 includes the robot 110 and has a function of picking the target object, it can also be called an object extraction device. Since the object recognition device 20 includes the robot 110, the image conversion parameters can be evaluated based on the operation results of the robot 110.
  • the object recognition device 20 has a robot 110 in addition to the functional configuration of the object recognition device 10 according to the first embodiment.
  • The functional configurations that are the same as those of the first embodiment are denoted by the same reference numerals as in the first embodiment and their detailed description is omitted; the parts different from the first embodiment are mainly described below.
  • the output unit 104 outputs the recognition result of the recognition unit 103 to the robot 110.
  • the robot 110 grips the target object based on the recognition result output by the output unit 104.
  • the robot 110 outputs the operation result of the operation of gripping the target object to the evaluation unit 108.
  • the evaluation unit 108 evaluates the image conversion parameter based on the operation result of the robot 110 in addition to the recognition result of the recognition unit 103.
  • the operation result of the robot 110 includes at least one of the probability that the robot 110 succeeds in gripping the target object, the gripping operation time, and the cause of the grip failure.
  • the robot 110 has a tool capable of grasping an object and performing an object operation necessary for executing a task.
  • a suction pad can be used as a tool.
  • the tool may be a gripper hand that grips the target object by sandwiching it with two claws.
  • The condition for determining that the robot 110 has successfully gripped the target object can be, for example, when the tool is a gripper hand, that the opening width when the gripper hand is inserted at the target object and then closed is within a predetermined range.
  • Alternatively, the condition for determining that the robot 110 has successfully gripped the target object may be that the target object is still held immediately before the gripper hand is released from the target object at the transport destination.
  • the conditions for determining that the robot 110 has succeeded in grasping the target object are not limited to the above examples, and can be appropriately defined depending on the type of tool possessed by the robot 110, the work content to be performed by the robot 110, and the like.
  • Whether or not the target object can be held can be determined by using the detection result, for example, when the tool being used is equipped with a function of detecting the holding state of the target object. Alternatively, it may be determined whether or not the target object can be held by using the information of an external sensor such as a camera. For example, when the tool possessed by the robot 110 is an electric hand, there is a product having a function of determining whether or not the target object can be held by measuring the current value when operating the electric hand.
  • Alternatively, there is a method in which an image of the tool when it is not gripping the target object is stored in advance, the difference between this image and an image of the tool taken after the gripping operation is calculated, and whether or not the target object is held is determined based on the difference.
  • the operation result of the robot 110 can also include the gripping operation time.
  • the gripping operation time can be the time from closing the gripper hand to opening the gripper hand at the transport destination.
  • Causes of grip failure of the robot 110 include, for example, failure to grip, dropping during transportation, and multiple grips.
  • When the cause of grip failure is included in the operation result, the evaluation unit 108 evaluates the image conversion parameters based on the cause of failure, so that the image conversion unit 102 can use an image conversion parameter that reduces a specific cause of failure. For example, even if gripping of the target object fails inside the supply box that stores the target objects before supply, the target object is likely to simply fall back into the supply box and the gripping operation can be performed again, so the risk is low. On the other hand, if the target object is dropped during transportation, it may fall and be scattered around, and complicated control of the robot 110 may be required to return to the original state.
  • In such a case, by evaluating the image conversion parameters based on the cause of grip failure, the image conversion unit 102 can use an image conversion parameter with less risk of the target objects being scattered around.
  • FIG. 9 is a flowchart for explaining the processing performed by the object recognition device 20 shown in FIG. 8 before the start of operation.
  • the same parts as those of the object recognition device 10 are designated by the same reference numerals as those in FIG. 6, and detailed description thereof will be omitted.
  • the parts different from FIG. 6 will be mainly described.
  • The operations from step S121 to step S126 are the same as in FIG. 6.
  • the robot 110 performs picking based on the recognition result (step S201).
  • the robot 110 outputs the picking operation result to the evaluation unit 108.
  • The evaluation unit 108 calculates an evaluation value based on the operation result of the robot 110 in addition to the recognition result (step S202). Specifically, the evaluation unit 108 can calculate the evaluation value E_c by using, for example, the following Equation (8).
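  • The exact form of Equation (8) is not reproduced in this text; one plausible reconstruction, consistent with Equation (4) and the symbols defined below, and treating n_f1, n_f2, ... as counts of each failure cause, is the following weighted sum (this is an assumption, not the patent's exact formula):

$$ E_c = w_{pg}\, p_g + w_{tg}\, \frac{1}{t_g} + w_{pr}\, p_r + w_{tr}\, \frac{1}{t_r} - \sum_i w_{fi}\, n_{fi} \tag{8} $$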
  • In Equation (8), p_g denotes the gripping success rate, t_g denotes the gripping operation time, p_r denotes the recognition accuracy, t_r denotes the recognition processing time, and n_f1, n_f2, ... indicate the types of gripping failure causes. Further, w_pg, w_tg, w_pr, w_tr, w_f1, w_f2, ... denote weighting coefficients.
  • In this case, the evaluation parameters received by the input receiving unit 109 include the weighting coefficients w_pg, w_tg, w_pr, w_tr, w_f1, w_f2, and so on.
  • The above method for calculating the evaluation value E_c is an example, and the method for calculating the evaluation value E_c used by the evaluation unit 108 is not limited to the above method.
  • The operations of steps S128 and S129 are the same as those in FIG. 6. That is, the process shown in FIG. 9 differs from the process shown in FIG. 6 in that the picking process is additionally performed between the recognition process and the process of calculating the evaluation value, and in the specific contents of the process of calculating the evaluation value.
  • FIG. 10 is a flowchart for explaining the processing performed by the object recognition device 20 shown in FIG. 8 during operation.
  • the same parts as those of the object recognition device 10 are designated by the same reference numerals as those in FIG. 7, and detailed description thereof will be omitted.
  • the parts different from those in FIG. 7 will be mainly described.
  • When the object recognition device 10 determines as a result of the recognition process that the target object exists, it outputs the recognition result, whereas the object recognition device 20 causes the robot 110 to perform picking based on the recognition result instead of outputting it (step S203). After the robot 110 performs picking, the object recognition device 20 repeats the process from step S131.
  • As described above, the recognition unit 103 recognizes the state of the target object based on the converted image, but the recognition unit 103 of the object recognition device 20 having the robot 110 may recognize the state of the target object using a search-based method that searches for locations where the target object can be gripped, using a hand model of the robot 110.
  • When the recognition result is position and orientation information of the target object, it is desirable that the position and orientation information of the target object can be converted into position and orientation information of the robot 110 at the time the robot 110 grips the target object.
  • the object recognition device 20 further includes a robot 110 that grips the target object based on the recognition result of the recognition unit 103.
  • the evaluation unit 108 of the object recognition device 20 evaluates the image conversion parameters based on the operation result of the robot 110.
  • the object recognition device 20 can select an image conversion parameter that can improve the gripping performance, and can improve the gripping success rate of the robot 110.
  • the operation result of the robot 110 includes at least one of the probability that the robot 110 succeeds in grasping the target object, the gripping operation time, and the cause of the grip failure.
  • When the gripping success rate is included in the operation result, the image conversion parameters are evaluated based on the gripping success rate, so that an image conversion parameter that can improve the gripping success rate is selected. This makes it possible to improve the gripping success rate of the robot 110.
  • When the gripping operation time is included in the operation result, the image conversion parameters are evaluated based on the gripping operation time, so that the gripping operation time can be shortened.
  • When the cause of grip failure is included in the operation result, the image conversion parameters are evaluated based on the cause of grip failure, so that it is possible to reduce a specific cause of grip failure.
  • FIG. 11 is a diagram showing a functional configuration of the object recognition device 30 according to the third embodiment.
  • The object recognition device 30 has an image acquisition unit 101, an image conversion unit 102, a recognition unit 103, an output unit 104, a first learning unit 105, a storage unit 106, an image conversion parameter determination unit 107, an evaluation unit 108, an input receiving unit 109, a robot 110, a simulation unit 111, an image conversion data set generation unit 114, and an image conversion data set selection unit 115.
  • the simulation unit 111 has a first generation unit 112 and a second generation unit 113.
  • The object recognition device 30 includes a simulation unit 111, an image conversion data set generation unit 114, and an image conversion data set selection unit 115 in addition to the configuration of the object recognition device 20 according to the second embodiment.
  • The functional configurations that are the same as those of the second embodiment are denoted by the same reference numerals as in the second embodiment and their detailed description is omitted; the parts different from the second embodiment are mainly described below.
  • The simulation unit 111 creates target images using simulation. Specifically, the simulation unit 111 has the first generation unit 112, which generates arrangement information indicating the arrangement state of the target objects based on simulation conditions, and the second generation unit 113, which generates a target image by arranging the target objects based on the arrangement information.
  • The simulation conditions used by the first generation unit 112 include, for example, sensor information, target object information, and environmental information. It is desirable that the sensor information include information, such as the focal length, angle of view, and aperture value of the sensor that acquires the sensor image, whose values change the state of the generated space. Further, when the sensor performs stereo measurement, the sensor information may include a convergence angle, a baseline length, and the like.
  • the target object information is a CAD model of the target object, information indicating the material of the target object, and the like.
  • the target object information may include the texture information of each surface of the target object. It is desirable that the target object information includes information to the extent that the state of the target object in the space is uniquely determined when the target object is placed in the space by using simulation.
  • Environmental information can include measurement distance, measurement depth, position / orientation of an object other than the target object, type and intensity of ambient light, and the like.
  • Objects other than the target object are, for example, a box, a measuring table, and the like.
  • the simulation unit 111 can perform the simulation under detailed conditions and can generate various types of target images.
  • the arrangement information generated by the first generation unit 112 indicates the arrangement state of at least one target object.
  • the plurality of target objects may be arranged in an aligned manner or may be in a bulk state.
  • the processing time can be shortened by performing the simulation using a simplified model of the target object and then arranging the target objects at the positions calculated for the simplified model.
  • the target image generated by the second generation unit 113 may be an RGB image or a distance image.
  • when the target image is an RGB image, it is desirable to set the color or texture of the target object and of objects other than the target object.
  • the simulation unit 111 stores the generated target image in the storage unit 106. Further, the simulation unit 111 may store the simulation conditions used when the first generation unit 112 generates the arrangement information and the arrangement information generated by the first generation unit 112 in the storage unit 106. At this time, it is desirable that the simulation unit 111 stores the arrangement information in association with the target image constituting the image conversion data set.
  • the image conversion data set generation unit 114 generates an image conversion data set including the sensor image acquired by the image acquisition unit 101 and the target image generated by the simulation unit 111.
  • the image conversion data set generation unit 114 stores the generated image conversion data set in the storage unit 106.
  • the image conversion dataset includes one or more sensor images and one or more target images. There is no limit to the number of images of the sensor image and the target image. If the number of images is too small, the learning of the image conversion parameters may not converge, and if the number of images is too large, the learning time may become long. Therefore, it is preferable to determine the number of images according to the intended use of the user, the installation status of the sensor, and the like. Further, the number of images of the target image and the number of images of the sensor image are preferably about the same, but there may be a bias.
  • the image conversion data set selection unit 115 selects, based on the sensor image, the image conversion data set used for learning by the first learning unit 105 from the image conversion data sets stored in the storage unit 106. Specifically, the image conversion data set selection unit 115 calculates, based on the sensor image, a selection evaluation value E_p that serves as a criterion for selecting an image conversion data set, and selects the image conversion data set based on the calculated selection evaluation value E_p. For example, the image conversion data set selection unit 115 can select only the image conversion data sets whose selection evaluation value E_p is equal to or less than a predetermined threshold value. The image conversion data set selection unit 115 can select one or a plurality of image conversion data sets.
  • the image conversion data set selection unit 115 outputs the selected image conversion data set to the first learning unit 105.
  • the first learning unit 105 learns the image conversion parameters using the image conversion data set selected by the image conversion data set selection unit 115. Therefore, the first learning unit 105 learns the image conversion parameters using the target image generated by the simulation unit 111.
  • the selection evaluation value E_p is calculated using, for example, mathematical formula (9). In formula (9), I_t represents the sensor image, I_s represents the target image group constituting the image conversion data set, and N_s denotes the number of target images included in the target image group. F_I(I) is an arbitrary function that calculates a scalar value from an image I, for example, a function that calculates the average pixel value of the image or the number of edges in the image.
  • the image conversion data set selection unit 115 may instead calculate the selection evaluation value E_p using formula (10). In formula (10), l_s indicates the measurement distance of the sensor that acquires the sensor image, l_t indicates the measurement distance of the target images constituting the target image group, and w_I and w_l are weighting coefficients. If the measurement distance of the sensor is not exactly known, an approximate distance may be used.
  • the above methods for calculating the selection evaluation value E_p are examples, and the calculation method is not limited to them.
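  • Because formulas (9) and (10) are not reproduced in this text, the following Python sketch only illustrates the general selection mechanism under assumed definitions: each stored image conversion data set is scored against the current sensor image with a scalar feature function F_I and an optional measurement-distance term, and only data sets whose score is at or below a threshold are kept. The concrete scoring expression and all names are illustrative assumptions.

```python
import numpy as np

def f_i(image: np.ndarray) -> float:
    """Example scalar feature F_I(I): the mean pixel value of the image."""
    return float(image.mean())

def selection_score(sensor_image, target_images, l_s=None, l_t=None, w_i=1.0, w_l=0.1):
    """Assumed stand-in for the selection evaluation value E_p (lower = closer match):
    feature mismatch between the sensor image and the target image group, plus an
    optional measurement-distance mismatch term."""
    feature_gap = abs(f_i(sensor_image) - float(np.mean([f_i(t) for t in target_images])))
    distance_gap = abs(l_s - l_t) if (l_s is not None and l_t is not None) else 0.0
    return w_i * feature_gap + w_l * distance_gap

def select_data_sets(sensor_image, data_sets, threshold, sensor_distance=None):
    """Keep only the image conversion data sets whose score is at or below the threshold.
    Each data set is assumed to expose .target_images and .measurement_distance."""
    return [ds for ds in data_sets
            if selection_score(sensor_image, ds.target_images,
                               l_s=sensor_distance,
                               l_t=getattr(ds, "measurement_distance", None)) <= threshold]
```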
  • FIG. 12 is a flowchart for explaining the operation of the simulation unit 111 shown in FIG.
  • the first generation unit 112 of the simulation unit 111 acquires the simulation conditions (step S301).
  • the simulation conditions are acquired from, for example, a storage area provided in the simulation unit 111.
  • the first generation unit 112 generates placement information indicating the placement state of the target object based on the simulation conditions (step S302).
  • the first generation unit 112 outputs the generated arrangement information to the second generation unit 113 of the simulation unit 111.
  • the second generation unit 113 arranges the target object based on the arrangement information generated by the first generation unit 112 and generates a target image (step S303).
  • the second generation unit 113 outputs the generated target image and stores it in the storage unit 106 (step S304).
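  • The following minimal Python sketch illustrates the flow of steps S301 to S304 under assumed helpers; the random bulk placement, the external renderer callable, and all names are illustrative and not the embodiment's actual implementation.

```python
import random

def generate_arrangement_info(sim_conditions, n_objects=5):
    """First generation unit 112 (step S302): derive an arrangement state of the target
    objects from the simulation conditions.  A random bulk placement inside the
    environment bounds is assumed here for illustration."""
    xmin, ymin, zmin, xmax, ymax, zmax = sim_conditions["bounds"]
    return [{"position": (random.uniform(xmin, xmax),
                          random.uniform(ymin, ymax),
                          random.uniform(zmin, zmax)),
             "orientation_deg": tuple(random.uniform(0, 360) for _ in range(3))}
            for _ in range(n_objects)]

def generate_target_image(sim_conditions, arrangement_info, renderer):
    """Second generation unit 113 (step S303): place the target object model according
    to the arrangement information and render a target image.  `renderer` is an
    assumed external function (e.g. a CG or depth renderer)."""
    return renderer(sim_conditions, arrangement_info)

def run_simulation(sim_conditions, renderer, storage):
    """Steps S301-S304: acquire conditions, generate arrangement info, render the target
    image, and store it together with the arrangement info and conditions."""
    arrangement_info = generate_arrangement_info(sim_conditions)
    target_image = generate_target_image(sim_conditions, arrangement_info, renderer)
    storage.append({"target_image": target_image,
                    "arrangement_info": arrangement_info,
                    "sim_conditions": sim_conditions})
    return target_image
```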
  • FIG. 13 is a flowchart for explaining the process performed by the object recognition device 30 shown in FIG. 11 before the start of operation.
  • the same parts as those of the object recognition device 10 or the object recognition device 20 are designated by the same reference numerals as those in FIG. 6 or 9, and detailed description thereof will be omitted.
  • the parts different from those in FIG. 6 or 9 will be mainly described.
  • the simulation unit 111 of the object recognition device 30 first performs a simulation process (step S311).
  • the simulation process of step S311 is the process shown in steps S301 to S304 of FIG.
  • the image conversion data set generation unit 114 generates an image conversion data set using the sensor image acquired by the image acquisition unit 101 and the target image generated by the simulation unit 111 (step S312).
  • the image conversion data set generation unit 114 stores the generated image conversion data set in the storage unit 106.
  • the image conversion data set selection unit 115 selects the image conversion data set used by the first learning unit 105 from the image conversion data sets stored in the storage unit 106 (step S313).
  • the image conversion data set selection unit 115 outputs the selected image conversion data set to the first learning unit 105.
  • in step S121, the image conversion parameter learning process is executed using the image conversion data set selected in step S313.
  • as described above, the object recognition device 30 according to the third embodiment creates a target image using simulation and learns the image conversion parameters using the created target image. Further, the object recognition device 30 generates an image conversion data set including a target image created using simulation and a sensor image acquired by the image acquisition unit 101, and learns the image conversion parameters using the generated image conversion data set. Such a configuration makes it possible to easily generate the target images and image conversion data sets necessary for learning the image conversion parameters. Further, since the target image is generated based on the simulation conditions and on the arrangement information indicating the arrangement state of the target object, various target images can be generated by adjusting the simulation conditions.
  • the object recognition device 30 has an image conversion data set selection unit 115 that selects, based on the sensor image, the image conversion data set to be used by the first learning unit 105 from the image conversion data sets generated by the image conversion data set generation unit 114. With such a configuration, the image conversion parameters are learned only on image conversion data sets suited to the surrounding environment, which improves the learning efficiency.
  • FIG. 14 is a diagram showing a functional configuration of the object recognition device 40 according to the fourth embodiment.
  • the object recognition device 40 has a recognition data set generation unit 116, a second learning unit 117, and a recognition parameter determination unit 118, in addition to the configuration of the object recognition device 30 according to the third embodiment.
  • the same functional configurations as in the third embodiment are denoted by the same reference numerals as in the third embodiment and detailed description thereof is omitted; the parts different from the third embodiment will be mainly described.
  • the recognition data set generation unit 116 generates, based on the recognition method used by the recognition unit 103, annotation data to be used when the recognition unit 103 performs recognition processing, and generates a recognition data set including the generated annotation data and a target image.
  • the recognition data set generation unit 116 stores the generated recognition data set in the storage unit 106.
  • the annotation data differs depending on the recognition method used by the recognition unit 103. For example, when the recognition method is a neural network that outputs the position and size of the target object on the image, the annotation data is the position and size of the target object on the image.
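  • As an illustration of this case, the following Python sketch derives position-and-size annotations from the arrangement information attached to each simulated target image; the data layout and the project_to_image helper are assumptions introduced here, not the embodiment's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class Annotation:
    """Annotation data for a recognition method that outputs the position and size
    of each target object on the image (pixel coordinates assumed)."""
    center_xy: Tuple[float, float]
    size_wh: Tuple[float, float]

@dataclass
class RecognitionSample:
    target_image: np.ndarray        # simulated target image
    annotations: List[Annotation]   # one entry per placed target object

def make_recognition_data_set(simulated_samples, project_to_image):
    """Recognition data set generation unit 116 (illustrative): build annotations from
    the arrangement information stored with each simulated target image.
    `project_to_image` is an assumed helper mapping one placed object to its
    on-image center and size, i.e. it returns ((cx, cy), (w, h))."""
    data_set = []
    for sample in simulated_samples:
        annotations = [Annotation(*project_to_image(obj))
                       for obj in sample["arrangement_info"]]
        data_set.append(RecognitionSample(sample["target_image"], annotations))
    return data_set
```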
  • the second learning unit 117 learns the recognition parameter, which is a parameter used by the recognition unit 103, based on the recognition data set generated by the recognition data set generation unit 116.
  • the second learning unit 117 can be realized, for example, by the same configuration as the first learning unit 105 shown in FIG.
  • the second learning unit 117 includes a state observation unit 11 and a machine learning unit 12.
  • the machine learning unit 12 includes a reward calculation unit 121 and a function update unit 122.
  • the configuration shown in FIG. 3 is an example of performing machine learning using reinforcement learning, but the second learning unit 117 may perform machine learning according to other known methods, such as neural networks, genetic programming, functional logic programming, or support vector machines.
  • the second learning unit 117 stores the learning result of the recognition parameter in the storage unit 106.
  • for example, when the recognition method uses a neural network, the recognition parameters include the weighting coefficients between the units constituting the neural network.
  • the recognition parameter determination unit 118 determines the recognition parameter used by the recognition unit 103 based on the evaluation result of the evaluation unit 108 when each of the plurality of recognition parameters is used.
  • the recognition parameter determination unit 118 outputs the determined recognition parameter to the recognition unit 103.
  • the recognition parameter determination unit 118 can, for example, set the recognition parameter having the largest evaluation value as the recognition parameter used by the recognition unit 103. Further, when the output unit 104 outputs the evaluation result of the evaluation unit 108 for each recognition parameter and the input reception unit 109 accepts an input for selecting a recognition parameter, the recognition parameter determination unit 118 can also output the recognition parameter selected by the user to the recognition unit 103. Further, since the evaluation value of a recognition parameter is considered to change depending on the image conversion parameter, a plurality of evaluation values may be calculated for one learned recognition parameter by changing the image conversion parameter used by the image conversion unit 102. In this case, the image conversion parameter determination unit 107 can determine the image conversion parameter based on the combinations of the calculated evaluation values and the image conversion parameters.
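  • A minimal sketch of such a combined selection is shown below, assuming an evaluate callback that runs image conversion and recognition with a given parameter pair and returns the evaluation value of the evaluation unit 108; the exhaustive search shown here is only one possible realization.

```python
def choose_parameters(conversion_params, recognition_params, evaluate):
    """Evaluate every (image conversion parameter, recognition parameter) combination
    and return the pair with the largest evaluation value.  `evaluate` is an assumed
    callback: it converts the sensor image with the given image conversion parameter,
    runs recognition with the given recognition parameter, and returns the evaluation
    value computed by the evaluation unit 108."""
    best_pair, best_value = None, float("-inf")
    for cp in conversion_params:
        for rp in recognition_params:
            value = evaluate(cp, rp)
            if value > best_value:
                best_pair, best_value = (cp, rp), value
    return best_pair, best_value

# Usage sketch: the chosen pair would then be handed to the image conversion parameter
# determination unit 107 and the recognition parameter determination unit 118.
```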
  • FIG. 15 is a flowchart for explaining the processing performed by the object recognition device 40 shown in FIG. 14 before the start of operation.
  • the same parts as those of the object recognition device 30 are designated by the same reference numerals as those in FIG. 13, and detailed description thereof will be omitted.
  • the parts different from FIG. 13 will be mainly described.
  • after performing the simulation process of step S311, the object recognition device 40 generates a recognition data set in parallel with the processes of steps S312, S313, and S121 (step S401), and performs a recognition parameter learning process that learns the recognition parameters using the generated recognition data set (step S402).
  • the object recognition device 40 selects the image conversion parameter and the recognition parameter after the processing of steps S122 and S123 (step S403).
  • the processing of steps S125, S126, S201, and S202 is the same as that of the object recognition device 30.
  • the image conversion unit 102 of the object recognition device 40 determines whether or not the evaluation values have been calculated for all combinations of the image conversion parameters and the recognition parameters (step S404).
  • when the evaluation values have been calculated for all combinations, the object recognition device 40 performs the process of step S129 and determines the recognition parameters (step S405).
  • when there is a combination for which the evaluation value has not yet been calculated, the object recognition device 40 returns to the process of step S403.
  • as described above, the object recognition device 40 according to the fourth embodiment generates the annotation data used by the recognition unit 103 based on the recognition method used by the recognition unit 103, and learns the recognition parameters using a recognition data set including the generated annotation data and a target image. With such a configuration, the object recognition device 40 can easily generate recognition data sets for various situations.
  • the object recognition device 40 determines the recognition parameter used by the recognition unit 103 based on the evaluation result of the evaluation unit 108 when each of the plurality of recognition parameters is used.
  • the object recognition device 40 can therefore perform recognition processing using recognition parameters suitable for the target object, the surrounding environment, and the like, and can improve the recognition success rate and the gripping success rate.
  • Each component of the object recognition device 10, 20, 30, and 40 is realized by a processing circuit.
  • processing circuits may be realized by dedicated hardware, or may be control circuits using a CPU (Central Processing Unit).
  • FIG. 16 is a diagram showing dedicated hardware for realizing the functions of the object recognition devices 10, 20, 30, and 40 according to the first to fourth embodiments.
  • the processing circuit 90 is a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.
  • FIG. 17 is a diagram showing a configuration of a control circuit 91 for realizing the functions of the object recognition devices 10, 20, 30, and 40 according to the first to fourth embodiments.
  • the control circuit 91 includes a processor 92 and a memory 93.
  • the processor 92 is a CPU, and is also called a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like.
  • the memory 93 is, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable ROM), or an EEPROM (registered trademark) (Electrically Erasable Programmable ROM), or a magnetic disc, a flexible disc, an optical disc, a compact disc, a mini disc, a DVD (Digital Versatile Disk), or the like.
  • when the above processing circuit is realized by the control circuit 91, each function is realized by the processor 92 reading and executing the program corresponding to the processing of each component stored in the memory 93.
  • the memory 93 is also used as a temporary memory in each process executed by the processor 92.
  • the computer program executed by the processor 92 may be provided via a communication network, or may be provided in a state of being stored in a storage medium.
  • the configurations shown in the above embodiments are examples; they can be combined with other known techniques, the embodiments can be combined with each other, and part of the configurations can be omitted or changed without departing from the gist of the disclosure.

Abstract

An object recognition device (10) is characterized by comprising: an image acquisition unit (101) that acquires images of a target object; an image conversion unit (102) that uses image conversion parameters to subject sensor images, which are the images acquired by the image acquisition unit (101), to image conversion and outputs the converted images; a recognition unit (103) that recognizes the state of the target object on the basis of the converted images; an evaluation unit (108) that, on the basis of the recognition results of the recognition unit (103), evaluates the image conversion parameters used for generating the converted images; and an output unit (104) that outputs the recognition results and the evaluation results of the evaluation unit (108).

Description

物体認識装置および物体認識方法Object recognition device and object recognition method
 本開示は、対象物体を撮影した画像に基づいて対象物体を認識する物体認識装置および物体認識方法に関する。 The present disclosure relates to an object recognition device and an object recognition method that recognize an object object based on a photographed image of the object object.
 各種の産業において、物体の位置姿勢など物体の状態を把握する認識技術が開発されている。認識技術は、例えば、産業用ロボットが物体を把持して搬送する際に、産業用ロボットを物体の状態に合わせて制御するために用いられる。特許文献1には、対象の物体を把持する把持システムにおいて、対象物体を撮影した画像に基づいて、物体の状態を認識する技術が開示されている。 In various industries, recognition technology for grasping the state of an object such as the position and orientation of the object has been developed. The recognition technology is used, for example, to control an industrial robot according to the state of the object when the industrial robot grips and conveys the object. Patent Document 1 discloses a technique for recognizing the state of an object based on an image of the target object in a gripping system that grips the target object.
特開2018-205929号公報JP-A-2018-205929
 しかしながら、特許文献1に開示された技術によれば、認識処理を実行するときの環境、例えば、対象物体の周辺環境、計測条件などが変化する場合、認識性能が低下する場合があるという問題があった。 However, according to the technique disclosed in Patent Document 1, there is a problem that the recognition performance may deteriorate when the environment when the recognition process is executed, for example, the surrounding environment of the target object, the measurement conditions, and the like change. there were.
 本開示は、上記に鑑みてなされたものであって、認識処理を実行するときの環境が変化する場合であっても、認識性能を向上させることが可能な物体認識装置を得ることを目的とする。 The present disclosure has been made in view of the above, and an object of the present disclosure is to obtain an object recognition device capable of improving recognition performance even when the environment when executing recognition processing changes. do.
 上述した課題を解決し、目的を達成するために、本開示の物体認識装置は、対象物体の画像を取得する画像取得部と、画像変換パラメータを用いて、画像取得部が取得した画像であるセンサ画像を画像変換して変換後画像を出力する画像変換部と、変換後画像に基づいて、対象物体の状態を認識する認識部と、認識部の認識結果に基づいて、変換後画像を生成するために用いられた画像変換パラメータを評価する評価部と、認識結果および評価部の評価結果を出力する出力部と、を備えることを特徴とする。 In order to solve the above-mentioned problems and achieve the object, the object recognition device of the present disclosure is an image acquired by the image acquisition unit using an image acquisition unit that acquires an image of the target object and an image conversion parameter. An image conversion unit that converts the sensor image into an image and outputs the converted image, a recognition unit that recognizes the state of the target object based on the converted image, and a conversion unit that generates a converted image based on the recognition result of the recognition unit. It is characterized by including an evaluation unit for evaluating the image conversion parameter used for the purpose, and an output unit for outputting the recognition result and the evaluation result of the evaluation unit.
 本開示によれば、認識処理を実行するときの環境が変化する場合であっても、認識性能を向上させることが可能であるという効果を奏する。 According to the present disclosure, it is possible to improve the recognition performance even when the environment when executing the recognition process changes.
実施の形態1にかかる物体認識装置の機能構成を示す図The figure which shows the functional structure of the object recognition apparatus which concerns on Embodiment 1. 図1に示す出力部が表示する表示画面の一例を示す図The figure which shows an example of the display screen displayed by the output part shown in FIG. 図1に示す第1の学習部の詳細な構成の一例を示す図The figure which shows an example of the detailed structure of the 1st learning part shown in FIG. 図1に示す第1の学習部の動作例を説明するためのフローチャートA flowchart for explaining an operation example of the first learning unit shown in FIG. 図1に示す第1の学習部がCycleGANを用いる場合の動作例を説明するための図The figure for demonstrating the operation example when the 1st learning part shown in FIG. 1 uses CycleGAN. 図1に示す物体認識装置が運用開始前に行う処理について説明するためのフローチャートA flowchart for explaining the processing performed by the object recognition device shown in FIG. 1 before the start of operation. 図1に示す物体認識装置の運用中の動作を説明するためのフローチャートA flowchart for explaining the operation of the object recognition device shown in FIG. 1 during operation. 実施の形態2にかかる物体認識装置の機能構成を示す図The figure which shows the functional structure of the object recognition apparatus which concerns on Embodiment 2. 図8に示す物体認識装置が運用開始前に行う処理について説明するためのフローチャートA flowchart for explaining the processing performed by the object recognition device shown in FIG. 8 before the start of operation. 図8に示す物体認識装置が運用中に行う処理について説明するためのフローチャートA flowchart for explaining the processing performed by the object recognition device shown in FIG. 8 during operation. 実施の形態3にかかる物体認識装置の機能構成を示す図The figure which shows the functional structure of the object recognition apparatus which concerns on Embodiment 3. 図11に示すシミュレーション部の動作を説明するためのフローチャートA flowchart for explaining the operation of the simulation unit shown in FIG. 図11に示す物体認識装置が運用開始前に行う処理について説明するためのフローチャートA flowchart for explaining the processing performed by the object recognition device shown in FIG. 11 before the start of operation. 実施の形態4にかかる物体認識装置の機能構成を示す図The figure which shows the functional structure of the object recognition apparatus which concerns on Embodiment 4. 図13に示す物体認識装置が運用開始前に行う処理について説明するためのフローチャートA flowchart for explaining the processing performed by the object recognition device shown in FIG. 13 before the start of operation. 実施の形態1~4にかかる物体認識装置の機能を実現するための専用のハードウェアを示す図The figure which shows the dedicated hardware for realizing the function of the object recognition apparatus which concerns on Embodiments 1 to 4. 実施の形態1~4にかかる物体認識装置の機能を実現するための制御回路の構成を示す図The figure which shows the structure of the control circuit for realizing the function of the object recognition apparatus which concerns on Embodiments 1 to 4.
 以下に、本開示の実施の形態にかかる物体認識装置および物体認識方法を図面に基づいて詳細に説明する。なお、以下に示す実施の形態により本開示の技術的範囲が限定されるものではない。 The object recognition device and the object recognition method according to the embodiment of the present disclosure will be described in detail below with reference to the drawings. The technical scope of the present disclosure is not limited by the embodiments shown below.
実施の形態1.
 図1は、実施の形態1にかかる物体認識装置10の機能構成を示す図である。物体認識装置10は、画像取得部101と、画像変換部102と、認識部103と、出力部104と、第1の学習部105と、記憶部106と、画像変換パラメータ決定部107と、評価部108と、入力受付部109とを有する。物体認識装置10は、対象物体を撮影した画像に基づいて、対象物体の位置姿勢といった状態を認識する機能を有する。
Embodiment 1.
FIG. 1 is a diagram showing a functional configuration of the object recognition device 10 according to the first embodiment. The object recognition device 10 evaluates the image acquisition unit 101, the image conversion unit 102, the recognition unit 103, the output unit 104, the first learning unit 105, the storage unit 106, the image conversion parameter determination unit 107, and the like. It has a unit 108 and an input receiving unit 109. The object recognition device 10 has a function of recognizing a state such as the position and orientation of the target object based on a photographed image of the target object.
 画像取得部101は、対象物体の画像を取得する。画像取得部101は、イメージセンサを有する撮像装置であってもよいし、物体認識装置10に接続された撮影装置が撮影した画像を取得するインタフェースであってもよい。以下、画像取得部101が取得する画像をセンサ画像と称する。画像取得部101は、取得したセンサ画像を画像変換部102および第1の学習部105のそれぞれに出力する。センサ画像は、モノクロ画像であってもよいし、RGB画像であってもよい。また、センサ画像は、距離を輝度の明暗で表現した距離画像であってもよい。距離画像は、3次元の位置情報を持った点の集合データに基づいて生成されてもよい。このとき、画像取得部101は、距離画像から3次元の位置情報を持った点の集合を再構成するための最低限の情報を距離画像と同時に取得することが好ましい。点の集合を再構成するための最低限の情報とは、焦点距離、スケールなどである。 The image acquisition unit 101 acquires an image of the target object. The image acquisition unit 101 may be an imaging device having an image sensor, or may be an interface for acquiring an image captured by a photographing device connected to the object recognition device 10. Hereinafter, the image acquired by the image acquisition unit 101 is referred to as a sensor image. The image acquisition unit 101 outputs the acquired sensor image to each of the image conversion unit 102 and the first learning unit 105. The sensor image may be a monochrome image or an RGB image. Further, the sensor image may be a distance image in which the distance is expressed by the brightness and darkness. The distance image may be generated based on the set data of points having three-dimensional position information. At this time, it is preferable that the image acquisition unit 101 acquires the minimum information for reconstructing a set of points having three-dimensional position information from the distance image at the same time as the distance image. The minimum information for reconstructing a set of points is focal length, scale, and so on.
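 As a concrete illustration of this reconstruction (not part of the embodiment text), the following Python sketch back-projects a distance image into a set of 3D points under an assumed pinhole camera model; the parameter names (fx, fy, cx, cy, depth_scale) are illustrative stand-ins for the "focal length, scale, and so on" mentioned above.

```python
import numpy as np

def distance_image_to_points(distance_image, fx, fy, cx, cy, depth_scale=1.0):
    """Back-project a distance (depth) image into a set of 3D points using a pinhole
    camera model.  fx and fy are focal lengths in pixels, (cx, cy) the principal point,
    and depth_scale converts stored pixel values to metric depth."""
    h, w = distance_image.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = distance_image.astype(np.float64) * depth_scale
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels with no measurement
```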
 なお、画像取得部101は、複数の種類の画像を取得することができてもよい。例えば、画像取得部101は、対象物体のモノクロ画像および距離画像の両方を取得することができてよい。このとき、画像取得部101は、モノクロ画像および距離画像の両方を1台で撮影することができる撮影装置であってもよいし、モノクロ画像を撮影する撮影装置と、距離画像を撮影する撮影装置とから構成されてもよい。ただし、モノクロ画像の撮影と距離画像の撮影とを別の撮影装置で行う場合、2台の撮影装置の位置関係を事前に把握しておくことが好ましい。 Note that the image acquisition unit 101 may be able to acquire a plurality of types of images. For example, the image acquisition unit 101 may be able to acquire both a monochrome image and a distance image of the target object. At this time, the image acquisition unit 101 may be a photographing device capable of capturing both a monochrome image and a distance image by one unit, a photographing device for capturing a monochrome image, and a photographing device for capturing a distance image. It may be composed of and. However, when the monochrome image shooting and the distance image shooting are performed by different shooting devices, it is preferable to grasp the positional relationship between the two shooting devices in advance.
 画像変換部102は、画像変換パラメータを用いて、画像取得部101が取得するセンサ画像を画像変換して変換後画像を認識部103に出力する。画像変換部102は、記憶部106に記憶されており、第1の学習部105の学習結果である画像変換パラメータを用いて、センサ画像が目標画像群毎に予め定められた特徴をもつように画像変換を行う。本実施の形態では、予め定められた特徴を有する画像を目標画像と称し、目標画像の集合を目標画像群と称する。 The image conversion unit 102 converts the sensor image acquired by the image acquisition unit 101 into an image using the image conversion parameter, and outputs the converted image to the recognition unit 103. The image conversion unit 102 is stored in the storage unit 106 so that the sensor image has a predetermined feature for each target image group by using the image conversion parameter which is the learning result of the first learning unit 105. Perform image conversion. In the present embodiment, an image having predetermined features is referred to as a target image, and a set of target images is referred to as a target image group.
 同じ目標画像群に含まれる複数の目標画像は、共通する特徴を有する。このとき共通する特徴は、例えば、対象物体の形状、対象物体の表面特性、計測距離、深度などである。また、共通する特徴は、認識の対象である対象物体以外の物体の位置姿勢、外乱光の種類および強度、計測センサの種類、計測センサのパラメータ、対象物体の配置状態、画像のスタイル、対象物体の数量などであってもよい。ここで、計測センサのパラメータとは、ピント、絞りなどのパラメータである。対象物体の配置状態は、整列状態、ばら積み状態などである。同じ目標画像群に含まれる複数の目標画像は、1つの共通する特徴を有してもよいし、複数の共通する特徴を有してもよい。また、「共通する特徴を有する」とは、上記のような特徴が同一である場合だけでなく、類似する場合も含む。例えば、対象物体の形状は、直方体、円柱、六角柱といった基準形状を定めた場合、目標画像内の対象物体の形状が、同じ基準形状に近似できる程度の近さであっても、共通する特徴を有する画像とすることができる。また、対象物体の表面特性は、例えば黒、白、灰色といった基準色を定めた場合、目標画像内の対象物体の見た目の色合いが同じ基準色に分類される程度の近さであっても、共通する特徴を有する画像とすることができる。 Multiple target images included in the same target image group have common features. Common features at this time are, for example, the shape of the target object, the surface characteristics of the target object, the measurement distance, the depth, and the like. In addition, common features are the position and orientation of objects other than the target object to be recognized, the type and intensity of ambient light, the type of measurement sensor, the parameters of the measurement sensor, the arrangement state of the target object, the image style, and the target object. It may be the quantity of. Here, the parameters of the measurement sensor are parameters such as focus and aperture. The arrangement state of the target object is an alignment state, a bulk state, or the like. A plurality of target images included in the same target image group may have one common feature or may have a plurality of common features. Further, "having a common feature" includes not only the case where the above-mentioned features are the same but also the case where they are similar. For example, when a reference shape such as a rectangular parallelepiped, a cylinder, or a hexagonal column is defined, the shape of the target object has a common feature even if the shape of the target object in the target image is close enough to approximate the same reference shape. It can be an image having. Further, when the standard colors such as black, white, and gray are set for the surface characteristics of the target object, even if the apparent hues of the target objects in the target image are close enough to be classified into the same standard colors. It can be an image having common features.
 目標画像には、少なくとも1つの対象物体が映っている。このとき、目標画像内に映っている対象物体は、必ずしも全体が映っている必要はない。例えば、対象物体の一部分が計測範囲外にある場合、他の物体によって対象物体の一部が隠れてしまっている場合、目標画像内に映る対象物体の一部が欠けてしまうことがあるが、問題ない。また、目標画像内に複数の対象物体が映っている場合、複数の対象物体の配置状態は、整列状態であってもよいし、ばら積み状態であってもよい。目標画像は、対象物体を認識しやすい画像であることが望ましい。対象物体を認識しやすい画像とは、例えば、対象物体の形状が複雑ではなく、直方体、立方体といった簡易な形状を有し、ノイズが少ない画像である。 At least one target object is shown in the target image. At this time, the target object shown in the target image does not necessarily have to be shown in its entirety. For example, if a part of the target object is out of the measurement range, or if the target object is partially hidden by another object, the part of the target object displayed in the target image may be missing. no problem. Further, when a plurality of target objects are shown in the target image, the arrangement state of the plurality of target objects may be an aligned state or a bulk state. The target image is preferably an image that makes it easy to recognize the target object. An image in which the target object can be easily recognized is, for example, an image in which the shape of the target object is not complicated, has a simple shape such as a rectangular parallelepiped or a cube, and has less noise.
 画像変換部102が用いる画像変換パラメータのパラメータ数および種類は、画像変換手法によって異なる。画像変換部102は、変換後画像中の対象物体の位置姿勢といった状態が、センサ画像中の対象物体の状態と大きく変わらないような画像変換手法を用いることが望ましい。画像変換部102は、例えば、ニューラルネットワークを利用した画像変換手法を用いることができる。ニューラルネットワークを利用した画像変換手法を用いる場合、画像変換パラメータは、ネットワークを構成する各ユニット間の重み係数を含む。 The number and types of image conversion parameters used by the image conversion unit 102 differ depending on the image conversion method. It is desirable that the image conversion unit 102 use an image conversion method such that the state such as the position and orientation of the target object in the converted image is not significantly different from the state of the target object in the sensor image. The image conversion unit 102 can use, for example, an image conversion method using a neural network. When an image conversion method using a neural network is used, the image conversion parameters include a weighting coefficient between each unit constituting the network.
 認識部103は、画像変換部102が出力する変換後画像に基づいて、対象物体の位置姿勢といった状態を認識する。認識部103が用いる認識手法は、特に制限されない。例えば、認識部103は、画像から対象物体の状態を出力することができるように事前学習を行う機械学習ベースの認識手法を用いてもよいし、対象物体のCAD(Computer-Aided Design)データと3次元計測データと照合して対象物体の状態を推定するモデルマッチングを用いてもよい。認識部103は、1種類の認識手法を用いて認識処理を行ってもよいし、複数の種類の認識手法を組み合わせて用いて認識処理を行ってもよい。認識部103は、認識結果を出力部104および評価部108のそれぞれに出力する。認識結果は、例えば、認識部103の認識処理時間および認識部103が認識した対象物体の個数の少なくともいずれかを含む。 The recognition unit 103 recognizes a state such as the position and orientation of the target object based on the converted image output by the image conversion unit 102. The recognition method used by the recognition unit 103 is not particularly limited. For example, the recognition unit 103 may use a machine learning-based recognition method that performs pre-learning so that the state of the target object can be output from the image, or the CAD (Computer-Aided Design) data of the target object. Model matching that estimates the state of the target object by collating it with the three-dimensional measurement data may be used. The recognition unit 103 may perform the recognition process using one type of recognition method, or may perform the recognition process using a combination of a plurality of types of recognition methods. The recognition unit 103 outputs the recognition result to each of the output unit 104 and the evaluation unit 108. The recognition result includes, for example, at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103.
 出力部104は、認識結果と、後に詳述する評価部108の評価結果とを出力する機能を有する。出力部104が認識結果および評価結果を出力する方法については、特に制限されない。例えば、出力部104は、表示装置を備えており、表示装置の画面上に認識結果および評価結果を表示してもよい。また出力部104は、外部装置とのインタフェースを備えており、認識結果および評価結果を外部装置に送信してもよい。 The output unit 104 has a function of outputting the recognition result and the evaluation result of the evaluation unit 108, which will be described in detail later. The method of outputting the recognition result and the evaluation result by the output unit 104 is not particularly limited. For example, the output unit 104 includes a display device, and may display the recognition result and the evaluation result on the screen of the display device. Further, the output unit 104 is provided with an interface with an external device, and the recognition result and the evaluation result may be transmitted to the external device.
 図2は、図1に示す出力部104が表示する表示画面の一例を示す図である。図2中の「input」は、センサ画像を表示する領域を示しており、「parameter」は、画像変換パラメータと、評価結果である評価値とを表示する領域を示している。また図2中の「conversion」は、変換後画像を表示する領域を示しており、「recognition」は、認識結果を表示する領域を示している。例えば、ユーザが、「parameter」に表示された複数の画像変換パラメータのうちの1つを選択する操作を行うと、表示画面の「Name」には選択された画像変換パラメータの名称が表示され、「Value」には、選択された画像変換パラメータを用いた場合の評価値が表示され、「conversion」には、選択された画像変換パラメータを用いた場合の変換後画像が表示され、「recognition」には、選択された画像変換パラメータを用いた場合の認識結果が表示される。 FIG. 2 is a diagram showing an example of a display screen displayed by the output unit 104 shown in FIG. “Input” in FIG. 2 indicates an area for displaying a sensor image, and “parameter” indicates an area for displaying an image conversion parameter and an evaluation value which is an evaluation result. Further, "conversion" in FIG. 2 indicates an area for displaying the converted image, and "recognition" indicates an area for displaying the recognition result. For example, when the user performs an operation of selecting one of a plurality of image conversion parameters displayed on the "parameter", the name of the selected image conversion parameter is displayed on the "Name" of the display screen. In "Value", the evaluation value when the selected image conversion parameter is used is displayed, and in "conversion", the converted image when the selected image conversion parameter is used is displayed, and "recognition" is displayed. Displays the recognition result when the selected image conversion parameter is used.
 第1の学習部105は、センサ画像を、目標画像群の特徴を有するように画像変換するための画像変換パラメータを学習する。第1の学習部105は、画像変換部102が用いる画像変換パラメータを、目標画像群ごとに学習する。図3は、図1に示す第1の学習部105の詳細な構成の一例を示す図である。第1の学習部105は、状態観測部11と、機械学習部12とを有する。目標画像群に含まれる複数の目標画像の間のばらつきが小さい場合、第1の学習部105は、目標画像群の特徴を再現した画像変換を行うことが可能な画像変換パラメータを得ることができる可能性が高くなる。センサ画像の目標画像群との乖離が大きい場合、第1の学習部105の画像変換パラメータの学習は収束し難い。 The first learning unit 105 learns image conversion parameters for image conversion of the sensor image so as to have the characteristics of the target image group. The first learning unit 105 learns the image conversion parameters used by the image conversion unit 102 for each target image group. FIG. 3 is a diagram showing an example of a detailed configuration of the first learning unit 105 shown in FIG. The first learning unit 105 has a state observation unit 11 and a machine learning unit 12. When the variation between the plurality of target images included in the target image group is small, the first learning unit 105 can obtain an image conversion parameter capable of performing image conversion that reproduces the characteristics of the target image group. The possibility is high. When the deviation of the sensor image from the target image group is large, the learning of the image conversion parameter of the first learning unit 105 is difficult to converge.
 状態観測部11は、画像変換パラメータと、目標画像群と、変換後画像および目標画像群の特徴の類似度とを状態変数として観測する。機械学習部12は、画像変換パラメータ、目標画像群、類似度の状態変数に基づいて作成される訓練データセットに従って、画像変換パラメータを目標画像群ごとに学習する。 The state observation unit 11 observes the image conversion parameters, the target image group, and the similarity between the converted image and the features of the target image group as state variables. The machine learning unit 12 learns the image conversion parameters for each target image group according to the training data set created based on the image conversion parameters, the target image group, and the state variables of the similarity.
 Any learning algorithm may be used by the machine learning unit 12. As an example, a case where the machine learning unit 12 uses reinforcement learning will be described. Reinforcement learning is a learning algorithm in which an agent, the acting subject in a certain environment, observes the current state and decides the action to be taken. The agent obtains a reward from the environment by selecting an action, and learns a policy that maximizes the reward obtained through a series of actions. Q-learning and TD-learning are known as typical reinforcement learning methods. For example, in the case of Q-learning, a general update formula of the action value function Q(s_t, a_t) is expressed by the following formula (1).
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left\{ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right\} \qquad \cdots (1)
 In formula (1), s_t represents the environment at time t, and a_t represents the action at time t. The action a_t changes the environment to s_{t+1}. r_{t+1} represents the reward given in accordance with the change of the environment caused by the action a_t, γ represents the discount rate, and α represents the learning coefficient.
 The update formula represented by formula (1) increases the action value Q if the action value of the best action a at time t+1 is larger than the action value Q of the action a executed at time t, and decreases it otherwise. In other words, the action value function Q(s_t, a_t) is updated so that the action value Q of the action a at time t approaches the best action value at time t+1. By repeating such updates, the best action value in a certain environment is sequentially propagated to the action values in the preceding environments.
 機械学習部12は、報酬計算部121と、関数更新部122とを有する。 The machine learning unit 12 has a reward calculation unit 121 and a function update unit 122.
 報酬計算部121は、状態変数に基づいて報酬を計算する。報酬計算部121は、状態変数に含まれる類似度に基づいて、報酬rを計算する。類似度は、変換後画像が、目標画像群の特徴を再現している度合いが高いほど高くなる。例えば、類似度が予め定められる閾値よりも高い場合、報酬計算部121は、報酬rを増大させる。報酬計算部121は、例えば、「1」の報酬を与えて報酬rを増大させることができる。他方、類似度が予め定められる閾値よりも低い場合、報酬計算部121は、報酬rを減少させる。報酬計算部121は、例えば、「-1」の報酬を与えて報酬rを減少させることができる。類似度は、目標画像群の特徴の種類に応じて、公知の方法に従って算出される。 The reward calculation unit 121 calculates the reward based on the state variable. The reward calculation unit 121 calculates the reward r based on the similarity included in the state variable. The degree of similarity increases as the converted image reproduces the characteristics of the target image group. For example, if the similarity is higher than a predetermined threshold, the reward calculation unit 121 increases the reward r. The reward calculation unit 121 can increase the reward r by giving a reward of "1", for example. On the other hand, when the similarity is lower than a predetermined threshold value, the reward calculation unit 121 reduces the reward r. The reward calculation unit 121 can, for example, give a reward of "-1" to reduce the reward r. The similarity is calculated according to a known method according to the type of features of the target image group.
 The function update unit 122 updates the function for determining the image conversion parameter according to the reward r calculated by the reward calculation unit 121. For example, in the case of Q-learning, the action value function Q(s_t, a_t) represented by formula (1) is used as the function for determining the image conversion parameter.
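 A minimal Python sketch of how the reward calculation around a similarity threshold and the Q-learning update of formula (1) could be realized is given below; the tabular state/action representation and all names are assumptions introduced for illustration and not the embodiment's implementation.

```python
from collections import defaultdict

def compute_reward(similarity, threshold):
    """Reward calculation unit 121: +1 if the converted image is sufficiently similar
    to the target image group, -1 otherwise."""
    return 1.0 if similarity > threshold else -1.0

class QFunctionUpdater:
    """Function update unit 122 realized as a tabular Q-learning update (formula (1))."""
    def __init__(self, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # Q(s, a), zero-initialized
        self.alpha, self.gamma = alpha, gamma

    def update(self, state, action, reward, next_state, actions):
        best_next = max(self.q[(next_state, a)] for a in actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

# Usage sketch: 'state' would encode the observed state variables (target image group,
# similarity, ...) and 'action' a candidate image conversion parameter setting.
updater = QFunctionUpdater()
updater.update(state="group_A", action="params_1",
               reward=compute_reward(similarity=0.8, threshold=0.7),
               next_state="group_A", actions=["params_1", "params_2"])
```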
 図4は、図1に示す第1の学習部105の動作例を説明するためのフローチャートである。図4に示す動作は、物体認識装置10の運用を開始する前に行われる。第1の学習部105の状態観測部11は、画像取得部101を用いてセンサ画像群を取得する(ステップS101)。状態観測部11は、予め定められた複数の目標画像群の中から1つの目標画像群を選択する(ステップS102)。 FIG. 4 is a flowchart for explaining an operation example of the first learning unit 105 shown in FIG. The operation shown in FIG. 4 is performed before the operation of the object recognition device 10 is started. The state observation unit 11 of the first learning unit 105 acquires the sensor image group using the image acquisition unit 101 (step S101). The state observation unit 11 selects one target image group from a plurality of predetermined target image groups (step S102).
 第1の学習部105は、選択された目標画像群に対する画像変換パラメータを設定する(ステップS103)。第1の学習部105は、画像変換部102に、設定した画像変換パラメータを用いてセンサ画像を画像変換させる(ステップS104)。 The first learning unit 105 sets the image conversion parameters for the selected target image group (step S103). The first learning unit 105 causes the image conversion unit 102 to perform image conversion of the sensor image using the set image conversion parameters (step S104).
 第1の学習部105の状態観測部11は、状態変数である、画像変換パラメータと、目標画像群と、変換後画像および目標画像群の特徴の類似度とを取得する(ステップS105)。状態観測部11は、取得した状態変数を機械学習部12に出力する。機械学習部12の報酬計算部121は、類似度が閾値よりも高いか否かを判断する(ステップS106)。 The state observation unit 11 of the first learning unit 105 acquires the image conversion parameter, which is a state variable, the target image group, and the similarity between the converted image and the features of the target image group (step S105). The state observation unit 11 outputs the acquired state variables to the machine learning unit 12. The reward calculation unit 121 of the machine learning unit 12 determines whether or not the similarity is higher than the threshold value (step S106).
 類似度が閾値よりも高い場合(ステップS106:Yes)、報酬計算部121は、報酬rを増大させる(ステップS107)。類似度が閾値よりも低い場合(ステップS106:No)、報酬計算部121は、報酬rを減少させる(ステップS108)。報酬計算部121は、計算した報酬rを関数更新部122に出力する。 When the similarity is higher than the threshold value (step S106: Yes), the reward calculation unit 121 increases the reward r (step S107). When the similarity is lower than the threshold value (step S106: No), the reward calculation unit 121 reduces the reward r (step S108). The reward calculation unit 121 outputs the calculated reward r to the function update unit 122.
 The function update unit 122 updates the action value function Q(s_t, a_t) according to the reward r calculated by the reward calculation unit 121 (step S109). The first learning unit 105 determines whether or not a predetermined learning end condition is satisfied (step S110). It is desirable that the learning end condition is a condition for determining that the learning accuracy of the image conversion parameters has reached a reference level, for example, that the number of times the processing of steps S103 to S109 has been repeated exceeds a predetermined number, or that the elapsed time from the start of learning the image conversion parameters for the same target image group exceeds a predetermined time.
 学習終了条件を満たさない場合(ステップS110:No)、第1の学習部105は、ステップS103から処理を繰り返す。学習終了条件を満たした場合(ステップS110:Yes)、第1の学習部105は、目標画像群に対する画像変換パラメータの学習結果を出力する(ステップS111)。 When the learning end condition is not satisfied (step S110: No), the first learning unit 105 repeats the process from step S103. When the learning end condition is satisfied (step S110: Yes), the first learning unit 105 outputs the learning result of the image conversion parameter for the target image group (step S111).
 第1の学習部105は、全ての目標画像群に対する学習が終了したか否かを判断する(ステップS112)。全ての目標画像群に対する学習が終了していない場合、つまり、学習が終了していない目標画像群がある場合(ステップS112:No)、第1の学習部105は、ステップS102から処理を繰り返す。全ての目標画像群に対する学習が終了した場合(ステップS112:Yes)、第1の学習部105は、画像変換パラメータ学習処理を終了する。 The first learning unit 105 determines whether or not the learning for all the target image groups has been completed (step S112). When the learning for all the target image groups is not completed, that is, when there is a target image group for which the learning has not been completed (step S112: No), the first learning unit 105 repeats the process from step S102. When the learning for all the target image groups is completed (step S112: Yes), the first learning unit 105 ends the image conversion parameter learning process.
 以上、第1の学習部105が強化学習を利用して機械学習する例について説明したが、第1の学習部105は、他の公知の方法、例えばニューラルネットワーク、遺伝的プログラミング、機能論理プログラミング、サポートベクターマシンなどに従って機械学習を実行してもよい。 The example in which the first learning unit 105 performs machine learning using reinforcement learning has been described above, but the first learning unit 105 describes other known methods such as neural networks, genetic programming, and functional logic programming. Machine learning may be performed according to a support vector machine or the like.
 FIG. 5 is a diagram for explaining an operation example in which the first learning unit 105 shown in FIG. 1 uses CycleGAN (Generative Adversarial Networks). In this second example, the first learning unit 105 learns the image conversion parameters using CycleGAN. When CycleGAN is used, the first learning unit 105 learns the image conversion parameters using a first generator G, a second generator F, a first discriminator D_X, and a second discriminator D_Y, as shown in FIG. 5.
 第1の学習部105は、2種類の画像群X,Yの訓練データを用いて、画像群X,Y間の画像変換パラメータを学習する。画像群Xの訓練データに含まれる画像を画像xと称し、画像群Yの訓練データに含まれる画像を画像yと称する。 The first learning unit 105 learns the image conversion parameters between the image groups X and Y using the training data of the two types of image groups X and Y. The image included in the training data of the image group X is referred to as an image x, and the image included in the training data of the image group Y is referred to as an image y.
 The first generator G generates, from an image x, an image having the characteristics of the image group Y. The output when the image x is input to the first generator G is denoted by G(x). The second generator F generates, from an image y, an image having the characteristics of the image group X. The output when the image y is input to the second generator F is denoted by F(y). The first discriminator D_X distinguishes between x and F(y). The second discriminator D_Y distinguishes between y and G(x).
 The first learning unit 105 performs learning based on two kinds of losses so that the image conversion accuracy of the first generator G and the second generator F increases and the discrimination accuracy of the first discriminator D_X and the second discriminator D_Y increases. Specifically, the first learning unit 105 performs learning so that the total loss L(G, F, D_X, D_Y) shown in the following formula (2) satisfies the objective function shown in the following formula (3).
L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + L_{cyc}(G, F) \qquad \cdots (2)

G^{*}, F^{*} = \arg\min_{G, F} \max_{D_X, D_Y} L(G, F, D_X, D_Y) \qquad \cdots (3)
 The first loss L_{GAN}(G, D_Y, X, Y) included in formula (2) is the loss that arises when the first generator G generates, from the image x, the image G(x) having the characteristics of the image group Y. The second loss L_{GAN}(F, D_X, Y, X) included in formula (2) is the loss that arises when the second generator F generates, from the image y, the image F(y) having the characteristics of the image group X. The third loss L_{cyc}(G, F) included in formula (2) is the sum of the loss that arises when the image x is input to the first generator G to generate the image G(x) and the generated image G(x) is input to the second generator F to generate the image F(G(x)), and the loss that arises when the image y is input to the second generator F to generate the image F(y) and the generated image F(y) is input to the first generator G to generate the image G(F(y)).
 つまり、第1の学習部105は、以下の4つの前提に基づいて、総損失総損失L(G,F,DX,DY)が小さくなるように第1生成器Gおよび第2生成器Fの学習を行い、総損失総損失L(G,F,DX,DY)が大きくなるように第1識別器DXおよび第2識別器DYの学習を行う。
1.画像xを第1生成器Gに入力して変換された画像G(x)は、画像群Yと類似するはずである。
2.画像yを第2生成器Fに入力して変換された画像F(y)は画像群Xと類似するはずである。
3.画像G(x)を第2生成器Fに入力して変換された画像F(G(x))は画像群Xと類似するはずである。
4.画像F(y)を第1生成器Gに入力して変換された画像G(F(y))は画像群Yと類似するはずである。
 That is, based on the following four premises, the first learning unit 105 trains the first generator G and the second generator F so that the total loss L(G, F, D_X, D_Y) becomes smaller, and trains the first discriminator D_X and the second discriminator D_Y so that the total loss L(G, F, D_X, D_Y) becomes larger.
1. The image G(x) obtained by inputting the image x into the first generator G should be similar to the image group Y.
2. The image F(y) obtained by inputting the image y into the second generator F should be similar to the image group X.
3. The image F(G(x)) obtained by inputting the image G(x) into the second generator F should be similar to the image group X.
4. The image G(F(y)) obtained by inputting the image F(y) into the first generator G should be similar to the image group Y.
 第1の学習部105は、センサ画像群を画像群Xとし、目標画像群を画像群Yとして、上記の学習を行い、センサ画像群から目標画像群を生成する第1生成器Gで用いられる画像変換パラメータを学習し、学習結果を記憶部106に出力する。第1の学習部105は、複数の種類の目標画像群のそれぞれについて、上記の学習を行い、目標画像群ごとに画像変換パラメータを学習する。 The first learning unit 105 is used in the first generator G that performs the above learning with the sensor image group as the image group X and the target image group as the image group Y and generates the target image group from the sensor image group. The image conversion parameters are learned, and the learning result is output to the storage unit 106. The first learning unit 105 performs the above learning for each of the plurality of types of target image groups, and learns the image conversion parameters for each target image group.
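 The following minimal numpy sketch shows how the three loss terms of formula (2) could be combined, assuming least-squares adversarial losses and an L1 cycle-consistency loss; G, F, D_X, and D_Y are assumed to be callables on image batches, and this is only an illustrative stand-in for the actual CycleGAN training code, not the embodiment's implementation.

```python
import numpy as np

def lsgan_loss(discriminator, real_batch, fake_batch):
    """Least-squares adversarial loss (one common choice): the discriminator should
    output values near 1 on real images and near 0 on generated ones."""
    return float(np.mean((discriminator(real_batch) - 1.0) ** 2)
                 + np.mean(discriminator(fake_batch) ** 2))

def cycle_loss(G, F, xs, ys):
    """L_cyc(G, F): reconstruction error x -> G(x) -> F(G(x)) plus y -> F(y) -> G(F(y))."""
    return float(np.mean(np.abs(F(G(xs)) - xs)) + np.mean(np.abs(G(F(ys)) - ys)))

def total_loss(G, F, D_X, D_Y, xs, ys):
    """Total loss of formula (2): adversarial loss for G (judged by D_Y), adversarial
    loss for F (judged by D_X), and the cycle-consistency loss.  xs is a batch of
    sensor images (image group X), ys a batch of target images (image group Y)."""
    loss_gan_g = lsgan_loss(D_Y, ys, G(xs))
    loss_gan_f = lsgan_loss(D_X, xs, F(ys))
    return loss_gan_g + loss_gan_f + cycle_loss(G, F, xs, ys)
```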
 図1の説明に戻る。記憶部106は、第1の学習部105の学習結果である、目標画像群毎の画像変換パラメータを記憶する。 Return to the explanation in Fig. 1. The storage unit 106 stores the image conversion parameters for each target image group, which is the learning result of the first learning unit 105.
 画像変換パラメータ決定部107は、後述する評価部108が運用開始前に行った評価結果に基づいて、複数の画像変換パラメータの中から、運用中に画像変換部102が用いる画像変換パラメータを決定する。画像変換パラメータ決定部107は、決定した画像変換パラメータを画像変換部102に通知する。 The image conversion parameter determination unit 107 determines the image conversion parameter used by the image conversion unit 102 during operation from among a plurality of image conversion parameters based on the evaluation result performed by the evaluation unit 108 described later before the start of operation. .. The image conversion parameter determination unit 107 notifies the image conversion unit 102 of the determined image conversion parameter.
 画像変換パラメータ決定部107は、例えば、評価値Ecが最大の画像変換パラメータを画像変換部102が用いる画像変換パラメータとしてもよいし、評価部108が出力部104に評価結果を出力させて、ユーザが出力された評価結果を確認した上で選択した画像変換パラメータを画像変換部102が用いる画像変換パラメータとしてもよい。例えば、学習時に用いたセンサ画像と実際に得られるセンサ画像の光の加減が、時間帯などの影響で変わることが考えられる場合、出力部104が評価結果に加えて、それぞれの画像変換パラメータを用いた場合の変換後画像を出力することが考えられる。この場合、ユーザは、変換後画像を確認して、光の反射を抑える変換が可能な画像変換パラメータを選択することができる。このとき、出力部104は、評価値が閾値以上である画像変換パラメータの評価値と、変換後画像とを出力し、評価値が閾値未満の画像変換パラメータを出力しなくてもよい。 The image conversion parameter determination unit 107 may, for example, use the image conversion parameter having the maximum evaluation value E c as the image conversion parameter used by the image conversion unit 102, or the evaluation unit 108 causes the output unit 104 to output the evaluation result. The image conversion parameter selected after confirming the evaluation result output by the user may be used as the image conversion parameter used by the image conversion unit 102. For example, when it is considered that the amount of light of the sensor image used at the time of learning and the light of the sensor image actually obtained changes depending on the time zone or the like, the output unit 104 adds each image conversion parameter to the evaluation result. It is conceivable to output the converted image when used. In this case, the user can check the converted image and select an image conversion parameter capable of performing conversion that suppresses light reflection. At this time, the output unit 104 may output the evaluation value of the image conversion parameter whose evaluation value is equal to or more than the threshold value and the converted image, and may not output the image conversion parameter whose evaluation value is less than the threshold value.
 Before the start of operation, the evaluation unit 108 evaluates each of the plurality of image conversion parameters based on the recognition results obtained by the recognition unit 103 when each of the image conversion parameters is used. Specifically, the evaluation unit 108 calculates the evaluation value E_c and outputs the calculated evaluation value E_c as the evaluation result to the image conversion parameter determination unit 107 and the output unit 104. The evaluation value E_c calculated by the evaluation unit 108 is expressed, for example, by the following formula (4).
$$E_c = w_{pr}\, p_r + w_{tr}\, \frac{1}{t_r} \tag{4}$$
 Here, p_r denotes the recognition accuracy, t_r denotes the recognition processing time, and w_pr and w_tr denote weight coefficients. That is, the evaluation value E_c is the sum of the recognition accuracy p_r multiplied by the weight coefficient w_pr and the reciprocal of the recognition processing time t_r multiplied by the weight coefficient w_tr.
 In general, there is a trade-off between the recognition accuracy p_r and the recognition processing time t_r. The values of the weight coefficients w_pr and w_tr may therefore be determined according to what the user considers important. For example, if the speed of the recognition processing is to be emphasized even at the cost of some recognition accuracy, the value of the weight coefficient w_pr may be decreased and the value of the weight coefficient w_tr increased. Conversely, if the recognition accuracy is to be emphasized even if the processing takes longer, the value of the weight coefficient w_pr may be increased and the value of the weight coefficient w_tr decreased.
 The recognition accuracy p_r is the degree to which the target objects in the sensor image could be recognized, or the error in the state of the target object, specifically the error in position and orientation. For example, when the recognition accuracy p_r is the degree to which the target objects in the sensor image could be recognized, the recognition accuracy p_r is expressed by the following formula (5).
$$p_r = \frac{n_r}{N_w} \tag{5}$$
 Here, n_r denotes the number of target objects that could be recognized, and N_w denotes the number of target objects in the sensor image. That is, the recognition accuracy p_r expressed by formula (5) is the number n_r of recognized target objects divided by the number N_w of target objects in the sensor image. Recognition may be judged successful when the error between the position and orientation of a target object in the sensor image and the recognized position and orientation is within a threshold, or the user may visually judge whether recognition succeeded.
 When the error in the state of the target object is used as the recognition accuracy p_r, the recognition accuracy p_r is expressed by the following formula (6).
$$p_r = \frac{1}{\left| x_w - x_r \right| + 1} \tag{6}$$
 Here, x_w denotes the actual position and orientation of the target object, and x_r denotes the recognized position and orientation. That is, the recognition accuracy p_r expressed by formula (6) is the reciprocal of the absolute value of the difference between the actual position and orientation x_w of the target object and the recognized position and orientation x_r, plus one. The actual position and orientation and the recognized position and orientation of the target object may be positions and orientations in image space or in real space.
 The recognition accuracy p_r is not limited to the above examples, and the above examples may be combined.
 Furthermore, the evaluation value E_c is not limited to the example expressed by formula (4) above, and may be calculated using the following formula (7).
$$E_c = \begin{cases} w_{pr}\, p_r & (t_r \le T_r) \\ 0 & (t_r > T_r) \end{cases} \tag{7}$$
 Here, T_r denotes a recognition processing time threshold. That is, when formula (7) is used, the evaluation value E_c is the recognition accuracy p_r multiplied by the weight coefficient w_pr if the recognition processing is completed within the recognition processing time threshold T_r, and is 0 if the recognition processing is not completed within the recognition processing time threshold T_r. By setting the evaluation value E_c of an image conversion parameter whose recognition processing does not complete within the recognition processing time threshold T_r to 0, it becomes possible to confirm and select image conversion parameters that can complete the recognition processing within the time required by the user. The method of calculating the evaluation value E_c is not limited to the above.
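As a minimal illustration of formulas (4), (5), and (7), the following Python sketch combines them for scalar inputs; the function names and the example weights are illustrative assumptions.

```python
from typing import Optional

def recognition_accuracy(n_recognized: int, n_total: int) -> float:
    """Formula (5): fraction of the target objects in the sensor image that were recognized."""
    return n_recognized / n_total

def evaluation_value(p_r: float, t_r: float, w_pr: float, w_tr: float,
                     time_threshold: Optional[float] = None) -> float:
    """Formula (4), or formula (7) when a recognition processing time threshold T_r is given."""
    if time_threshold is not None:
        # Formula (7): parameters that exceed the allowed processing time score 0.
        return w_pr * p_r if t_r <= time_threshold else 0.0
    # Formula (4): weighted sum of the accuracy and the reciprocal of the processing time.
    return w_pr * p_r + w_tr * (1.0 / t_r)

# Example: a parameter that recognized 8 of 10 objects in 0.5 s, with accuracy-weighted coefficients.
print(evaluation_value(recognition_accuracy(8, 10), t_r=0.5, w_pr=1.0, w_tr=0.1))
```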
 The input receiving unit 109 receives input of evaluation parameters, which are parameters used by the evaluation unit 108 to evaluate the image conversion parameters. The input receiving unit 109 may receive evaluation parameters entered by the user with an input device or the like, may receive evaluation parameters from a functional unit within the object recognition device 10, or may receive evaluation parameters from a device external to the object recognition device 10. The evaluation parameters received by the input receiving unit 109 are, for example, weight coefficients, such as the weight coefficients w_pr and w_tr in formula (4), for changing the influence that each of the elements affecting the magnitude of the evaluation value has on the evaluation value.
 FIG. 6 is a flowchart for explaining the processing performed by the object recognition device 10 shown in FIG. 1 before the start of operation. The first learning unit 105 of the object recognition device 10 performs the image conversion parameter learning processing (step S121). Since the image conversion parameter learning processing shown in step S121 is the processing described with reference to FIG. 4 or FIG. 5, a detailed description is omitted here.
 Subsequently, the input receiving unit 109 acquires the evaluation parameters and outputs the acquired evaluation parameters to the evaluation unit 108 (step S122).
 The image acquisition unit 101 acquires a sensor image and outputs the acquired sensor image to the image conversion unit 102 (step S123). The image conversion unit 102 selects, from among the plurality of learned image conversion parameters stored in the storage unit 106, one image conversion parameter whose evaluation value has not yet been calculated (step S124).
 The image conversion unit 102 performs image conversion processing for converting the sensor image acquired by the image acquisition unit 101 into a converted image using the selected image conversion parameter (step S125). The image conversion unit 102 outputs the converted image to the recognition unit 103.
 The recognition unit 103 performs recognition processing using the converted image and outputs the recognition result to the evaluation unit 108 (step S126). When outputting the recognition result, the recognition unit 103 may also output the recognition result to the output unit 104.
 The evaluation unit 108 calculates the evaluation value E_c based on the recognition result and outputs the calculated evaluation value E_c to the image conversion parameter determination unit 107 (step S127).
 The image conversion unit 102 determines whether the evaluation values E_c of all the image conversion parameters have been calculated (step S128). If the evaluation values E_c of all the image conversion parameters have not been calculated (step S128: No), that is, if there is an image conversion parameter whose evaluation value E_c has not been calculated, the image conversion unit 102 repeats the processing from step S124. If the evaluation values E_c of all the image conversion parameters have been calculated (step S128: Yes), the image conversion parameter determination unit 107 determines, from among the plurality of image conversion parameters, the image conversion parameter to be used by the image conversion unit 102 during operation, based on the evaluation values that are the evaluation results of the evaluation unit 108 (step S129).
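The loop of steps S124 to S129 could be pictured by the following Python sketch, which assumes hypothetical convert, recognize, and evaluate callables; it is a sketch of the control flow, not the actual implementation.

```python
def select_conversion_parameter(sensor_image, candidate_params, convert, recognize, evaluate):
    """Evaluate each learned image conversion parameter on one sensor image (steps S124 to S129)."""
    scores = []
    for param in candidate_params:                # step S124: pick a parameter not yet evaluated
        converted = convert(sensor_image, param)  # step S125: image conversion processing
        result = recognize(converted)             # step S126: recognition processing
        scores.append((evaluate(result), param))  # step S127: evaluation value E_c
    best_value, best_param = max(scores, key=lambda s: s[0])  # step S129: keep the best parameter
    return best_param, scores
```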
 FIG. 7 is a flowchart for explaining the operation of the object recognition device 10 shown in FIG. 1 during operation. It is assumed that the operation shown in FIG. 6 has been performed before operation, that the image conversion parameters have been learned for each target image group, and that the image conversion parameter to be used by the image conversion unit 102 has been selected from among the learned image conversion parameters.
 The image acquisition unit 101 acquires a sensor image and outputs the acquired sensor image to the image conversion unit 102 (step S131). The image conversion unit 102 acquires the selected image conversion parameter (step S132). The image conversion unit 102 performs image conversion processing for converting the sensor image into a converted image using the acquired image conversion parameter and outputs the converted image to the recognition unit 103 (step S133).
 The recognition unit 103 performs, using the converted image, recognition processing for recognizing the state of the target object included in the converted image, and outputs the recognition result to the output unit 104 (step S134).
 The output unit 104 determines whether a target object exists based on the recognition result (step S135). If a target object exists (step S135: Yes), the output unit 104 outputs the recognition result (step S136). After the recognition result is output, the image acquisition unit 101 repeats the processing from step S131. If no target object exists (step S135: No), the object recognition device 10 ends the processing.
 In the above description, the image conversion unit 102 converts the sensor image into the converted image by one-stage image conversion processing, but the present embodiment is not limited to this example. For example, the image conversion unit 102 may perform image conversion in a plurality of stages to convert the sensor image into the converted image. For example, when two-stage image conversion is performed, the image conversion unit 102 converts the sensor image into a first intermediate image and converts the first intermediate image into the converted image. When three-stage image conversion is performed, the image conversion unit 102 converts the sensor image into a first intermediate image, converts the first intermediate image into a second intermediate image, and converts the second intermediate image into the converted image.
 When the image conversion unit 102 performs image conversion in a plurality of stages, the first learning unit 105 learns each of the plurality of types of image conversion parameters used at each stage of the image conversion. Specifically, the first learning unit 105 learns a first image conversion parameter for converting the sensor image into an intermediate image and a second image conversion parameter for converting the intermediate image into the converted image. When image conversion in three or more stages is performed, the first learning unit 105 also learns a third image conversion parameter for converting an intermediate image into another intermediate image. For example, when two-stage image conversion is performed, the first learning unit 105 learns a first image conversion parameter for converting the sensor image into the first intermediate image and a second image conversion parameter for converting the first intermediate image into the converted image. When three-stage image conversion is performed, the first learning unit 105 learns a first image conversion parameter for converting the sensor image into the first intermediate image, a third image conversion parameter for converting the first intermediate image into the second intermediate image, and a second image conversion parameter for converting the second intermediate image into the converted image.
 The intermediate image is an image that differs from both the sensor image and the converted image. For example, when the converted image is a distance image generated using CG (Computer Graphics) with no noise or missing regions, the intermediate image can be a reproduction image that reproduces, by simulation, noise, measurement errors, missing regions caused by the sensor's blind spots, and the like. In this case, the first learning unit 105 learns a first image conversion parameter for converting the sensor image into the intermediate image, which is the reproduction image, and a second image conversion parameter for converting the intermediate image into the converted image, which is the distance image. Performing the image conversion in stages improves the convergence of the learning and thus improves the recognition performance.
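A minimal sketch of such multi-stage conversion, assuming a generic convert(image, parameter) function, might look as follows; the staging order shown in the comment is illustrative.

```python
def convert_multistage(sensor_image, stage_params, convert):
    """Chain the learned conversion of each stage: sensor image -> intermediate image(s) -> converted image."""
    image = sensor_image
    for param in stage_params:  # e.g. [first_param, third_param, second_param] for three stages
        image = convert(image, param)
    return image
```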
 Alternatively, the converted image may be divided into a plurality of types of component images, and the converted image may be obtained by converting the sensor image into the plurality of component images and then combining them. In this case, the first learning unit 105 learns a plurality of types of image conversion parameters for converting the sensor image into the respective component images. For example, from one sensor image, a texture image, which is a component image having the characteristics of the texture component of the converted image, and a color image, which is a component image having the characteristics of the global color component of the converted image, may be generated, and the texture image and the color image may be combined to obtain the converted image. In this case, the first learning unit 105 learns an image conversion parameter for converting the sensor image into the texture image and an image conversion parameter for converting the sensor image into the color image. Although an example using two component images is shown above, the converted image may also be obtained using three or more component images. Learning an image conversion parameter for each component image simplifies the problem to be solved, so the convergence of the learning improves and the recognition performance can be improved. By combining a plurality of component images to obtain the converted image, a converted image having characteristics closer to the target image group can be obtained than when the converted image is obtained from the sensor image using a single type of image conversion parameter.
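As one possible illustration of the component-image approach, the following Python sketch converts the sensor image with one parameter per component and blends the results with a simple weighted sum; the blending rule and the names are assumptions, since the embodiment does not fix a specific composition method.

```python
import numpy as np

def convert_by_components(sensor_image, component_params, convert, weights=None):
    """Convert the sensor image into each component image (e.g. texture and color) and blend them."""
    components = [np.asarray(convert(sensor_image, p), dtype=np.float32) for p in component_params]
    if weights is None:
        weights = [1.0 / len(components)] * len(components)
    # A simple weighted sum; the actual composition depends on how the component images are defined.
    return sum(w * c for w, c in zip(weights, components))
```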
 When recognizing an object, it is common to perform a plurality of different types of image processing. Depending on the content of the image processing to be executed, there are images from which the desired result is easily obtained and images from which it is not. For example, in edge detection processing, edges are easy to extract when the luminance values near the boundary of the object whose edges are to be extracted change in a step-like manner, and are difficult to extract when the luminance values near the boundary change smoothly. Thus, each type of image processing to be executed implies characteristics and properties that the image should have. Therefore, instead of converting the image used for recognition only once, an image conversion that makes each image processing step in the recognition process easier may be executed each time as preprocessing for that image processing step. In this case, the first learning unit 105 only needs to learn as many image conversion parameters as there are image processing steps for which preprocessing is desired, and the ideal processing result image group obtained when each image processing step is executed can be used as the target image group.
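A minimal sketch of such per-step preprocessing, assuming the result of each processing step feeds the next and that convert(image, parameter) applies a learned conversion, might look as follows; the step names are purely illustrative.

```python
def recognize_with_per_step_conversion(sensor_image, steps, convert):
    """Run each image processing step of the recognition process after its own learned conversion.

    `steps` is a list of (conversion_parameter, processing_function) pairs, for example
    [(edge_param, detect_edges), (match_param, match_model)]; all of these names are illustrative.
    """
    data = sensor_image
    outputs = []
    for param, process in steps:
        preprocessed = convert(data, param)  # conversion learned against that step's ideal result images
        data = process(preprocessed)         # the step's result feeds the next step
        outputs.append(data)
    return outputs
```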
 As described above, according to the object recognition device 10 of the present embodiment, the image conversion parameters can be evaluated based on the recognition processing results and the evaluation results can be obtained. This makes it possible to confirm the influence of the image conversion parameters on the recognition processing. Therefore, it becomes possible to select an image conversion parameter suited to the environment in which the recognition processing is executed, and the recognition performance can be improved even when that environment changes.
 The image conversion parameters are parameters for converting the sensor image into an image having predetermined characteristics. The object recognition device 10 includes the first learning unit 105 that learns an image conversion parameter for each predetermined characteristic, and the image conversion unit 102 converts the sensor image using the image conversion parameters that are the learning results of the first learning unit 105. With this configuration, the output unit 104 can obtain the evaluation results of the image conversion parameters, which are the learning results for each predetermined characteristic. It is therefore possible to grasp what kind of characteristics an image should be converted to have in order to improve the recognition performance.
 In the present embodiment, the image conversion unit 102 performs image conversion in a plurality of stages to convert the sensor image into the converted image, and the first learning unit 105 learns each of the plurality of types of image conversion parameters used at each stage of the image conversion. Performing the image conversion in stages improves the convergence of the learning and thus improves the recognition performance.
 In the present embodiment, the image conversion unit 102 can also convert the sensor image into a plurality of component images and then combine the plurality of component images to obtain the converted image. In this case, the first learning unit 105 learns a plurality of types of image conversion parameters for converting the sensor image into each of the plurality of component images. With this configuration, the object recognition device 10 can obtain a converted image having characteristics closer to the target image group than when the converted image is obtained from the sensor image using a single type of image conversion parameter.
 The object recognition device 10 further includes the image conversion parameter determination unit 107 that determines the image conversion parameter to be used by the image conversion unit 102 based on the evaluation results of the evaluation unit 108 obtained when each of the plurality of image conversion parameters is used. With this configuration, an image conversion parameter that can improve the recognition performance can be selected automatically, without the user having to look at the evaluation results and select an image conversion parameter manually.
 The object recognition device 10 includes the input receiving unit 109 that receives input of evaluation parameters, which are parameters used by the evaluation unit 108 to evaluate the image conversion parameters. The evaluation unit 108 evaluates the image conversion parameters using the evaluation parameters received by the input receiving unit 109. The evaluation parameters are, for example, weight coefficients for changing the influence that each of the elements affecting the magnitude of the evaluation value has on the evaluation value. With this configuration, the user can obtain evaluation values of the image conversion parameters suited to the user's application by entering evaluation parameters according to that application.
 The recognition result output by the recognition unit 103 of the object recognition device 10 includes at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103. With this configuration, the evaluation unit 108 calculates the evaluation values of the image conversion parameters based on at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103. The recognition accuracy p_r can be calculated using the number n_r of target objects recognized by the recognition unit 103 and the actual number of target objects. Therefore, the object recognition device 10 can evaluate the image conversion parameters in consideration of the recognition processing time, the recognition accuracy p_r, and the like.
Embodiment 2.
 FIG. 8 is a diagram showing the functional configuration of the object recognition device 20 according to Embodiment 2. The object recognition device 20 includes an image acquisition unit 101, an image conversion unit 102, a recognition unit 103, an output unit 104, a first learning unit 105, a storage unit 106, an image conversion parameter determination unit 107, an evaluation unit 108, an input receiving unit 109, and a robot 110. Since the object recognition device 20 includes the robot 110 and has a function of picking target objects, it can also be called a target object extraction device. Because the object recognition device 20 includes the robot 110, the image conversion parameters can be evaluated based on the operation results of the robot 110.
 The object recognition device 20 includes the robot 110 in addition to the functional configuration of the object recognition device 10 according to Embodiment 1. Hereinafter, for functional configurations that are the same as in Embodiment 1, the same reference signs as in Embodiment 1 are used and detailed descriptions are omitted, and the parts that differ from Embodiment 1 are mainly described.
 The output unit 104 outputs the recognition result of the recognition unit 103 to the robot 110. The robot 110 grips the target object based on the recognition result output by the output unit 104. The robot 110 outputs the operation result of the gripping operation for the target object to the evaluation unit 108. The evaluation unit 108 evaluates the image conversion parameters based on the operation result of the robot 110 in addition to the recognition result of the recognition unit 103. Here, the operation result of the robot 110 includes at least one of the probability that the robot 110 succeeded in gripping the target object, the gripping operation time, and the cause of grip failure.
 The robot 110 has a tool capable of gripping a target object and performing the object manipulation required to execute a task. For example, when the task is conveying target objects between a plurality of conveyors and the surface of the target object is a smooth surface without unevenness, a suction pad can be used as the tool. The tool may also be a gripper hand that grips the target object by pinching it between two claws.
 A condition for determining that the robot 110 succeeded in gripping the target object can be, for example, when the tool is a gripper hand, that the opening width of the gripper hand when it is inserted at the target object and closed is within a predetermined range. Alternatively, when the tool is a gripper hand and the robot 110 conveys the gripped target object after gripping it, the condition for determining that the robot 110 succeeded in gripping the target object may be that the target object is still held immediately before the gripper hand releases it at the conveyance destination. The conditions for determining that the robot 110 succeeded in gripping the target object are not limited to the above examples, and can be defined as appropriate according to the type of tool the robot 110 has, the work the robot 110 is to perform, and the like.
 In the above, an example of defining the condition for determining that the robot 110 succeeded in gripping the target object based on whether the target object is held has been described. Whether the target object is held can be determined, for example, by using the detection result when the tool in use has a function of detecting the holding state of the target object. Alternatively, information from an external sensor such as a camera may be used to determine whether the target object is held. For example, when the tool of the robot 110 is an electric hand, there are products having a function of determining whether the target object is held by measuring the current value when the electric hand is operated. When a camera image is used, an image of the tool when it is not gripping a target object can be stored in advance, the difference from an image of the tool taken after the gripping operation can be computed, and whether the target object is held can be determined based on the difference.
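As one way to picture the camera-based check, the following Python sketch compares a stored empty-tool image with a post-grip image and judges holding success from the fraction of changed pixels; the thresholds and the decision rule are illustrative assumptions.

```python
import numpy as np

def is_object_held(empty_tool_image: np.ndarray, after_grip_image: np.ndarray,
                   pixel_delta: int = 25, changed_ratio: float = 0.02) -> bool:
    """Judge holding success from the difference between a stored empty-tool image and a post-grip image."""
    diff = np.abs(after_grip_image.astype(np.int16) - empty_tool_image.astype(np.int16))
    changed = np.mean(diff > pixel_delta)  # fraction of pixels that changed noticeably
    return changed > changed_ratio         # a held object changes enough of the tool region
```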
 By including the grip success rate in the operation result of the robot 110, the evaluation unit 108 evaluates the image conversion parameters based on the grip success rate, so the image conversion unit 102 can use an image conversion parameter that yields a high grip success rate. The operation result of the robot 110 may also include the gripping operation time. The gripping operation time can be, for example, the time from when the gripper hand closes until it opens at the conveyance destination, in the case where the tool of the robot 110 is a gripper hand and the robot 110 conveys the gripped target object. By including the gripping operation time in the operation result of the robot 110, the evaluation unit 108 evaluates the image conversion parameters based on the gripping operation time, so the image conversion unit 102 can use an image conversion parameter that makes the gripping operation faster.
 Causes of grip failure of the robot 110 include, for example, failing to grasp the object, dropping it during conveyance, and gripping multiple objects at once. By including the cause of grip failure in the operation result of the robot 110, the evaluation unit 108 evaluates the image conversion parameters based on the failure causes, so the image conversion unit 102 can use an image conversion parameter that reduces specific failure causes. For example, even if gripping of a target object fails inside the supply box that stores target objects before supply, the target object is likely to fall back into the supply box and the gripping operation can simply be performed again, so the risk is low. In contrast, if the target object is dropped during conveyance, it may fall and be scattered around, and returning to the original state may require complicated control of the robot 110 or take time, so the risk is high. Therefore, by making the evaluation weight small for low-risk grip failure causes and large for high-risk grip failure causes, the image conversion unit 102 can use an image conversion parameter with a low risk of scattering target objects around.
 FIG. 9 is a flowchart for explaining the processing performed by the object recognition device 20 shown in FIG. 8 before the start of operation. In FIG. 9, parts that are the same as the processing of the object recognition device 10 are given the same reference signs as in FIG. 6 and detailed descriptions are omitted. The parts that differ from FIG. 6 are mainly described below.
 The operations from step S121 to step S126 are the same as in FIG. 6. When the recognition processing has been performed, the robot 110 performs picking based on the recognition result (step S201). The robot 110 outputs the picking operation result to the evaluation unit 108.
 The evaluation unit 108 calculates the evaluation value based on the operation result of the robot 110 in addition to the recognition result (step S202). Specifically, the evaluation unit 108 can calculate the evaluation value E_c using, for example, the following formula (8).
$$E_c = w_{pg}\, p_g + w_{tg}\, \frac{1}{t_g} + w_{pr}\, p_r + w_{tr}\, \frac{1}{t_r} - \sum_{i} w_{fi}\, n_{fi} \tag{8}$$
 In formula (8), p_g denotes the grip success rate, t_g denotes the gripping time, p_r denotes the recognition accuracy, t_r denotes the recognition processing time, and n_f1, n_f2, ... correspond to the respective types of grip failure causes. Further, w_pg, w_tg, w_pr, w_tr, w_f1, w_f2, ... denote weight coefficients. The evaluation parameters received by the input receiving unit 109 include the weight coefficients w_pg, w_tg, w_pr, w_tr, w_f1, w_f2, .... However, the above method of calculating the evaluation value E_c is an example, and the calculation method used by the evaluation unit 108 is not limited to it.
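The following Python sketch shows one hedged way such a combined evaluation could be computed; in particular, subtracting weighted failure counts is an assumption made for illustration and is not necessarily the form of formula (8).

```python
def evaluation_value_with_robot(p_g, t_g, p_r, t_r, failure_counts, weights):
    """Combine recognition results and robot operation results into one evaluation value.

    `failure_counts` maps a failure cause (e.g. "drop_during_transfer") to how often it occurred,
    and `weights` supplies w_pg, w_tg, w_pr, w_tr plus one weight per failure cause; the exact
    combination below is an assumption, not the patented formula.
    """
    score = (weights["w_pg"] * p_g + weights["w_tg"] / t_g
             + weights["w_pr"] * p_r + weights["w_tr"] / t_r)
    for cause, count in failure_counts.items():
        score -= weights[cause] * count  # high-risk causes get larger weights
    return score
```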
 The subsequent operations of steps S128 and S129 are the same as in FIG. 6. That is, the processing shown in FIG. 9 differs from the processing shown in FIG. 6 in that picking processing is additionally performed between the recognition processing and the processing of calculating the evaluation value, and in the specific content of the processing of calculating the evaluation value.
 FIG. 10 is a flowchart for explaining the processing performed by the object recognition device 20 shown in FIG. 8 during operation. In FIG. 10, parts that are the same as the processing of the object recognition device 10 are given the same reference signs as in FIG. 7 and detailed descriptions are omitted. The parts that differ from FIG. 7 are mainly described below.
 Whereas the object recognition device 10 outputs the recognition result when it determines, as a result of the recognition processing, that a target object exists, in the object recognition device 20 the robot 110 performs picking based on the recognition result instead of the recognition result being output (step S203). After the robot 110 performs picking, the object recognition device 20 repeats the processing from step S131.
 In the above description, the recognition unit 103 recognizes the state of the target object based on the converted image, but the recognition unit 103 of the object recognition device 20 having the robot 110 may recognize the state of the target object using a search-based method that uses a hand model of the robot 110 to search for locations where the target object can be gripped. When the recognition result is position and orientation information of the target object, it is desirable that the position and orientation information of the target object can be converted into position and orientation information of the robot 110 for when the robot 110 grips that target object.
 As described above, the object recognition device 20 according to Embodiment 2 further includes the robot 110 that grips the target object based on the recognition result of the recognition unit 103. The evaluation unit 108 of the object recognition device 20 evaluates the image conversion parameters based on the operation results of the robot 110. With this configuration, the object recognition device 20 can select image conversion parameters that improve the gripping performance, and the grip success rate of the robot 110 can be improved.
 The operation result of the robot 110 includes at least one of the probability that the robot 110 succeeded in gripping the target object, the gripping operation time, and the cause of grip failure. When the probability that the robot 110 succeeded in gripping the target object is included in the operation result, the image conversion parameters are evaluated based on the grip success rate, so an image conversion parameter that can improve the grip success rate can be selected and the grip success rate of the robot 110 can be improved. When the gripping operation time is included in the operation result, the image conversion parameters are evaluated based on the gripping operation time, so the gripping operation time can be shortened. When the cause of grip failure is included in the operation result, the image conversion parameters are evaluated based on the cause of grip failure, so specific grip failure causes can be reduced.
Embodiment 3.
 FIG. 11 is a diagram showing the functional configuration of the object recognition device 30 according to Embodiment 3. The object recognition device 30 includes an image acquisition unit 101, an image conversion unit 102, a recognition unit 103, an output unit 104, a first learning unit 105, a storage unit 106, an image conversion parameter determination unit 107, an evaluation unit 108, an input receiving unit 109, a robot 110, a simulation unit 111, an image conversion data set generation unit 114, and an image conversion data set selection unit 115. The simulation unit 111 includes a first generation unit 112 and a second generation unit 113.
 The object recognition device 30 includes the simulation unit 111, the image conversion data set generation unit 114, and the image conversion data set selection unit 115 in addition to the configuration of the object recognition device 20 according to Embodiment 2. Hereinafter, for functional configurations that are the same as in Embodiment 2, the same reference signs as in Embodiment 2 are used and detailed descriptions are omitted, and the parts that differ from Embodiment 2 are mainly described.
 The simulation unit 111 creates target images by simulation. Specifically, the simulation unit 111 includes the first generation unit 112, which generates placement information indicating the placement state of target objects based on simulation conditions, and the second generation unit 113, which places target objects based on the placement information and generates a target image.
 The simulation conditions used by the first generation unit 112 include, for example, sensor information, target object information, and environment information. The sensor information desirably includes items whose values change the state of the generated space, such as the focal length, angle of view, and aperture value of the sensor that acquires the sensor image. When the sensor performs stereo measurement, the sensor information may also include the convergence angle, the baseline length, and the like.
 The target object information is, for example, a CAD model of the target object, information indicating the material of the target object, and the like. In the case of a CAD model of the target object, the target object information may include texture information for each surface of the target object. The target object information desirably includes enough information that, when the target object is placed in a space by simulation, the state of the target object in that space is uniquely determined.
 The environment information can include the measurement distance, the measurement depth, the positions and orientations of objects other than the target object, the type and intensity of ambient light, and the like. Objects other than the target object are, for example, a box, a measurement table, and the like. By using the simulation conditions, the simulation unit 111 can perform simulations under detailed conditions and can generate various types of target images.
 The placement information generated by the first generation unit 112 indicates the placement state of at least one target object. When a plurality of target objects are placed in the space, they may be placed in an aligned arrangement or in a bulk-stacked state. When target objects are placed in a bulk-stacked state, the processing time can be shortened by performing a simulation using a simplified model of the target object and then re-placing the target objects at the calculated simplified-model positions.
 The target image generated by the second generation unit 113 may be an RGB image or a distance image. When an RGB image is used, it is desirable to set the colors or textures of the target object and of objects other than the target object.
 The simulation unit 111 stores the generated target images in the storage unit 106. The simulation unit 111 may also store, in the storage unit 106, the simulation conditions used when the first generation unit 112 generated the placement information and the placement information generated by the first generation unit 112. In this case, it is desirable that the simulation unit 111 store the placement information in association with the target images constituting the image conversion data set.
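A minimal sketch of the two-stage generation (placement information, then rendering), assuming hypothetical simulate_placement and render functions and a reduced set of simulation conditions, might look as follows.

```python
from dataclasses import dataclass

@dataclass
class SimulationConditions:
    focal_length_mm: float           # sensor information
    cad_model_path: str              # target object information
    measurement_distance_mm: float   # environment information
    num_objects: int

def generate_target_image(conditions, simulate_placement, render):
    """First generation unit: placement information; second generation unit: rendered target image."""
    placement = simulate_placement(conditions)    # e.g. bulk-stacked poses of the CAD model
    target_image = render(placement, conditions)  # RGB image or distance image
    return target_image, placement                # placement kept so it can be stored with the image
```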
 The image conversion data set generation unit 114 generates an image conversion data set including sensor images acquired by the image acquisition unit 101 and target images generated by the simulation unit 111. The image conversion data set generation unit 114 stores the generated image conversion data set in the storage unit 106. An image conversion data set includes one or more sensor images and one or more target images. There is no limit on the numbers of sensor images and target images. If the number of images is too small, the learning of the image conversion parameters may not converge, and if the number of images is too large, the learning time may become long. It is therefore preferable to determine the number of images according to the user's application, the installation conditions of the sensor, and the like. The number of target images and the number of sensor images are preferably about the same, but they may be unbalanced.
 The image conversion data set selection unit 115 selects, based on the sensor image, the image conversion data set to be used for learning by the first learning unit 105 from among the image conversion data sets stored in the storage unit 106. Specifically, the image conversion data set selection unit 115 calculates, based on the sensor image, a selection evaluation value E_p that serves as a criterion for selecting an image conversion data set, and selects an image conversion data set based on the calculated selection evaluation value E_p. For example, the image conversion data set selection unit 115 can select only image conversion data sets whose selection evaluation value E_p is equal to or less than a predetermined threshold. The image conversion data set selection unit 115 can select one or more image conversion data sets.
 The image conversion data set selection unit 115 outputs the selected image conversion data set to the first learning unit 105. The first learning unit 105 learns the image conversion parameters using the image conversion data set selected by the image conversion data set selection unit 115. The first learning unit 105 therefore learns the image conversion parameters using the target images generated by the simulation unit 111.
 The selection evaluation value E_p is calculated using, for example, the following formula (9).
$$E_p = \frac{1}{N_s} \sum_{I_s \in II_s} \left| F_I(I_t) - F_I(I_s) \right| \tag{9}$$
 Here, I_t denotes the sensor image, II_s denotes the target image group constituting the image conversion data set, and N_s denotes the number of target images included in the target image group. F_I(I) denotes an arbitrary function for calculating a scalar value from an image I; F_I(I) is, for example, a function that calculates the mean value of the image, a function that calculates the number of edges, or the like.
 When there is placement information associated with each target image included in the target image group constituting the image conversion data set, the image conversion data set selection unit 115 may calculate the selection evaluation value E_p using the following formula (10).
$$E_p = \frac{1}{N_s} \sum_{I_s \in II_s} \left( w_I \left| F_I(I_t) - F_I(I_s) \right| + w_l \left| l_s - l_t \right| \right) \tag{10}$$
 Here, l_s denotes the measurement distance of the sensor that acquires the sensor image, l_t denotes the measurement distance of a target image constituting the target image group, and w_I and w_l denote weight coefficients. If the measurement distance of the sensor is not known exactly, an approximate distance may be used. The above method of calculating the selection evaluation value E_p is an example, and the calculation is not limited to this method.
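As a rough illustration of data set selection with a selection evaluation value, the following Python sketch uses the mean pixel value as one example of F_I(I) and keeps data sets at or below a threshold; the data set format and the statistic are assumptions.

```python
import numpy as np

def selection_value(sensor_image, dataset, statistic=np.mean):
    """Selection evaluation value E_p: how far the data set's target images are from the sensor image
    under a scalar image statistic (the mean pixel value here, as one example of F_I)."""
    return float(np.mean([abs(statistic(sensor_image) - statistic(t)) for t in dataset["targets"]]))

def select_datasets(sensor_image, datasets, threshold):
    """Keep only image conversion data sets whose E_p is at or below the threshold."""
    return [d for d in datasets if selection_value(sensor_image, d) <= threshold]
```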
 図12は、図11に示すシミュレーション部111の動作を説明するためのフローチャートである。 FIG. 12 is a flowchart for explaining the operation of the simulation unit 111 shown in FIG.
 シミュレーション部111の第1生成部112は、シミュレーション条件を取得する(ステップS301)。シミュレーション条件は、例えば、シミュレーション部111内に備わる記憶領域から取得される。第1生成部112は、シミュレーション条件に基づいて対象物体の配置状態を示す配置情報を生成する(ステップS302)。第1生成部112は、生成した配置情報をシミュレーション部111の第2生成部113に出力する。 The first generation unit 112 of the simulation unit 111 acquires the simulation conditions (step S301). The simulation conditions are acquired from, for example, a storage area provided in the simulation unit 111. The first generation unit 112 generates placement information indicating the placement state of the target object based on the simulation conditions (step S302). The first generation unit 112 outputs the generated arrangement information to the second generation unit 113 of the simulation unit 111.
 第2生成部113は、第1生成部112が生成した配置情報に基づいて対象物体を配置して目標画像を生成する(ステップS303)。第2生成部113は、生成した目標画像を出力して記憶部106に記憶させる(ステップS304)。 The second generation unit 113 arranges the target object based on the arrangement information generated by the first generation unit 112 and generates a target image (step S303). The second generation unit 113 outputs the generated target image and stores it in the storage unit 106 (step S304).
 図13は、図11に示す物体認識装置30が運用開始前に行う処理について説明するためのフローチャートである。なお、図13において、物体認識装置10または物体認識装置20の処理と同様の部分については、図6または図9と同じ符号を付することで詳細な説明を省略する。以下、図6または図9と異なる部分について主に説明する。 FIG. 13 is a flowchart for explaining the process performed by the object recognition device 30 shown in FIG. 11 before the start of operation. In FIG. 13, the same parts as those of the object recognition device 10 or the object recognition device 20 are designated by the same reference numerals as those in FIG. 6 or 9, and detailed description thereof will be omitted. Hereinafter, the parts different from those in FIG. 6 or 9 will be mainly described.
 物体認識装置30のシミュレーション部111は、まず、シミュレーション処理を行う(ステップS311)。ステップS311のシミュレーション処理は、図12のステップS301~ステップS304に示す処理である。 The simulation unit 111 of the object recognition device 30 first performs a simulation process (step S311). The simulation process of step S311 is the process shown in steps S301 to S304 of FIG.
 続いて画像変換データセット生成部114は、画像取得部101が取得したセンサ画像と、シミュレーション部111が生成した目標画像とを用いて、画像変換データセットを生成する(ステップS312)。画像変換データセット生成部114は、生成した画像変換データセットを記憶部106に記憶させる。 Subsequently, the image conversion data set generation unit 114 generates an image conversion data set using the sensor image acquired by the image acquisition unit 101 and the target image generated by the simulation unit 111 (step S312). The image conversion data set generation unit 114 stores the generated image conversion data set in the storage unit 106.
 The image conversion data set selection unit 115 selects, from the image conversion data sets stored in the storage unit 106, the image conversion data set to be used by the first learning unit 105 (step S313), and outputs the selected image conversion data set to the first learning unit 105.
 The subsequent processing of steps S121 to S126, steps S201 and S202, and steps S128 and S129 is the same as the processing described with reference to FIG. 6 or FIG. 9. In step S121, the image conversion parameter learning process is executed using the image conversion data set selected in step S313.
 As described above, the object recognition device 30 according to the third embodiment creates target images by simulation and learns the image conversion parameters using the created target images. The object recognition device 30 also generates an image conversion data set containing the target images created by simulation and the sensor images acquired by the image acquisition unit 101, and learns the image conversion parameters using the generated image conversion data set. With this configuration, the target images and image conversion data sets required for learning the image conversion parameters can be generated easily. Furthermore, because the target images are generated from the simulation conditions and from the arrangement information indicating the arrangement state of the target object, a variety of target images can be generated simply by adjusting the simulation conditions.
 The object recognition device 30 includes the image conversion data set selection unit 115, which selects, based on the sensor image, the image conversion data set to be used by the first learning unit 105 from among the image conversion data sets generated by the image conversion data set generation unit 114. With this configuration, the learning of the image conversion parameters can be restricted to image conversion data sets suited to the surrounding environment, which makes the learning more efficient.
Embodiment 4.
 FIG. 14 is a diagram showing the functional configuration of the object recognition device 40 according to the fourth embodiment. The object recognition device 40 includes an image acquisition unit 101, an image conversion unit 102, a recognition unit 103, an output unit 104, a first learning unit 105, a storage unit 106, an image conversion parameter determination unit 107, an evaluation unit 108, an input reception unit 109, a robot 110, a simulation unit 111, an image conversion data set generation unit 114, an image conversion data set selection unit 115, a recognition data set generation unit 116, a second learning unit 117, and a recognition parameter determination unit 118.
 In addition to the configuration of the object recognition device 30 according to the third embodiment, the object recognition device 40 includes the recognition data set generation unit 116, the second learning unit 117, and the recognition parameter determination unit 118. In the following, functional components identical to those of the third embodiment are given the same reference numerals and their detailed description is omitted; the description focuses mainly on the parts that differ from the third embodiment.
 The recognition data set generation unit 116 generates, based on the recognition method used by the recognition unit 103, the annotation data used when the recognition unit 103 performs recognition processing, and generates a recognition data set containing the generated annotation data and the target images. The recognition data set generation unit 116 stores the generated recognition data set in the storage unit 106. The annotation data depends on the recognition method used by the recognition unit 103. For example, when the recognition method is a neural network that outputs the position and size of the target object in the image, the annotation data is the position and size of the target object in the image.
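 For the example named above (a network that outputs image positions and sizes), annotation generation might be sketched as follows. Deriving bounding boxes directly from the simulated arrangement information is an assumption, and the box format and fixed object size are hypothetical.

```python
def make_annotation(arrangement, object_size_px=20):
    """Annotation data for a detector that outputs position and size per object.
    Assumes each arrangement entry already gives image-plane coordinates (hypothetical)."""
    boxes = []
    for obj in arrangement:
        cx, cy = obj["x"], obj["y"]
        half = object_size_px / 2
        boxes.append({"x": cx - half, "y": cy - half,
                      "width": object_size_px, "height": object_size_px})
    return boxes

def generate_recognition_dataset(simulated_samples):
    """Step S401: pair each simulated target image with its annotation data."""
    return [{"image": s["image"], "annotation": make_annotation(s["arrangement"])}
            for s in simulated_samples]
```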
 The second learning unit 117 learns the recognition parameters, that is, the parameters used by the recognition unit 103, based on the recognition data set generated by the recognition data set generation unit 116. The second learning unit 117 can be realized, for example, with the same configuration as the first learning unit 105 shown in FIG. 3: it includes a state observation unit 11 and a machine learning unit 12, and the machine learning unit 12 includes a reward calculation unit 121 and a function update unit 122. Although the example shown in FIG. 3 performs machine learning by reinforcement learning, the second learning unit 117 may instead perform machine learning according to other known methods such as neural networks, genetic programming, functional logic programming, or support vector machines. The second learning unit 117 stores the learning result of the recognition parameters in the storage unit 106. For example, when the recognition method uses a neural network, the recognition parameters include the weighting coefficients between the units constituting the neural network.
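 As one of the known methods mentioned above, a supervised neural-network detector could be fitted to the recognition data set roughly as sketched below (step S402). The network architecture, loss, tensor shapes, and use of PyTorch are assumptions for illustration, not part of the disclosed device.

```python
import torch
from torch import nn

class BoxRegressor(nn.Module):
    """Toy detector head: predicts (x, y, width, height) for a single object."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 64, 128), nn.ReLU(),
            nn.Linear(128, 4),
        )

    def forward(self, x):
        return self.net(x)

def learn_recognition_parameters(images, boxes, epochs=10, lr=1e-3):
    """Step S402: fit the recognition parameters (network weights) to the data set.
    `images` is an (N, 1, 64, 64) tensor, `boxes` an (N, 4) tensor (assumed shapes)."""
    model = BoxRegressor()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(images), boxes)
        loss.backward()
        optimizer.step()
    return model.state_dict()   # the learned recognition parameters (weights)
```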
 The recognition parameter determination unit 118 determines the recognition parameters to be used by the recognition unit 103 based on the evaluation results of the evaluation unit 108 obtained when each of the plurality of recognition parameters is used, and outputs the determined recognition parameters to the recognition unit 103.
 For example, the recognition parameter determination unit 118 can take the recognition parameters with the largest evaluation value as the recognition parameters to be used by the recognition unit 103. Alternatively, when the output unit 104 outputs the evaluation result of the evaluation unit 108 for each set of recognition parameters and the input reception unit 109 accepts an input selecting recognition parameters, the recognition parameter determination unit 118 can output the recognition parameters selected by the user to the recognition unit 103. Since the evaluation value of a set of recognition parameters is considered to change depending on the image conversion parameters, a plurality of evaluation values may also be calculated for one set of learned recognition parameters by varying the image conversion parameters used by the image conversion unit 102. In this case, the image conversion parameter determination unit 107 can determine the image conversion parameters based on the combinations of the calculated evaluation values and the image conversion parameters.
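 The joint evaluation described here amounts to scoring every (image conversion parameter, recognition parameter) pair and keeping the best-scoring members, which could be sketched as below. The evaluation callback stands in for the evaluation unit 108 and is an assumption; a user-selection path would simply present the resulting scores through the output unit instead of taking the maximum.

```python
from itertools import product

def evaluate_combinations(conversion_params, recognition_params, evaluate):
    """Score every (image conversion parameter, recognition parameter) combination.
    `evaluate(p_conv, p_rec)` returns the evaluation value from the evaluation unit."""
    scores = {(i, j): evaluate(p_conv, p_rec)
              for (i, p_conv), (j, p_rec) in product(enumerate(conversion_params),
                                                     enumerate(recognition_params))}
    best_conv_idx, best_rec_idx = max(scores, key=scores.get)
    return conversion_params[best_conv_idx], recognition_params[best_rec_idx], scores
```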
 FIG. 15 is a flowchart for explaining the processing that the object recognition device 40 shown in FIG. 14 performs before the start of operation. In FIG. 15, parts identical to the processing of the object recognition device 30 are given the same reference numerals as in FIG. 13, and their detailed description is omitted. The following description focuses mainly on the parts that differ from FIG. 13.
 After performing the simulation process of step S311, the object recognition device 40 generates a recognition data set (step S401) in parallel with the processing of steps S312, S313, and S121, and performs the recognition parameter learning process of learning the recognition parameters using the generated recognition data set (step S402).
 Next, after the processing of steps S122 and S123, the object recognition device 40 selects image conversion parameters and recognition parameters (step S403). The subsequent processing of steps S125, S126, S201, and S202 is the same as in the object recognition device 30.
 After the evaluation value is calculated, the image conversion unit 102 of the object recognition device 40 determines whether evaluation values have been calculated for all combinations of image conversion parameters and recognition parameters (step S404). If evaluation values have been calculated for all combinations (step S404: Yes), the object recognition device 40 performs the processing of step S129 and determines the recognition parameters (step S405). If not (step S404: No), the object recognition device 40 returns to the processing of step S403.
 As described above, the object recognition device 40 according to the fourth embodiment generates, based on the recognition method used by the recognition unit 103, the annotation data used by the recognition unit 103, and learns the recognition parameters using a recognition data set containing the generated annotation data and the target images. With this configuration, the object recognition device 40 can easily generate recognition data sets for various situations.
 The object recognition device 40 also determines the recognition parameters to be used by the recognition unit 103 based on the evaluation results of the evaluation unit 108 obtained when each of the plurality of recognition parameters is used. With this configuration, the object recognition device 40 can perform recognition processing using recognition parameters suited to the target object, the surrounding environment, and so on, which improves the recognition success rate and the gripping success rate.
 Next, the hardware configurations of the object recognition devices 10, 20, 30, and 40 according to the first to fourth embodiments will be described. Each component of the object recognition devices 10, 20, 30, and 40 is realized by a processing circuit. These processing circuits may be realized by dedicated hardware, or may be control circuits using a CPU (Central Processing Unit).
 When the above processing circuits are realized by dedicated hardware, they are realized by the processing circuit 90 shown in FIG. 16. FIG. 16 is a diagram showing dedicated hardware for realizing the functions of the object recognition devices 10, 20, 30, and 40 according to the first to fourth embodiments. The processing circuit 90 is a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination of these.
 When the above processing circuits are realized by a control circuit using a CPU, this control circuit is, for example, the control circuit 91 having the configuration shown in FIG. 17. FIG. 17 is a diagram showing the configuration of the control circuit 91 for realizing the functions of the object recognition devices 10, 20, 30, and 40 according to the first to fourth embodiments. As shown in FIG. 17, the control circuit 91 includes a processor 92 and a memory 93. The processor 92 is a CPU, and is also called a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like. The memory 93 is, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable ROM), or an EEPROM (registered trademark) (Electrically EPROM), or a magnetic disk, flexible disk, optical disk, compact disc, mini disc, or DVD (Digital Versatile Disc).
 When the above processing circuits are realized by the control circuit 91, they are realized by the processor 92 reading and executing programs stored in the memory 93 that correspond to the processing of each component. The memory 93 is also used as temporary memory for the processing executed by the processor 92. The computer programs executed by the processor 92 may be provided via a communication network, or may be provided stored on a storage medium.
 The configurations described in the above embodiments are examples; they can be combined with other known techniques, the embodiments can be combined with one another, and parts of the configurations can be omitted or modified without departing from the gist of the disclosure.
 10, 20, 30, 40: object recognition device; 11: state observation unit; 12: machine learning unit; 90: processing circuit; 91: control circuit; 92: processor; 93: memory; 101: image acquisition unit; 102: image conversion unit; 103: recognition unit; 104: output unit; 105: first learning unit; 106: storage unit; 107: image conversion parameter determination unit; 108: evaluation unit; 109: input reception unit; 110: robot; 111: simulation unit; 112: first generation unit; 113: second generation unit; 114: image conversion data set generation unit; 115: image conversion data set selection unit; 116: recognition data set generation unit; 117: second learning unit; 118: recognition parameter determination unit; 121: reward calculation unit; 122: function update unit.

Claims (18)

  1.  An object recognition device comprising:
     an image acquisition unit that acquires an image of a target object;
     an image conversion unit that uses an image conversion parameter to convert the sensor image, which is the image acquired by the image acquisition unit, and outputs a converted image;
     a recognition unit that recognizes a state of the target object based on the converted image;
     an evaluation unit that evaluates, based on a recognition result of the recognition unit, the image conversion parameter used to generate the converted image; and
     an output unit that outputs the recognition result and an evaluation result of the evaluation unit.
  2.  The object recognition device according to claim 1, wherein the image conversion parameter is a parameter for converting the sensor image into an image having predetermined features.
  3.  The object recognition device according to claim 2, further comprising a first learning unit that learns the image conversion parameter for each of the features, wherein the image conversion unit converts the sensor image using the image conversion parameter that is a learning result of the first learning unit.
  4.  The object recognition device according to claim 3, wherein the image conversion unit converts the sensor image into the converted image through a plurality of stages of image conversion, and the first learning unit learns each of a plurality of types of image conversion parameters used in the respective stages of image conversion.
  5.  The object recognition device according to claim 4, wherein the image conversion unit converts the sensor image into the converted image by converting the sensor image into an intermediate image and converting the intermediate image into the converted image, and the first learning unit learns a first image conversion parameter for converting the sensor image into the intermediate image and a second image conversion parameter for converting the intermediate image into the converted image.
  6.  The object recognition device according to claim 3, wherein the image conversion unit converts the sensor image into a plurality of component images and then combines the plurality of component images to obtain the converted image, and the first learning unit learns a plurality of types of image conversion parameters for converting the sensor image into each of the plurality of component images.
  7.  The object recognition device according to any one of claims 1 to 6, further comprising a conversion parameter determination unit that determines the image conversion parameter to be used by the image conversion unit based on evaluation results of the evaluation unit obtained when each of a plurality of the image conversion parameters is used.
  8.  The object recognition device according to any one of claims 1 to 7, further comprising an input reception unit that accepts input of an evaluation parameter, which is a parameter used by the evaluation unit to evaluate the image conversion parameter, wherein the evaluation unit evaluates the image conversion parameter using the evaluation parameter accepted by the input reception unit.
  9.  The object recognition device according to any one of claims 1 to 8, wherein the recognition result includes at least one of a recognition processing time of the recognition unit and the number of target objects recognized by the recognition unit.
  10.  The object recognition device according to any one of claims 1 to 9, further comprising a robot that grips the target object based on the recognition result of the recognition unit, wherein the evaluation unit evaluates the image conversion parameter further based on an operation result of the robot.
  11.  The object recognition device according to claim 10, wherein the operation result includes at least one of a probability that the robot has successfully gripped the target object, a gripping operation time, and a cause of gripping failure.
  12.  The object recognition device according to claim 3, further comprising a simulation unit that uses simulation to create a target image, which is an image having the predetermined features, wherein the first learning unit learns the image conversion parameter using the target image created by the simulation unit.
  13.  The object recognition device according to claim 12, wherein the simulation unit includes a first generation unit that generates arrangement information indicating an arrangement state of the target object based on a simulation condition, and a second generation unit that arranges the target object based on the arrangement information and generates the target image, the object recognition device further comprising an image conversion data set generation unit that generates an image conversion data set containing the target image generated by the simulation unit and the sensor image.
  14.  The object recognition device according to claim 13, further comprising an image conversion data set selection unit that selects, based on the sensor image, an image conversion data set to be used by the first learning unit from among the image conversion data sets generated by the image conversion data set generation unit.
  15.  The object recognition device according to any one of claims 12 to 14, further comprising a recognition data set generation unit that generates, based on a recognition method used by the recognition unit, annotation data used when the recognition unit performs recognition processing, and generates a recognition data set containing the target image and the annotation data.
  16.  The object recognition device according to claim 15, further comprising a second learning unit that learns a recognition parameter, which is a parameter used by the recognition unit, based on a recognition data set containing the target image and the annotation data used when the recognition unit performs recognition processing.
  17.  The object recognition device according to claim 16, further comprising a recognition parameter determination unit that determines the recognition parameter to be used by the recognition unit based on evaluation results of the evaluation unit obtained when each of a plurality of the recognition parameters is used.
  18.  An object recognition method comprising:
     a step in which an object recognition device acquires an image of a target object;
     a step in which the object recognition device converts the acquired image using an image conversion parameter and outputs a converted image;
     a step in which the object recognition device recognizes a state of the target object based on the converted image;
     a step in which the object recognition device evaluates, based on a recognition result, the image conversion parameter used to generate the converted image; and
     a step in which the object recognition device outputs the recognition result and an evaluation result.
PCT/JP2020/002577 2020-01-24 2020-01-24 Object recognition device and object recognition method WO2021149251A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2021572241A JP7361800B2 (en) 2020-01-24 2020-01-24 Object recognition device and object recognition method
CN202080092120.2A CN114981837A (en) 2020-01-24 2020-01-24 Object recognition device and object recognition method
PCT/JP2020/002577 WO2021149251A1 (en) 2020-01-24 2020-01-24 Object recognition device and object recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/002577 WO2021149251A1 (en) 2020-01-24 2020-01-24 Object recognition device and object recognition method

Publications (1)

Publication Number Publication Date
WO2021149251A1 2021-07-29

Family

ID=76993210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/002577 WO2021149251A1 (en) 2020-01-24 2020-01-24 Object recognition device and object recognition method

Country Status (3)

Country Link
JP (1) JP7361800B2 (en)
CN (1) CN114981837A (en)
WO (1) WO2021149251A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018060466A (en) * 2016-10-07 2018-04-12 パナソニックIpマネジメント株式会社 Image processing apparatus, detection apparatus, learning apparatus, image processing method, and image processing program
WO2019064599A1 (en) * 2017-09-29 2019-04-04 日本電気株式会社 Abnormality detection device, abnormality detection method, and computer-readable recording medium

Also Published As

Publication number Publication date
CN114981837A (en) 2022-08-30
JP7361800B2 (en) 2023-10-16
JPWO2021149251A1 (en) 2021-07-29


Legal Events

121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20915796; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021572241; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20915796; Country of ref document: EP; Kind code of ref document: A1)