WO2021149251A1

WO2021149251A1 - Object recognition device and object recognition method

Info

Publication number: WO2021149251A1
Application number: PCT/JP2020/002577
Authority: WO
Inventors: 彩佳里大島; 亮輔川西
Original assignee: 三菱電機株式会社
Priority date: 2020-01-24
Filing date: 2020-01-24
Publication date: 2021-07-29
Also published as: CN114981837A; JP7361800B2; JPWO2021149251A1

Abstract

An object recognition device (10) is characterized by comprising: an image acquisition unit (101) that acquires images of a target object; an image conversion unit (102) that uses image conversion parameters to subject sensor images, which are the images acquired by the image acquisition unit (101), to image conversion and outputs the converted images; a recognition unit (103) that recognizes the state of the target object on the basis of the converted images; an evaluation unit (108) that, on the basis of the recognition results of the recognition unit (103), evaluates the image conversion parameters used for generating the converted images; and an output unit (104) that outputs the recognition results and the evaluation results of the evaluation unit (108).

Description

Object recognition device and object recognition method

The present disclosure relates to an object recognition device and an object recognition method that recognize an object object based on a photographed image of the object object.

In various industries, recognition technology for grasping the state of an object such as the position and orientation of the object has been developed. The recognition technology is used, for example, to control an industrial robot according to the state of the object when the industrial robot grips and conveys the object. Patent Document 1 discloses a technique for recognizing the state of an object based on an image of the target object in a gripping system that grips the target object.

JP-A-2018-205929

However, according to the technique disclosed in Patent Document 1, there is a problem that the recognition performance may deteriorate when the environment when the recognition process is executed, for example, the surrounding environment of the target object, the measurement conditions, and the like change. there were.

The present disclosure has been made in view of the above, and an object of the present disclosure is to obtain an object recognition device capable of improving recognition performance even when the environment when executing recognition processing changes. do.

In order to solve the above-mentioned problems and achieve the object, the object recognition device of the present disclosure is an image acquired by the image acquisition unit using an image acquisition unit that acquires an image of the target object and an image conversion parameter. An image conversion unit that converts the sensor image into an image and outputs the converted image, a recognition unit that recognizes the state of the target object based on the converted image, and a conversion unit that generates a converted image based on the recognition result of the recognition unit. It is characterized by including an evaluation unit for evaluating the image conversion parameter used for the purpose, and an output unit for outputting the recognition result and the evaluation result of the evaluation unit.

According to the present disclosure, it is possible to improve the recognition performance even when the environment when executing the recognition process changes.

The figure which shows the functional structure of the object recognition apparatus which concerns on Embodiment 1. The figure which shows an example of the display screen displayed by the output part shown in FIG. The figure which shows an example of the detailed structure of the 1st learning part shown in FIG. A flowchart for explaining an operation example of the first learning unit shown in FIG. The figure for demonstrating the operation example when the 1st learning part shown in FIG. 1 uses CycleGAN. A flowchart for explaining the processing performed by the object recognition device shown in FIG. 1 before the start of operation. A flowchart for explaining the operation of the object recognition device shown in FIG. 1 during operation. The figure which shows the functional structure of the object recognition apparatus which concerns on Embodiment 2. A flowchart for explaining the processing performed by the object recognition device shown in FIG. 8 before the start of operation. A flowchart for explaining the processing performed by the object recognition device shown in FIG. 8 during operation. The figure which shows the functional structure of the object recognition apparatus which concerns on Embodiment 3. A flowchart for explaining the operation of the simulation unit shown in FIG. A flowchart for explaining the processing performed by the object recognition device shown in FIG. 11 before the start of operation. The figure which shows the functional structure of the object recognition apparatus which concerns on Embodiment 4. A flowchart for explaining the processing performed by the object recognition device shown in FIG. 13 before the start of operation. The figure which shows the dedicated hardware for realizing the function of the object recognition apparatus which concerns on Embodiments 1 to 4. The figure which shows the structure of the control circuit for realizing the function of the object recognition apparatus which concerns on Embodiments 1 to 4.

The object recognition device and the object recognition method according to the embodiment of the present disclosure will be described in detail below with reference to the drawings. The technical scope of the present disclosure is not limited by the embodiments shown below.

Embodiment 1.
FIG. 1 is a diagram showing a functional configuration of the object recognition device 10 according to the first embodiment. The object recognition device 10 evaluates the image acquisition unit 101, the image conversion unit 102, the recognition unit 103, the output unit 104, the first learning unit 105, the storage unit 106, the image conversion parameter determination unit 107, and the like. It has a unit 108 and an input receiving unit 109. The object recognition device 10 has a function of recognizing a state such as the position and orientation of the target object based on a photographed image of the target object.

The image acquisition unit 101 acquires an image of the target object. The image acquisition unit 101 may be an imaging device having an image sensor, or may be an interface for acquiring an image captured by a photographing device connected to the object recognition device 10. Hereinafter, the image acquired by the image acquisition unit 101 is referred to as a sensor image. The image acquisition unit 101 outputs the acquired sensor image to each of the image conversion unit 102 and the first learning unit 105. The sensor image may be a monochrome image or an RGB image. Further, the sensor image may be a distance image in which the distance is expressed by the brightness and darkness. The distance image may be generated based on the set data of points having three-dimensional position information. At this time, it is preferable that the image acquisition unit 101 acquires the minimum information for reconstructing a set of points having three-dimensional position information from the distance image at the same time as the distance image. The minimum information for reconstructing a set of points is focal length, scale, and so on.

Note that the image acquisition unit 101 may be able to acquire a plurality of types of images. For example, the image acquisition unit 101 may be able to acquire both a monochrome image and a distance image of the target object. At this time, the image acquisition unit 101 may be a photographing device capable of capturing both a monochrome image and a distance image by one unit, a photographing device for capturing a monochrome image, and a photographing device for capturing a distance image. It may be composed of and. However, when the monochrome image shooting and the distance image shooting are performed by different shooting devices, it is preferable to grasp the positional relationship between the two shooting devices in advance.

The image conversion unit 102 converts the sensor image acquired by the image acquisition unit 101 into an image using the image conversion parameter, and outputs the converted image to the recognition unit 103. The image conversion unit 102 is stored in the storage unit 106 so that the sensor image has a predetermined feature for each target image group by using the image conversion parameter which is the learning result of the first learning unit 105. Perform image conversion. In the present embodiment, an image having predetermined features is referred to as a target image, and a set of target images is referred to as a target image group.

Multiple target images included in the same target image group have common features. Common features at this time are, for example, the shape of the target object, the surface characteristics of the target object, the measurement distance, the depth, and the like. In addition, common features are the position and orientation of objects other than the target object to be recognized, the type and intensity of ambient light, the type of measurement sensor, the parameters of the measurement sensor, the arrangement state of the target object, the image style, and the target object. It may be the quantity of. Here, the parameters of the measurement sensor are parameters such as focus and aperture. The arrangement state of the target object is an alignment state, a bulk state, or the like. A plurality of target images included in the same target image group may have one common feature or may have a plurality of common features. Further, "having a common feature" includes not only the case where the above-mentioned features are the same but also the case where they are similar. For example, when a reference shape such as a rectangular parallelepiped, a cylinder, or a hexagonal column is defined, the shape of the target object has a common feature even if the shape of the target object in the target image is close enough to approximate the same reference shape. It can be an image having. Further, when the standard colors such as black, white, and gray are set for the surface characteristics of the target object, even if the apparent hues of the target objects in the target image are close enough to be classified into the same standard colors. It can be an image having common features.

At least one target object is shown in the target image. At this time, the target object shown in the target image does not necessarily have to be shown in its entirety. For example, if a part of the target object is out of the measurement range, or if the target object is partially hidden by another object, the part of the target object displayed in the target image may be missing. no problem. Further, when a plurality of target objects are shown in the target image, the arrangement state of the plurality of target objects may be an aligned state or a bulk state. The target image is preferably an image that makes it easy to recognize the target object. An image in which the target object can be easily recognized is, for example, an image in which the shape of the target object is not complicated, has a simple shape such as a rectangular parallelepiped or a cube, and has less noise.

The number and types of image conversion parameters used by the image conversion unit 102 differ depending on the image conversion method. It is desirable that the image conversion unit 102 use an image conversion method such that the state such as the position and orientation of the target object in the converted image is not significantly different from the state of the target object in the sensor image. The image conversion unit 102 can use, for example, an image conversion method using a neural network. When an image conversion method using a neural network is used, the image conversion parameters include a weighting coefficient between each unit constituting the network.

The recognition unit 103 recognizes a state such as the position and orientation of the target object based on the converted image output by the image conversion unit 102. The recognition method used by the recognition unit 103 is not particularly limited. For example, the recognition unit 103 may use a machine learning-based recognition method that performs pre-learning so that the state of the target object can be output from the image, or the CAD (Computer-Aided Design) data of the target object. Model matching that estimates the state of the target object by collating it with the three-dimensional measurement data may be used. The recognition unit 103 may perform the recognition process using one type of recognition method, or may perform the recognition process using a combination of a plurality of types of recognition methods. The recognition unit 103 outputs the recognition result to each of the output unit 104 and the evaluation unit 108. The recognition result includes, for example, at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103.

The output unit 104 has a function of outputting the recognition result and the evaluation result of the evaluation unit 108, which will be described in detail later. The method of outputting the recognition result and the evaluation result by the output unit 104 is not particularly limited. For example, the output unit 104 includes a display device, and may display the recognition result and the evaluation result on the screen of the display device. Further, the output unit 104 is provided with an interface with an external device, and the recognition result and the evaluation result may be transmitted to the external device.

FIG. 2 is a diagram showing an example of a display screen displayed by the output unit 104 shown in FIG. “Input” in FIG. 2 indicates an area for displaying a sensor image, and “parameter” indicates an area for displaying an image conversion parameter and an evaluation value which is an evaluation result. Further, "conversion" in FIG. 2 indicates an area for displaying the converted image, and "recognition" indicates an area for displaying the recognition result. For example, when the user performs an operation of selecting one of a plurality of image conversion parameters displayed on the "parameter", the name of the selected image conversion parameter is displayed on the "Name" of the display screen. In "Value", the evaluation value when the selected image conversion parameter is used is displayed, and in "conversion", the converted image when the selected image conversion parameter is used is displayed, and "recognition" is displayed. Displays the recognition result when the selected image conversion parameter is used.

The first learning unit 105 learns image conversion parameters for image conversion of the sensor image so as to have the characteristics of the target image group. The first learning unit 105 learns the image conversion parameters used by the image conversion unit 102 for each target image group. FIG. 3 is a diagram showing an example of a detailed configuration of the first learning unit 105 shown in FIG. The first learning unit 105 has a state observation unit 11 and a machine learning unit 12. When the variation between the plurality of target images included in the target image group is small, the first learning unit 105 can obtain an image conversion parameter capable of performing image conversion that reproduces the characteristics of the target image group. The possibility is high. When the deviation of the sensor image from the target image group is large, the learning of the image conversion parameter of the first learning unit 105 is difficult to converge.

The state observation unit 11 observes the image conversion parameters, the target image group, and the similarity between the converted image and the features of the target image group as state variables. The machine learning unit 12 learns the image conversion parameters for each target image group according to the training data set created based on the image conversion parameters, the target image group, and the state variables of the similarity.

The learning algorithm used by the machine learning unit 12 may be any. As an example, a case where the machine learning unit 12 uses reinforcement learning will be described. Reinforcement learning is a learning algorithm in which an agent, who is the subject of action in a certain environment, observes the current state and decides the action to be taken. Agents are rewarded by the environment by choosing an action and learn how to get the most reward through a series of actions. Q-learning and TD-learning are known as typical methods of reinforcement learning. For example, if the Q-learning, general update equations of action value function _{Q (s} t, _{a t)} is expressed by the following equation (1).

In Equation (1), _{s t} represents the environment at time t, _{a t} represents the behavior in time t. By the action _{a t,} the environment is changed to _{s t + 1.} r t _{+ 1} denotes the reward given in accordance with the changing environment as a result of action a _t, gamma represents the discount rate, alpha represents a learning coefficient.

The update formula represented by the formula (1) increases the action value Q if the action value Q of the best action a at time t + 1 is larger than the action value Q of the action a executed at time t, and vice versa. In the case of, the action value Q is reduced. In other words, the action value Q of action a at time t, as close to the best action value at time t + 1, action value function Q (s _{t, a} _t) Update. By repeating such updates, the best behavioral value in a certain environment is sequentially propagated to the behavioral value in the previous environment.

The machine learning unit 12 has a reward calculation unit 121 and a function update unit 122.

The reward calculation unit 121 calculates the reward based on the state variable. The reward calculation unit 121 calculates the reward r based on the similarity included in the state variable. The degree of similarity increases as the converted image reproduces the characteristics of the target image group. For example, if the similarity is higher than a predetermined threshold, the reward calculation unit 121 increases the reward r. The reward calculation unit 121 can increase the reward r by giving a reward of "1", for example. On the other hand, when the similarity is lower than a predetermined threshold value, the reward calculation unit 121 reduces the reward r. The reward calculation unit 121 can, for example, give a reward of "-1" to reduce the reward r. The similarity is calculated according to a known method according to the type of features of the target image group.

The function update unit 122 updates the function for determining the image conversion parameter according to the reward r calculated by the reward calculation unit 121. For example, in the case of Q-learning, action value function Q (s _{t, a} _t) represented by Equation (1), and is used as a function for determining an image transform parameter.

FIG. 4 is a flowchart for explaining an operation example of the first learning unit 105 shown in FIG. The operation shown in FIG. 4 is performed before the operation of the object recognition device 10 is started. The state observation unit 11 of the first learning unit 105 acquires the sensor image group using the image acquisition unit 101 (step S101). The state observation unit 11 selects one target image group from a plurality of predetermined target image groups (step S102).

The first learning unit 105 sets the image conversion parameters for the selected target image group (step S103). The first learning unit 105 causes the image conversion unit 102 to perform image conversion of the sensor image using the set image conversion parameters (step S104).

The state observation unit 11 of the first learning unit 105 acquires the image conversion parameter, which is a state variable, the target image group, and the similarity between the converted image and the features of the target image group (step S105). The state observation unit 11 outputs the acquired state variables to the machine learning unit 12. The reward calculation unit 121 of the machine learning unit 12 determines whether or not the similarity is higher than the threshold value (step S106).

When the similarity is higher than the threshold value (step S106: Yes), the reward calculation unit 121 increases the reward r (step S107). When the similarity is lower than the threshold value (step S106: No), the reward calculation unit 121 reduces the reward r (step S108). The reward calculation unit 121 outputs the calculated reward r to the function update unit 122.

Function update unit 122, according reward r that compensation calculation unit 121 has calculated, action value function _{Q (s} _{t, a} t) updating (step S109). The first learning unit 105 determines whether or not a predetermined learning end condition is satisfied (step S110). It is desirable that the learning end condition is a condition for determining that the learning accuracy of the image conversion parameter is equal to or higher than the standard. For example, the learning end conditions are "the number of times the processing of steps S103 to S109 is repeated exceeds a predetermined number of times" and "the elapsed time from the start of learning the image conversion parameters for the same target image group". Exceeding a predetermined time. "

When the learning end condition is not satisfied (step S110: No), the first learning unit 105 repeats the process from step S103. When the learning end condition is satisfied (step S110: Yes), the first learning unit 105 outputs the learning result of the image conversion parameter for the target image group (step S111).

The first learning unit 105 determines whether or not the learning for all the target image groups has been completed (step S112). When the learning for all the target image groups is not completed, that is, when there is a target image group for which the learning has not been completed (step S112: No), the first learning unit 105 repeats the process from step S102. When the learning for all the target image groups is completed (step S112: Yes), the first learning unit 105 ends the image conversion parameter learning process.

The example in which the first learning unit 105 performs machine learning using reinforcement learning has been described above, but the first learning unit 105 describes other known methods such as neural networks, genetic programming, and functional logic programming. Machine learning may be performed according to a support vector machine or the like.

FIG. 5 is a diagram for explaining an operation example when the first learning unit 105 shown in FIG. 1 uses CycleGAN (Generative Adversarial Networks). In the second example, the first learning unit 105 learns the image conversion parameters using CycleGAN. When using a CycleGAN, first learning unit 105, as shown in FIG. 5, a first generator G, a second generator F, the first discriminator D _X, and a second discriminator D _Y Use to learn image conversion parameters.

The first learning unit 105 learns the image conversion parameters between the image groups X and Y using the training data of the two types of image groups X and Y. The image included in the training data of the image group X is referred to as an image x, and the image included in the training data of the image group Y is referred to as an image y.

The first generator G generates an image having the characteristics of the image group Y from the image x. Let G (x) be the output when the image x is input to the first generator G. The second generator F generates an image having the characteristics of the image group X from the image y. Let F (y) be the output when the image y is input to the second generator F. First discriminator D _X is distinguish between x and F (y). The second discriminator D _Y, distinguish between y and G (x).

First learning unit 105, two on the basis of the loss, image conversion accuracy of the first generator G and the second generator F is increased, the identification accuracy of the first discriminator D _X and a second discriminator D _Y Learn so that Specifically, the first learning section 105, the following equation (2) total loss indicated _{L (G, F, D X} , D Y) is, to satisfy an objective function represented by the following formula (3) To learn.

First loss L _GAN included in Equation _{(2) (G, D Y} , X, Y) , when the first generator G generates the image G (x) having the characteristics of the image group Y from the image x It is a loss that occurs. Second loss L _GAN included in Equation _{(2) (F, D X} , Y, X) , when the second generator F generated the image F (x) having the characteristics of the image group X from the image y It is a loss that occurs. _{In the third loss L cyc} (G, F) included in the equation (2), the image x is input to the first generator G to generate the image G (x), and the generated image G (x) is used as the second image G (x). The loss that occurs when the image F (G (x)) is generated by inputting to the generator F and the image F (y) generated by inputting the image y into the second generator F to generate the image F (y). This is the sum of the loss that occurs when the image G (F (y)) is generated by inputting y) into the first generator G.

That is, the first learning section 105, the following on the basis of the four assumptions, the total loss total loss _{L (G, F, D X} , D Y) a first generator so that smaller G and a second generator learns of F, performing total loss total loss _{L (G, F, D X} , D Y) a learning of the first discriminator so increases D _X and a second discriminator D _Y.
1. 1. The image G (x) converted by inputting the image x into the first generator G should be similar to the image group Y.
2. The image F (y) converted by inputting the image y into the second generator F should be similar to the image group X.
3. 3. The image F (G (x)) converted by inputting the image G (x) into the second generator F should be similar to the image group X.
4. The image G (F (y)) converted by inputting the image F (y) into the first generator G should be similar to the image group Y.

The first learning unit 105 is used in the first generator G that performs the above learning with the sensor image group as the image group X and the target image group as the image group Y and generates the target image group from the sensor image group. The image conversion parameters are learned, and the learning result is output to the storage unit 106. The first learning unit 105 performs the above learning for each of the plurality of types of target image groups, and learns the image conversion parameters for each target image group.

Return to the explanation in Fig. 1. The storage unit 106 stores the image conversion parameters for each target image group, which is the learning result of the first learning unit 105.

The image conversion parameter determination unit 107 determines the image conversion parameter used by the image conversion unit 102 during operation from among a plurality of image conversion parameters based on the evaluation result performed by the evaluation unit 108 described later before the start of operation. .. The image conversion parameter determination unit 107 notifies the image conversion unit 102 of the determined image conversion parameter.

The image conversion parameter determination unit 107 may, for example, _{use the image conversion parameter having the maximum evaluation value E c} as the image conversion parameter used by the image conversion unit 102, or the evaluation unit 108 causes the output unit 104 to output the evaluation result. The image conversion parameter selected after confirming the evaluation result output by the user may be used as the image conversion parameter used by the image conversion unit 102. For example, when it is considered that the amount of light of the sensor image used at the time of learning and the light of the sensor image actually obtained changes depending on the time zone or the like, the output unit 104 adds each image conversion parameter to the evaluation result. It is conceivable to output the converted image when used. In this case, the user can check the converted image and select an image conversion parameter capable of performing conversion that suppresses light reflection. At this time, the output unit 104 may output the evaluation value of the image conversion parameter whose evaluation value is equal to or more than the threshold value and the converted image, and may not output the image conversion parameter whose evaluation value is less than the threshold value.

Before the start of operation, the evaluation unit 108 evaluates each of the plurality of image conversion parameters based on the recognition result of the recognition unit 103 when each of the plurality of image conversion parameters is used. Specifically, the evaluation unit 108 calculates the evaluation value E _c, and outputs the a is the evaluation result calculated evaluation value E _c to each of the image conversion parameter determination unit 107 and an output unit 104. _{The evaluation value E c} calculated by the evaluation unit 108 is represented by, for example, the following mathematical formula (4).

Here, p _r represents the recognition accuracy, t _r represents the recognition processing time, w _pr, w _tr denotes the weight coefficient. That is, the evaluation value E _c is the sum of the value obtained by multiplying the weight coefficient w _pr in recognition accuracy p _r, a value obtained by multiplying the weight coefficient w _tr to the inverse of the recognition processing time t _r.

Generally, there is a trade-off between recognition accuracy p _r a recognition processing time t _r. Therefore, the values of _{the weighting coefficients w pr} and w _tr may be determined depending on what the user attaches importance to. For example, if it is desired to emphasize the speed of the recognition process even if the recognition accuracy is slightly lowered, _{the value of the weighting coefficient w pr} may be reduced and the value of the weighting coefficient w _tr may be increased. On the contrary, when the recognition accuracy is emphasized even if it takes time, _{the value of the weighting coefficient w pr} may be increased and the value of the weighting coefficient w _tr may be decreased.

The recognition accuracy _pr is the degree to which the target object in the sensor image can be recognized, or the error of the state of the target object, specifically, the error of the position and orientation. For example, when the recognition accuracy _pr is the degree to which the target object in the sensor image can be recognized, the recognition accuracy _pr is expressed by the following mathematical formula (5).

Here, n _r indicates the number of recognizable target objects, and N _w indicates the number of target objects in the sensor image. In other words, recognition accuracy p _r represented by the equation (5), the number n _r of the recognized target object, which is divided by the number N _w of the object in the sensor image. If the error between the position and orientation of the target object in the sensor image and the recognized position and orientation is within the threshold value, it may be determined that the recognition is successful, or the user visually determines whether or not the recognition is successful. You may.

In the case of an error condition of the target object and recognition accuracy p _r, recognition accuracy p _r is expressed by the following equation (6).

Here, x _w indicates the actual position / orientation of the target object, and x _r indicates the recognized position / orientation. In other words, recognition accuracy p _r represented by the equation (6) is the inverse of the value obtained by adding 1 to the absolute value of the difference between the actual position and orientation x _w and recognized position and orientation x _r of the target object. The actual position / orientation and the recognized position / orientation of the target object may be the position / orientation in the image space or the position / orientation in the real space.

Further, the recognition accuracy _pr is not limited to the above example. The above examples may be combined.

Further, not limited to the example shown in the above equation (4), the evaluation value E _c may be calculated using the following equation (7).

Here, _Tr indicates a recognition processing time threshold. That is, when using Equation (7), when the recognition process within the recognition processing time threshold T _r is completed, the evaluation value E _c is a value obtained by multiplying the weight coefficient w _pr in recognition accuracy p _r, the recognition processing time If the recognition process is not completed within the _{threshold T r} _{, the evaluation value E c} is 0. Recognition processing time The recognition processing is not completed within the _{threshold value T r. By} _{setting the evaluation value E c} of the image conversion parameter to 0, the image conversion parameter that can complete the recognition processing within the time required by the user can be confirmed and confirmed. It becomes possible to select. The method for calculating the evaluation value E _c is not limited to the above.

The input receiving unit 109 receives the input of the evaluation parameter, which is a parameter used by the evaluation unit 108 to evaluate the image conversion parameter. The input receiving unit 109 may accept evaluation parameters input by the user using an input device or the like, may receive evaluation parameters from a functional unit in the object recognition device 10, or may receive evaluation parameters from an external device of the object recognition device 10. Evaluation parameters may be accepted from. The evaluation parameters received by the input receiving unit 109 include, for example, the weighting coefficients w _pr, w _tr included in the mathematical formula (4), and the influence of each of a plurality of elements affecting the magnitude of the evaluation value on the evaluation value. It is a weighting factor for changing.

FIG. 6 is a flowchart for explaining the process performed by the object recognition device 10 shown in FIG. 1 before the start of operation. The first learning unit 105 of the object recognition device 10 performs the image conversion parameter learning process (step S121). Since the image conversion parameter learning process shown in step S121 is the process described with reference to FIG. 4 or the process described with reference to FIG. 5, detailed description thereof will be omitted here.

Subsequently, the input receiving unit 109 acquires the evaluation parameters and outputs the acquired evaluation parameters to the evaluation unit 108 (step S122).

The image acquisition unit 101 acquires a sensor image and outputs the acquired sensor image to the image conversion unit 102 (step S123). The image conversion unit 102 selects one image conversion parameter for which the evaluation value has not yet been calculated from the plurality of learned image conversion parameters stored in the storage unit 106 (step S124).

The image conversion unit 102 performs an image conversion process of converting the sensor image acquired by the image acquisition unit 101 into an image after conversion using the selected image conversion parameter (step S125). The image conversion unit 102 outputs the converted image to the recognition unit 103.

The recognition unit 103 performs recognition processing using the converted image and outputs the recognition result to the evaluation unit 108 (step S126). When outputting the recognition result, the recognition unit 103 may output the recognition result to the output unit 104.

_{The evaluation unit 108 calculates the evaluation value E c} based on the recognition result, and outputs the calculated evaluation value E _c to the image conversion parameter determination unit 107 (step S127).

_{The image conversion unit 102 determines whether or not the evaluation values E c} of all the image conversion parameters have been calculated (step S128). _{When the evaluation values E c} of all the image conversion parameters have not been calculated (step S128: No), that is, _{when there are image conversion parameters for which the evaluation values E c} have not been calculated, the image conversion unit 102 starts from step S124. Repeat the process. When calculating the evaluation value E _c of all the image transformation parameters (step S128: Yes), the image transformation parameter determination unit 107, from among a plurality of image transformation parameters, based on the evaluation value is an evaluation result of the evaluation unit 108 The image conversion parameter used by the image conversion unit 102 during operation is determined (step S129).

FIG. 7 is a flowchart for explaining the operation of the object recognition device 10 shown in FIG. 1 during operation. Before the operation, the operation shown in FIG. 6 is performed, the image conversion parameters have been learned for each target image group, and the image conversion parameters used by the image conversion unit 102 from the learned image conversion parameters. Is selected.

The image acquisition unit 101 acquires a sensor image and outputs the acquired sensor image to the image conversion unit 102 (step S131). The image conversion unit 102 acquires the selected image conversion parameter (step S132). The image conversion unit 102 performs an image conversion process for converting the sensor image into a converted image using the acquired image conversion parameters, and outputs the converted image to the recognition unit 103 (step S133).

The recognition unit 103 uses the converted image to perform a recognition process for recognizing the state of the target object included in the converted image, and outputs the recognition result to the output unit 104 (step S134).

The output unit 104 determines whether or not the target object exists based on the recognition result (step S135). When the target object exists (step S135: Yes), the output unit 104 outputs the recognition result (step S136). After outputting the recognition result, the image acquisition unit 101 repeats the process from step S131. When the target object does not exist (step S135: No), the object recognition device 10 ends the process.

In the above, the image conversion unit 102 converts the sensor image into a converted image by a one-step image conversion process, but the present embodiment is not limited to such an example. For example, the image conversion unit 102 may perform image conversion in a plurality of stages to convert the sensor image into an image after conversion. For example, when two-step image conversion is performed, the image conversion unit 102 converts the sensor image into a first intermediate image and converts the first intermediate image into a converted image. When three-step image conversion is performed, the image conversion unit 102 converts the sensor image into a first intermediate image, converts the first intermediate image into a second intermediate image, and converts the second intermediate image. Convert to a later image.

When the image conversion unit 102 performs image conversion in a plurality of stages, the first learning unit 105 learns each of the plurality of types of image conversion parameters used in each stage of the image conversion. Specifically, the first learning unit 105 sets a first image conversion parameter for converting the sensor image into an intermediate image and a second image conversion parameter for converting the intermediate image into a converted image. learn. Further, when three or more steps of image conversion are performed, the first learning unit 105 learns a third image conversion parameter for converting an intermediate image into an intermediate image. For example, when two-step image conversion is performed, the first learning unit 105 converts the first image conversion parameter for converting the sensor image into the first intermediate image and the converted image of the first intermediate image. Learn with a second image conversion parameter for conversion to. Further, when three-step image conversion is performed, the first learning unit 105 converts the first image conversion parameter for converting the sensor image into the first intermediate image and the first intermediate image into the second intermediate image. A third image conversion parameter for converting to an intermediate image and a second image conversion parameter for converting a second intermediate image into a converted image are learned.

The intermediate image is an image that is different from both the sensor image and the converted image. For example, if the converted image is a distance image generated using CG (Computer Graphic) without noise or omission, the intermediate image is simulated for noise, measurement error, omission of the blind spot of the sensor, etc. It can be a reproduced reproduced image. In this case, the first learning unit 105 has a first image conversion parameter for converting the sensor image into an intermediate image which is a reproduced image, and a second learning unit 105 for converting the intermediate image into a converted image which is a distance image. Learn the image conversion parameters of. By performing the image conversion step by step, it becomes possible to improve the convergence of learning, and it is possible to improve the recognition performance.

Further, the converted image may be obtained by dividing the converted image into a plurality of types of component images, converting the sensor image into a plurality of component images, and then synthesizing the images. In this case, the first learning unit 105 learns a plurality of types of image conversion parameters for converting the sensor image into each component image. For example, from one sensor image, a texture image which is a component image having the characteristics of the texture component of the converted image and a color image which is a component image having the characteristics of the global color component of the converted image are generated. It is conceivable that the texture image and the color image are combined to obtain a converted image. In this case, the first learning unit 105 learns an image conversion parameter for converting the sensor image into a texture image and an image conversion parameter for converting the sensor image into a color image. In the above, an example of using two component images is shown, but a converted image can also be obtained by using three or more component images. By learning the image conversion parameters for each component image, the problem to be solved is facilitated, so that the convergence of learning can be improved and the recognition performance can be improved. By synthesizing a plurality of component images to obtain a converted image, a converted image having characteristics closer to the target image group is obtained than when a converted image is obtained from a sensor image using one type of image conversion parameter. Will be possible.

When recognizing an object, it is common to perform multiple different types of image processing. Depending on the content of the image processing to be executed, there are an image in which a desired result is easily obtained and an image in which the desired result is not obtained. For example, in the edge detection process, when the brightness value near the boundary of the object for which the edge is to be extracted changes in steps, it is easy to extract the edge, and when the brightness value near the boundary changes smoothly. It is difficult to extract edges. In this way, the image processing to be performed has features, properties, and the like that the image should have. Therefore, instead of converting the image used for recognition only once, image conversion that facilitates each image processing in the recognition process can be executed each time as a preprocessing for each image processing. In this case, the first learning unit 105 only needs to learn the image conversion parameters for the number of image processes for which preprocessing is desired, and targets an ideal processing result image group obtained when each image processing is executed. Can be a group.

As described above, according to the object recognition device 10 according to the present embodiment, the image conversion parameter can be evaluated based on the recognition processing result, and the evaluation result can be obtained. Therefore, it is possible to confirm the influence of the image conversion parameter on the recognition process. Therefore, it is possible to select the image conversion parameter according to the environment when the recognition process is executed, and it is possible to improve the recognition performance even when the environment when the recognition process is executed changes. It becomes.

Further, the image conversion parameter is a parameter for image conversion of the sensor image into an image having predetermined features. The object recognition device 10 has a first learning unit 105 that learns image conversion parameters for each predetermined feature, and the image conversion unit 102 uses an image conversion parameter that is a learning result of the first learning unit 105. The sensor image is converted into an image using the image. By having such a configuration, the output unit 104 can obtain the evaluation result of the image conversion parameter which is the learning result for each predetermined feature. Therefore, it is possible to grasp what kind of characteristics the image has to be converted into an image so that the recognition performance can be improved.

Further, in the present embodiment, the image conversion unit 102 performs image conversion in a plurality of stages to convert the sensor image into an image after conversion, and the first learning unit 105 is used for each of the plurality of image conversion stages. Learn each of the types of image conversion parameters. By performing the image conversion step by step, it becomes possible to improve the convergence of learning, and it is possible to improve the recognition performance.

Further, in the present embodiment, the image conversion unit 102 can convert the sensor image into a plurality of component images and then synthesize the plurality of component images to acquire the converted image. In this case, the first learning unit 105 learns a plurality of types of image conversion parameters for converting the sensor image into each of the plurality of component images. By having such a configuration, the object recognition device 10 obtains a converted image having features closer to the target image group than when obtaining a converted image from the sensor image using one type of image conversion parameter. Becomes possible.

Further, the object recognition device 10 has an image conversion parameter determination unit 107 that determines the image conversion parameter used by the image conversion unit 102 based on the evaluation result of the evaluation unit 108 when each of the plurality of image conversion parameters is used. .. By having such a configuration, it is possible to automatically select an image conversion parameter capable of improving recognition performance without the user having to manually select an image conversion parameter by looking at the evaluation result. become.

The object recognition device 10 has an input receiving unit 109 that receives input of evaluation parameters, which are parameters used by the evaluation unit 108 to evaluate image conversion parameters. The evaluation unit 108 evaluates the image conversion parameter using the evaluation parameter received by the input reception unit 109. The evaluation parameter is, for example, a weighting coefficient for changing the influence of each of the plurality of elements affecting the magnitude of the evaluation value on the evaluation value. With such a configuration, the user can obtain an evaluation value of an image conversion parameter suitable for the user's intended use by inputting the evaluation parameter according to the intended use.

The recognition result output by the recognition unit 103 of the object recognition device 10 includes at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103. With such a configuration, the evaluation unit 108 calculates the evaluation value of the image conversion parameter based on at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103. become. The number n _r of the target object recognition unit 103 recognizes, by using the number N _r of the real object, can be calculated recognition accuracy p _r. Therefore, the object recognition device 10 can evaluate the image conversion parameters in consideration of the recognition processing time, the recognition accuracy _{pr, and the like.}

Embodiment 2.
FIG. 8 is a diagram showing a functional configuration of the object recognition device 20 according to the second embodiment. The object recognition device 20 evaluates the image acquisition unit 101, the image conversion unit 120, the recognition unit 103, the output unit 104, the first learning unit 105, the storage unit 106, the image conversion parameter determination unit 107, and the like. It has a unit 108, an input receiving unit 109, and a robot 110. Since the object recognition device 20 includes the robot 110 and has a function of picking an object, it can also be called an object extraction device. Since the object recognition device 20 includes the robot 110, the image conversion parameters can be evaluated based on the operation result of the robot 110.

The object recognition device 20 has a robot 110 in addition to the functional configuration of the object recognition device 10 according to the first embodiment. Hereinafter, the same functional configuration as that of the first embodiment will be omitted in detail by using the same reference numerals as those of the first embodiment, and the parts different from the first embodiment will be mainly described.

The output unit 104 outputs the recognition result of the recognition unit 103 to the robot 110. The robot 110 grips the target object based on the recognition result output by the output unit 104. The robot 110 outputs the operation result of the operation of gripping the target object to the evaluation unit 108. The evaluation unit 108 evaluates the image conversion parameter based on the operation result of the robot 110 in addition to the recognition result of the recognition unit 103. Here, the operation result of the robot 110 includes at least one of the probability that the robot 110 succeeds in gripping the target object, the gripping operation time, and the cause of the grip failure.

The robot 110 has a tool capable of grasping an object and performing an object operation necessary for executing a task. For example, when the task is to transport the target object between a plurality of conveyors and the surface of the target object is a smooth surface without unevenness, a suction pad can be used as a tool. Further, the tool may be a gripper hand that grips the target object by sandwiching it with two claws.

The condition for determining that the robot 110 has successfully grasped the target object is that, for example, when the tool is a gripper hand, the opening width when the gripper hand is inserted into the target object and the gripper hand is closed is predetermined. It can be within the specified range. Alternatively, when the tool is a gripper hand and the robot 110 grips the target object and then transports the gripped target object, the condition for determining that the robot 110 succeeds in gripping the target object is the target at the transport destination. It may be assumed that the target object can be held immediately before the gripper hand is released from the object. The conditions for determining that the robot 110 has succeeded in grasping the target object are not limited to the above examples, and can be appropriately defined depending on the type of tool possessed by the robot 110, the work content to be performed by the robot 110, and the like.

In the above, an example of defining conditions for determining that the robot 110 has succeeded in grasping the target object based on whether or not the target object can be held has been described. Whether or not the target object can be held can be determined by using the detection result, for example, when the tool being used is equipped with a function of detecting the holding state of the target object. Alternatively, it may be determined whether or not the target object can be held by using the information of an external sensor such as a camera. For example, when the tool possessed by the robot 110 is an electric hand, there is a product having a function of determining whether or not the target object can be held by measuring the current value when operating the electric hand. When using a camera image, the image of the tool when the target object is not grasped can be stored in advance, the difference from the image taken by the tool after the gripping operation can be taken, and the target object can be held based on the difference. There is a way to determine if it is.

By including the gripping success rate in the operation result of the robot 110, the evaluation unit 108 evaluates the image conversion parameters based on the gripping success rate. Therefore, the image conversion unit 102 performs image conversion so that the gripping success rate becomes high. It becomes possible to use parameters. The operation result of the robot 110 can also include the gripping operation time. For example, when the tool possessed by the robot 110 is a gripper hand and the target object gripped by the robot 110 is transported, the gripping operation time can be the time from closing the gripper hand to opening the gripper hand at the transport destination. By including the gripping operation time in the operation result of the robot 110, the evaluation unit 108 evaluates the image conversion parameter based on the gripping operation time, so that the image conversion unit 102 evaluates the image conversion parameter so that the gripping operation becomes faster. Can be used.

Causes of grip failure of the robot 110 include, for example, failure to grip, dropping during transportation, and multiple grips. By including the cause of gripping failure in the operation result of the robot 110, the evaluation unit 108 evaluates the image conversion parameter based on the cause of failure, so that the image conversion unit 102 can reduce the specific cause of failure. It is possible to use conversion parameters. For example, even if the target object fails to be gripped in the supply box that stores the target object before supply, the target object is likely to fall into the supply box and the gripping operation may be performed again, so the risk is high. low. On the other hand, if the target object is dropped during transportation, the target object may fall and be scattered around, and complicated control of the robot 110 may be required to return to the original state. , It takes time, so the risk is high. Therefore, by reducing the evaluation weight for the low-risk gripping failure cause and increasing the evaluation weight for the high-risk gripping failure cause, the image conversion unit 102 has the target object in the surroundings. It is possible to use image conversion parameters with less risk of scattering.

FIG. 9 is a flowchart for explaining the processing performed by the object recognition device 20 shown in FIG. 8 before the start of operation. In FIG. 9, the same parts as those of the object recognition device 10 are designated by the same reference numerals as those in FIG. 6, and detailed description thereof will be omitted. Hereinafter, the parts different from FIG. 6 will be mainly described.

The operation from step S121 to step S126 is the same as in FIG. When the recognition process is performed, the robot 110 performs picking based on the recognition result (step S201). The robot 110 outputs the picking operation result to the evaluation unit 108.

The evaluation unit 108 calculates an evaluation value based on the operation result of the robot 110 in addition to the recognition result (step S202). Specifically, the evaluation unit 108 can calculate the _{evaluation value E c} by using, for example, the following mathematical formula (8).

In Equation (8), p _g denotes the gripping success rate, t _g represents the gripping time, p _r represents the recognition accuracy, t _r represents the recognition processing _time, n _{f1, f2 ...} gripping failure cause Indicates the type. Further, w _pg , w _tg , w _pr , w _tr , w _{f1, f2 ...} Indicates a weighting coefficient. The evaluation parameters received by the input receiving unit 109 include weighting coefficients w _pg , w _tg , w _pr , w _tr , w _{f1, f2,} and so on. However, the above _{method for calculating the evaluation value E c} is an example, and the method for calculating the evaluation value E _c used by the evaluation unit 108 is not limited to the above method.

Hereinafter, the operations of steps S128 and S129 are the same as those in FIG. That is, in the process shown in FIG. 9, the point that the picking process is additionally performed between the recognition process and the process of calculating the evaluation value and the specific contents of the process of calculating the evaluation value are shown in FIG. Different from processing.

FIG. 10 is a flowchart for explaining the processing performed by the object recognition device 20 shown in FIG. 8 during operation. In FIG. 10, the same parts as those of the object recognition device 10 are designated by the same reference numerals as those in FIG. 7, and detailed description thereof will be omitted. Hereinafter, the parts different from those in FIG. 7 will be mainly described.

When the object recognition device 10 determines that the target object exists as a result of the recognition process, the object recognition device 20 outputs the recognition result, whereas the object recognition device 20 outputs the recognition result by the robot 110 instead of the recognition result output. Picking is performed based on (step S203). After the robot 110 picks, the object recognition device 20 repeats the process from step S131.

In the above, the recognition unit 103 recognizes the state of the target object based on the converted image, but the recognition unit 103 of the object recognition device 20 having the robot 110 uses the hand model of the robot 110. Alternatively, the state of the target object may be recognized using a search-based method of searching for a location where the target object can be gripped. When the recognition result is the position / orientation information of the target object, it is desirable that the position / orientation information of the target object can be converted into the position / attitude information of the robot 110 when the robot 110 grips the target object.

As described above, the object recognition device 20 according to the second embodiment further includes a robot 110 that grips the target object based on the recognition result of the recognition unit 103. The evaluation unit 108 of the object recognition device 20 evaluates the image conversion parameters based on the operation result of the robot 110. By having such a configuration, the object recognition device 20 can select an image conversion parameter that can improve the gripping performance, and can improve the gripping success rate of the robot 110.

Further, the operation result of the robot 110 includes at least one of the probability that the robot 110 succeeds in grasping the target object, the gripping operation time, and the cause of the grip failure. When the probability that the robot 110 succeeds in gripping the target object is included in the operation result, the image conversion parameter is evaluated based on the gripping success rate, so the image conversion parameter that can improve the gripping success rate is selected. This makes it possible to improve the gripping success rate of the robot 110. Further, when the gripping operation time is included in the operation result, the image conversion parameter is evaluated based on the gripping operation time, so that the gripping operation time can be shortened. When the cause of grip failure is included in the operation result, the image conversion parameter is evaluated based on the cause of grip failure, so that it is possible to reduce a specific cause of grip failure.

Embodiment 3.
FIG. 11 is a diagram showing a functional configuration of the object recognition device 30 according to the third embodiment. The object recognition device 30 evaluates the image acquisition unit 101, the image conversion unit 102, the recognition unit 103, the output unit 104, the first learning unit 105, the storage unit 106, the image conversion parameter determination unit 107, and the like. It has a unit 108, an input receiving unit 109, a robot 110, a simulation unit 111, an image conversion data set generation unit 114, and an image conversion data set selection unit 115. The simulation unit 111 has a first generation unit 112 and a second generation unit 113.

The object recognition device 30 includes a simulation unit 111, an image conversion data set generation unit 114, and an image conversion data set selection unit 115, in addition to the configuration of the object recognition device 20 according to the second embodiment. Hereinafter, the same functional configuration as in the second embodiment will be described in detail by using the same reference numerals as those in the second embodiment, and the parts different from the second embodiment will be mainly described.

The simulation unit 111 creates a target image using the simulation. Specifically, the simulation unit 111 generates a target image by arranging the first generation unit 112 that generates arrangement information indicating the arrangement state of the target object based on the simulation conditions and the target object based on the arrangement information. It has a second generation unit 113.

The simulation conditions used by the first generation unit 112 include, for example, sensor information, target object information, and environmental information. It is desirable that the sensor information includes information such as the focal length, angle of view, and aperture value of the sensor that acquires the sensor image, which changes the state in the space generated by the values. Further, when the sensor performs stereo measurement, the sensor information may include a convergence angle, a baseline length, and the like.

The target object information is a CAD model of the target object, information indicating the material of the target object, and the like. In the case of the CAD model of the target object, the target object information may include the texture information of each surface of the target object. It is desirable that the target object information includes information to the extent that the state of the target object in the space is uniquely determined when the target object is placed in the space by using simulation.

Environmental information can include measurement distance, measurement depth, position / orientation of an object other than the target object, type and intensity of ambient light, and the like. Objects other than the target object are, for example, a box, a measuring table, and the like. By using the simulation conditions, the simulation unit 111 can perform the simulation under detailed conditions and can generate various types of target images.

The arrangement information generated by the first generation unit 112 indicates the arrangement state of at least one target object. When a plurality of target objects are arranged in space, the plurality of target objects may be arranged in an aligned manner or may be in a bulk state. When arranging the target objects in bulk, the processing time can be shortened by arranging the target objects at the calculated simple model positions after performing the simulation using the simple model of the target objects.

The target image generated by the second generation unit 113 may be an RGB image or a distance image. When using an RGB image, it is desirable to set the color or texture of the target object and objects other than the target object.

The simulation unit 111 stores the generated target image in the storage unit 106. Further, the simulation unit 111 may store the simulation conditions used when the first generation unit 112 generates the arrangement information and the arrangement information generated by the first generation unit 112 in the storage unit 106. At this time, it is desirable that the simulation unit 111 stores the arrangement information in association with the target image constituting the image conversion data set.

The image conversion data set generation unit 114 generates an image conversion data set including the sensor image acquired by the image acquisition unit 101 and the target image generated by the simulation unit 111. The image conversion data set generation unit 114 stores the generated image conversion data set in the storage unit 106. The image conversion dataset includes one or more sensor images and one or more target images. There is no limit to the number of images of the sensor image and the target image. If the number of images is too small, the learning of the image conversion parameters may not converge, and if the number of images is too large, the learning time may become long. Therefore, it is preferable to determine the number of images according to the intended use of the user, the installation status of the sensor, and the like. Further, the number of images of the target image and the number of images of the sensor image are preferably about the same, but there may be a bias.

The image conversion data set selection unit 115 selects the image conversion data set used for learning by the first learning unit 105 from the image conversion data sets stored in the storage unit 106 based on the sensor image. Specifically, the image conversion data set selection unit 115, based on the sensor image, and calculates the selected evaluation value E _p as a reference in selecting the converted image data sets, based on the calculated selected evaluation value E _p And select the image conversion dataset. For example, the image conversion data set selection unit 115 _{can select only the image conversion data set whose selection evaluation value E p} is equal to or less than a predetermined threshold value. The image conversion data set selection unit 115 can select one or a plurality of image conversion data sets.

The image conversion data set selection unit 115 outputs the selected image conversion data set to the first learning unit 105. The first learning unit 105 learns the image conversion parameters using the image conversion data set selected by the image conversion data set selection unit 115. Therefore, the first learning unit 105 learns the image conversion parameters using the target image generated by the simulation unit 111.

The selective evaluation value E _p is calculated using, for example, the following mathematical formula (9).

Here, I _t represents the sensor image, II _s represents the target image groups constituting the converted image data set, N _s denotes the number of images of the target image included in the target image group. Further, F _I (I) refers to any function for calculating the scalar values from the image I. F _I (I) is, for example, average value calculation function of the image, the edge number calculation function, and the like.

Further, when there is arrangement information associated with each target image included in the target image group constituting the image conversion data set, the image conversion data set selection unit 115 uses the following formula (10) to select and evaluate the evaluation value. E _p may be calculated.

Here, l _s indicates the measurement distance of the sensor that acquires the sensor image, l _t indicates the measurement distance of the target image constituting the target image group, and w _I and w _l indicate the weighting coefficient. If the measurement distance of the sensor is not exactly known, an approximate distance may be used. The method for calculating the selective evaluation value E _p is an example, and is not limited to the above method.

FIG. 12 is a flowchart for explaining the operation of the simulation unit 111 shown in FIG.

The first generation unit 112 of the simulation unit 111 acquires the simulation conditions (step S301). The simulation conditions are acquired from, for example, a storage area provided in the simulation unit 111. The first generation unit 112 generates placement information indicating the placement state of the target object based on the simulation conditions (step S302). The first generation unit 112 outputs the generated arrangement information to the second generation unit 113 of the simulation unit 111.

The second generation unit 113 arranges the target object based on the arrangement information generated by the first generation unit 112 and generates a target image (step S303). The second generation unit 113 outputs the generated target image and stores it in the storage unit 106 (step S304).

FIG. 13 is a flowchart for explaining the process performed by the object recognition device 30 shown in FIG. 11 before the start of operation. In FIG. 13, the same parts as those of the object recognition device 10 or the object recognition device 20 are designated by the same reference numerals as those in FIG. 6 or 9, and detailed description thereof will be omitted. Hereinafter, the parts different from those in FIG. 6 or 9 will be mainly described.

The simulation unit 111 of the object recognition device 30 first performs a simulation process (step S311). The simulation process of step S311 is the process shown in steps S301 to S304 of FIG.

Subsequently, the image conversion data set generation unit 114 generates an image conversion data set using the sensor image acquired by the image acquisition unit 101 and the target image generated by the simulation unit 111 (step S312). The image conversion data set generation unit 114 stores the generated image conversion data set in the storage unit 106.

The image conversion data set selection unit 115 selects the image conversion data set used by the first learning unit 105 from the image conversion data sets stored in the storage unit 106 (step S313). The image conversion data set selection unit 115 outputs the selected image conversion data set to the first learning unit 105.

Hereinafter, the processes of steps S121 to S126, steps S201, S202, steps S128, and S129 are the same as the processes described with reference to FIG. 6 or 9. In step S121, the image conversion parameter learning process will be executed using the image conversion data set selected in step S313.

As described above, the object recognition device 30 according to the third embodiment creates a target image using simulation, and learns image conversion parameters using the created target image. Further, the object recognition device 30 generates an image conversion data set including a target image created by using simulation and a sensor image acquired by the image acquisition unit 101, and uses the generated image conversion data set to perform image conversion. Learn the parameters. Having such a configuration makes it possible to easily generate a target image and an image conversion data set necessary for learning image conversion parameters. Further, the target image is generated based on the simulation conditions, and is generated based on the arrangement information indicating the arrangement state of the target object. Therefore, by adjusting the simulation conditions, it is possible to generate various target images.

The object recognition device 30 selects an image conversion data set to be used by the first learning unit 105 from the image conversion data sets generated by the image conversion data set generation unit 114 based on the sensor image. It has a part 115. By having such a configuration, it becomes possible to learn the image conversion parameters only for the image conversion data set suitable for the surrounding environment, and it is possible to improve the learning efficiency.

Embodiment 4.
FIG. 14 is a diagram showing a functional configuration of the object recognition device 40 according to the fourth embodiment. The object recognition device 40 evaluates the image acquisition unit 101, the image conversion unit 102, the recognition unit 103, the output unit 104, the first learning unit 105, the storage unit 106, the image conversion parameter determination unit 107, and the like. Unit 108, input reception unit 109, robot 110, simulation unit 111, image conversion data set generation unit 114, image conversion data set selection unit 115, recognition data set generation unit 116, and second learning unit. It has 117 and a recognition parameter determination unit 118.

The object recognition device 40 has a recognition data set generation unit 116, a second learning unit 117, and a recognition parameter determination unit 118, in addition to the configuration of the object recognition device 30 according to the third embodiment. Hereinafter, the same functional configuration as in the third embodiment will be described in detail by using the same reference numerals as those in the third embodiment, and the parts different from the third embodiment will be mainly described.

The recognition data set generation unit 116 generates annotation data to be used when the recognition unit 103 performs recognition processing based on the recognition method used by the recognition unit 103, and generates a recognition data set including the generated annotation data and a target image. Generate. The recognition data set generation unit 116 stores the generated recognition data set in the storage unit 106. The annotation data differs depending on the recognition method used by the recognition unit 103. For example, when the recognition method is a neural network that outputs the position and size of the target object on the image, the annotation data is the position and size of the target object on the image.

The second learning unit 117 learns the recognition parameter, which is a parameter used by the recognition unit 103, based on the recognition data set generated by the recognition data set generation unit 116. The second learning unit 117 can be realized, for example, by the same configuration as the first learning unit 105 shown in FIG. The second learning unit 117 includes a state observation unit 11 and a machine learning unit 12. The machine learning unit 12 includes a reward calculation unit 121 and a function update unit 122. The example shown in FIG. 3 is an example of performing machine learning using reinforcement learning, but the second learning unit 117 uses other known methods such as neural networks, genetic programming, and functional logic programming. Machine learning may be performed according to a support vector machine or the like. The second learning unit 117 stores the learning result of the recognition parameter in the storage unit 106. The recognition parameter includes, for example, when the recognition method uses a neural network, the recognition parameter includes a weighting coefficient between each unit constituting the neural network.

The recognition parameter determination unit 118 determines the recognition parameter used by the recognition unit 103 based on the evaluation result of the evaluation unit 108 when each of the plurality of recognition parameters is used. The recognition parameter determination unit 118 outputs the determined recognition parameter to the recognition unit 103.

The recognition parameter determination unit 118 can, for example, set the recognition parameter having the largest evaluation value as the recognition parameter used by the recognition unit 103. Further, when the output unit 104 outputs the evaluation result of the evaluation unit 108 for each recognition parameter and the input reception unit 109 accepts the input for selecting the recognition parameter, the recognition parameter determination unit 118 recognizes the recognition parameter selected by the user. It can also be output to unit 103. Further, since it is considered that the evaluation value of the recognition parameter changes depending on the image conversion parameter, a plurality of evaluation values may be calculated by changing the image conversion parameter used by the image conversion unit 102 for one learned recognition parameter. .. In this case, the image conversion parameter determination unit 107 can determine the image conversion parameter based on the combination of the calculated evaluation value and the image conversion parameter.

FIG. 15 is a flowchart for explaining the processing performed by the object recognition device 40 shown in FIG. 14 before the start of operation. In FIG. 15, the same parts as those of the object recognition device 30 are designated by the same reference numerals as those in FIG. 13, and detailed description thereof will be omitted. Hereinafter, the parts different from FIG. 13 will be mainly described.

After performing the simulation process of step S311, the object recognition device 40 generates a recognition data set in parallel with the process of steps S312, S313, and S121 (step S401), and uses the generated recognition data set to generate recognition parameters. The recognition parameter learning process for learning is performed (step S402).

Subsequently, the object recognition device 40 selects the image conversion parameter and the recognition parameter after the processing of steps S122 and S123 (step S403). Hereinafter, the processing of steps S125, S126, S201, and S202 is the same as that of the object recognition device 30.

After the evaluation value is calculated, the image conversion unit 102 of the object recognition device 40 determines whether or not the evaluation value of all the image conversion parameters and the combination of the recognition parameters has been calculated (step S404). When the evaluation value of the combination of all the image conversion parameters and the recognition parameters is calculated (step S404: Yes), the object recognition device 40 performs the process of step S129 and determines the recognition parameters (step S405). When the evaluation values of all the image conversion parameters and the combinations of the recognition parameters have not been calculated (step S404: No), the object recognition device 40 returns to the process of step S403.

As described above, the object recognition device 40 according to the fourth embodiment generates annotation data used by the recognition unit 103 based on the recognition method used by the recognition unit 103, and generates the generated annotation data and the target image. Learn the recognition parameters using the included recognition data set. With such a configuration, the object recognition device 40 can easily generate recognition data sets of various situations.

Further, the object recognition device 40 determines the recognition parameter used by the recognition unit 103 based on the evaluation result of the evaluation unit 108 when each of the plurality of recognition parameters is used. By having such a configuration, the object recognition device 40 can perform recognition processing using recognition parameters suitable for the target object, the surrounding environment, and the like, and can improve the recognition success rate and the gripping success rate. become.

Subsequently, the hardware configurations of the

object recognition devices

10, 20, 30, and 40 according to the first to fourth embodiments will be described. Each component of the

object recognition device

10, 20, 30, and 40 is realized by a processing circuit. These processing circuits may be realized by dedicated hardware, or may be control circuits using a CPU (Central Processing Unit).

When the above processing circuits are realized by dedicated hardware, these are realized by the processing circuit 90 shown in FIG. FIG. 16 is a diagram showing dedicated hardware for realizing the functions of the

object recognition devices

10, 20, 30, and 40 according to the first to fourth embodiments. The processing circuit 90 is a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.

When the above processing circuit is realized by a control circuit using a CPU, this control circuit is, for example, a control circuit 91 having the configuration shown in FIG. FIG. 17 is a diagram showing a configuration of a control circuit 91 for realizing the functions of the

object recognition devices

10, 20, 30, and 40 according to the first to fourth embodiments. As shown in FIG. 17, the control circuit 91 includes a processor 92 and a memory 93. The processor 92 is a CPU, and is also called a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like. The memory 93 is, for example, a non-volatile or volatile semiconductor memory such as RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable ROM), and EPROM (registered trademark) (Electrically EPROM). Magnetic discs, flexible discs, optical discs, compact discs, mini discs, DVDs (Digital Versatile Disks), etc.

When the above processing circuit is realized by the control circuit 91, it is realized by the processor 92 reading and executing the program corresponding to the processing of each component stored in the memory 93. The memory 93 is also used as a temporary memory in each process executed by the processor 92. The computer program executed by the processor 92 may be provided via a communication network, or may be provided in a state of being stored in a storage medium.

The configuration shown in the above embodiments is an example, and can be combined with another known technique, can be combined with each other, and does not deviate from the gist. It is also possible to omit or change a part of the configuration.

10, 20, 30, 40 Object recognition device, 11 State observation unit, 12 Machine learning unit, 90 Processing circuit, 91 Control circuit, 92 Processor, 93 Memory, 101 Image acquisition unit, 102 Image conversion unit, 103 Recognition unit, 104 Output unit, 105 first learning unit, 106 storage unit, 107 image conversion parameter determination unit, 108 evaluation unit, 109 input reception unit, 110 robot, 111 simulation unit, 112 first generation unit, 113 second generation unit, 114 Image conversion data set generation unit, 115 image conversion data set selection unit, 116 recognition data set generation unit, 117 second learning unit, 118 recognition parameter determination unit, 121 reward calculation unit, 122 function update unit.

Claims

An image acquisition unit that acquires an image of the target object,
An image conversion unit that converts a sensor image, which is the image acquired by the image acquisition unit, into an image using image conversion parameters and outputs the converted image.
A recognition unit that recognizes the state of the target object based on the converted image,
An evaluation unit that evaluates the image conversion parameter used to generate the converted image based on the recognition result of the recognition unit, and an evaluation unit.
An output unit that outputs the recognition result and the evaluation result of the evaluation unit,
An object recognition device characterized by comprising.
The object recognition device according to claim 1, wherein the image conversion parameter is a parameter for converting the sensor image into an image having predetermined features.
A first learning unit that learns the image conversion parameters for each of the features,
With more
The object recognition device according to claim 2, wherein the image conversion unit converts the sensor image into an image by using the image conversion parameter which is the learning result of the first learning unit.
The image conversion unit performs image conversion in a plurality of stages to convert the sensor image into the converted image, and then converts the sensor image into the converted image.
The object recognition device according to claim 3, wherein the first learning unit learns each of a plurality of types of image conversion parameters used for each stage of image conversion.
The image conversion unit converts the sensor image into an intermediate image, and converts the intermediate image into the converted image to convert the sensor image into the converted image.
The first learning unit learns a first image conversion parameter for converting the sensor image into an intermediate image and a second image conversion parameter for converting the intermediate image into the converted image. The object recognition device according to claim 4, wherein the object recognition device is characterized by the above.
The image conversion unit converts the sensor image into a plurality of component images, then synthesizes the plurality of component images to acquire the converted image, and obtains the converted image.
The object recognition device according to claim 3, wherein the first learning unit learns a plurality of types of image conversion parameters for converting the sensor image into each of the plurality of component images.
A conversion parameter determination unit that determines the image conversion parameter used by the image conversion unit based on the evaluation result of the evaluation unit when each of the plurality of image conversion parameters is used.
The object recognition device according to any one of claims 1 to 6, further comprising.
An input receiving unit that accepts input of evaluation parameters, which is a parameter used by the evaluation unit to evaluate the image conversion parameter.
With more
The object recognition device according to any one of claims 1 to 7, wherein the evaluation unit evaluates the image conversion parameter using the evaluation parameter received by the input reception unit.
The object recognition according to any one of claims 1 to 8, wherein the recognition result includes at least one of the recognition processing time of the recognition unit and the number of the target objects recognized by the recognition unit. Device.
A robot that grips the target object based on the recognition result of the recognition unit is further provided.
The object recognition device according to any one of claims 1 to 9, wherein the evaluation unit evaluates the image conversion parameter based on the operation result of the robot.
The object recognition device according to claim 10, wherein the operation result includes at least one of a probability that the robot has successfully grasped the target object, a gripping operation time, and a cause of the grip failure.
A simulation unit that creates a target image, which is an image having the predetermined characteristics, using simulation.
With more
The object recognition device according to claim 3, wherein the first learning unit learns the image conversion parameter using the target image created by the simulation unit.
The simulation unit has a first generation unit that generates arrangement information indicating an arrangement state of the target object based on simulation conditions, and a second generation unit that arranges the target object based on the arrangement information and generates the target image. Has a generator and
An image conversion data set generation unit that generates an image conversion data set including the target image generated by the simulation unit and the sensor image.
The object recognition device according to claim 12, further comprising.
An image conversion data set selection unit that selects an image conversion data set to be used by the first learning unit from the image conversion data sets created by the image conversion data set generation unit based on the sensor image.
The object recognition device according to claim 13, further comprising.
A recognition data set generation unit that generates annotation data used when the recognition unit performs recognition processing based on the recognition method used by the recognition unit, and generates a recognition data set including the target image and the annotation data.
The object recognition device according to any one of claims 12 to 14, further comprising.
A second learning unit that learns recognition parameters, which are parameters used by the recognition unit, based on a recognition data set including annotation data used when the recognition unit performs recognition processing and the target image.
The object recognition device according to claim 15, further comprising.
A recognition parameter determination unit that determines the recognition parameters used by the recognition unit based on the evaluation results of the evaluation unit when each of the plurality of recognition parameters is used.
The object recognition device according to claim 16, further comprising.
The step that the object recognition device acquires the image of the target object,
A step in which the object recognition device converts the acquired image into an image using image conversion parameters and outputs the converted image.
A step in which the object recognition device recognizes the state of the target object based on the converted image.
A step of evaluating the image conversion parameter used by the object recognition device to generate the converted image based on the recognition result.
A step in which the object recognition device outputs the recognition result and the evaluation result,
An object recognition method characterized by including.