WO2021149251A1 - Object recognition device and object recognition method - Google Patents

Object recognition device and object recognition method

Info

Publication number
WO2021149251A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
unit
recognition
image conversion
target
Prior art date
Application number
PCT/JP2020/002577
Other languages
English (en)
Japanese (ja)
Inventor
彩佳里 大島
亮輔 川西
Original Assignee
三菱電機株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 三菱電機株式会社 filed Critical 三菱電機株式会社
Priority to CN202080092120.2A priority Critical patent/CN114981837A/zh
Priority to JP2021572241A priority patent/JP7361800B2/ja
Priority to PCT/JP2020/002577 priority patent/WO2021149251A1/fr
Publication of WO2021149251A1 publication Critical patent/WO2021149251A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis

Definitions

  • The present disclosure relates to an object recognition device and an object recognition method that recognize a target object based on a photographed image of the target object.
  • Patent Document 1 discloses a technique for recognizing the state of an object based on an image of the target object in a gripping system that grips the target object.
  • In the technique of Patent Document 1, however, there is a problem that the recognition performance may deteriorate when the environment at the time the recognition process is executed, for example, the surrounding environment of the target object or the measurement conditions, changes.
  • The present disclosure has been made in view of the above, and an object of the present disclosure is to obtain an object recognition device capable of improving the recognition performance even when the environment at the time the recognition process is executed changes.
  • The object recognition device of the present disclosure includes: an image acquisition unit that acquires an image of the target object; an image conversion unit that converts the sensor image acquired by the image acquisition unit into a converted image using an image conversion parameter and outputs the converted image; a recognition unit that recognizes the state of the target object based on the converted image; an evaluation unit that evaluates, based on the recognition result of the recognition unit, the image conversion parameter used to generate the converted image; and an output unit that outputs the recognition result and the evaluation result of the evaluation unit.
  • A diagram showing an example of the display screen displayed by the output unit shown in FIG. 1. A diagram showing an example of the detailed configuration of the first learning unit shown in FIG. 1.
  • A flowchart for explaining an operation example of the first learning unit shown in FIG. 1. A diagram for explaining an operation example in which the first learning unit shown in FIG. 1 uses CycleGAN.
  • A flowchart for explaining the processing performed by the object recognition device shown in FIG. 8 before the start of operation.
  • A flowchart for explaining the operation of the simulation unit shown in FIG. 11. A flowchart for explaining the processing performed by the object recognition device shown in FIG. 11 before the start of operation.
  • A diagram showing the functional configuration of the object recognition device according to Embodiment 4. A flowchart for explaining the processing performed by the object recognition device shown in FIG. 13 before the start of operation.
  • FIG. 1 is a diagram showing a functional configuration of the object recognition device 10 according to the first embodiment.
  • The object recognition device 10 has an image acquisition unit 101, an image conversion unit 102, a recognition unit 103, an output unit 104, a first learning unit 105, a storage unit 106, an image conversion parameter determination unit 107, an evaluation unit 108, and an input receiving unit 109.
  • the object recognition device 10 has a function of recognizing a state such as the position and orientation of the target object based on a photographed image of the target object.
  • the image acquisition unit 101 acquires an image of the target object.
  • the image acquisition unit 101 may be an imaging device having an image sensor, or may be an interface for acquiring an image captured by a photographing device connected to the object recognition device 10.
  • the image acquired by the image acquisition unit 101 is referred to as a sensor image.
  • the image acquisition unit 101 outputs the acquired sensor image to each of the image conversion unit 102 and the first learning unit 105.
  • the sensor image may be a monochrome image or an RGB image.
  • the sensor image may be a distance image in which the distance is expressed by the brightness and darkness. The distance image may be generated based on the set data of points having three-dimensional position information.
  • the image acquisition unit 101 acquires the minimum information for reconstructing a set of points having three-dimensional position information from the distance image at the same time as the distance image.
  • the minimum information for reconstructing a set of points is focal length, scale, and so on.
  • the image acquisition unit 101 may be able to acquire a plurality of types of images.
  • the image acquisition unit 101 may be able to acquire both a monochrome image and a distance image of the target object.
  • The image acquisition unit 101 may be a single photographing device capable of capturing both a monochrome image and a distance image, or may be composed of a photographing device for capturing monochrome images and a photographing device for capturing distance images.
  • When the monochrome image shooting and the distance image shooting are performed by different photographing devices, it is preferable to grasp the positional relationship between the two photographing devices in advance.
  • the image conversion unit 102 converts the sensor image acquired by the image acquisition unit 101 into an image using the image conversion parameter, and outputs the converted image to the recognition unit 103.
  • The image conversion unit 102 performs image conversion so that the sensor image has the predetermined features of each target image group, using the image conversion parameters that are the learning results of the first learning unit 105 and are stored in the storage unit 106.
  • an image having predetermined features is referred to as a target image
  • a set of target images is referred to as a target image group.
  • Common features are, for example, the shape of the target object, the surface characteristics of the target object, the measurement distance, the depth, and the like.
  • The common features may also be the position and orientation of objects other than the target object to be recognized, the type and intensity of ambient light, the type of measurement sensor, the parameters of the measurement sensor, the arrangement state of the target objects, the image style, or the quantity of target objects.
  • the parameters of the measurement sensor are parameters such as focus and aperture.
  • the arrangement state of the target object is an alignment state, a bulk state, or the like.
  • a plurality of target images included in the same target image group may have one common feature or may have a plurality of common features.
  • “having a common feature” includes not only the case where the above-mentioned features are the same but also the case where they are similar.
  • When a reference shape such as a rectangular parallelepiped, a cylinder, or a hexagonal column is set, images can be regarded as having the common feature of target object shape if the shapes of the target objects in the target images are close enough to be approximated by the same reference shape.
  • Similarly, when standard colors such as black, white, and gray are set for the surface characteristics of the target object, images can be regarded as having a common feature if the apparent hues of the target objects in the target images are close enough to be classified into the same standard color.
  • At least one target object is shown in the target image.
  • The target object shown in the target image does not necessarily have to be shown in its entirety. For example, there is no problem even if part of the target object shown in the target image is missing because the target object is partially outside the measurement range or partially hidden by another object.
  • the arrangement state of the plurality of target objects may be an aligned state or a bulk state.
  • the target image is preferably an image that makes it easy to recognize the target object.
  • An image in which the target object can be easily recognized is, for example, an image in which the target object has a simple, uncomplicated shape such as a rectangular parallelepiped or a cube and which contains little noise.
  • the number and types of image conversion parameters used by the image conversion unit 102 differ depending on the image conversion method. It is desirable that the image conversion unit 102 use an image conversion method such that the state such as the position and orientation of the target object in the converted image is not significantly different from the state of the target object in the sensor image.
  • The image conversion unit 102 can use, for example, an image conversion method using a neural network. When an image conversion method using a neural network is used, the image conversion parameters include the weighting coefficients between the units constituting the network.
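  • As an illustration of such a neural-network-based conversion, a minimal sketch is shown below, assuming a small PyTorch convolutional network; the layer structure and the name SensorToTargetConverter are illustrative assumptions and not a configuration defined in the present disclosure. The learned weights of such a network correspond to the image conversion parameters used by the image conversion unit 102.

```python
# Minimal sketch (assumption): a small convolutional network whose learned
# weights play the role of the "image conversion parameters".
import torch
import torch.nn as nn

class SensorToTargetConverter(nn.Module):
    """Converts a sensor image into a converted image with target-group features."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )

    def forward(self, sensor_image: torch.Tensor) -> torch.Tensor:
        # The output keeps the spatial layout so that the position and
        # orientation of the target object are not significantly altered.
        return self.net(sensor_image)

converter = SensorToTargetConverter(channels=1)
sensor_image = torch.rand(1, 1, 128, 128)   # placeholder monochrome sensor image
converted_image = converter(sensor_image)   # image passed on to the recognition unit
```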
  • the recognition unit 103 recognizes a state such as the position and orientation of the target object based on the converted image output by the image conversion unit 102.
  • the recognition method used by the recognition unit 103 is not particularly limited.
  • The recognition unit 103 may use a machine-learning-based recognition method that performs pre-learning so that the state of the target object can be output from the image, or may use model matching that estimates the state of the target object by collating CAD (Computer-Aided Design) data of the target object with three-dimensional measurement data.
  • the recognition unit 103 may perform the recognition process using one type of recognition method, or may perform the recognition process using a combination of a plurality of types of recognition methods.
  • the recognition unit 103 outputs the recognition result to each of the output unit 104 and the evaluation unit 108.
  • the recognition result includes, for example, at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103.
  • the output unit 104 has a function of outputting the recognition result and the evaluation result of the evaluation unit 108, which will be described in detail later.
  • the method of outputting the recognition result and the evaluation result by the output unit 104 is not particularly limited.
  • The output unit 104 may include a display device and display the recognition result and the evaluation result on the screen of the display device. Alternatively, the output unit 104 may include an interface with an external device and transmit the recognition result and the evaluation result to the external device.
  • FIG. 2 is a diagram showing an example of a display screen displayed by the output unit 104 shown in FIG.
  • “Input” in FIG. 2 indicates an area for displaying a sensor image
  • “parameter” indicates an area for displaying an image conversion parameter and an evaluation value which is an evaluation result.
  • “conversion” in FIG. 2 indicates an area for displaying the converted image
  • “recognition” indicates an area for displaying the recognition result. For example, when the user performs an operation of selecting one of a plurality of image conversion parameters displayed on the "parameter", the name of the selected image conversion parameter is displayed on the "Name" of the display screen.
  • the first learning unit 105 learns image conversion parameters for image conversion of the sensor image so as to have the characteristics of the target image group.
  • the first learning unit 105 learns the image conversion parameters used by the image conversion unit 102 for each target image group.
  • FIG. 3 is a diagram showing an example of a detailed configuration of the first learning unit 105 shown in FIG.
  • the first learning unit 105 has a state observation unit 11 and a machine learning unit 12.
  • When the target image group has common features, the first learning unit 105 is highly likely to be able to obtain image conversion parameters capable of performing image conversion that reproduces the characteristics of the target image group.
  • Conversely, when the target image group lacks common features, the learning of the image conversion parameters by the first learning unit 105 is difficult to converge.
  • the state observation unit 11 observes the image conversion parameters, the target image group, and the similarity between the converted image and the features of the target image group as state variables.
  • the machine learning unit 12 learns the image conversion parameters for each target image group according to the training data set created based on the image conversion parameters, the target image group, and the state variables of the similarity.
  • The learning algorithm used by the machine learning unit 12 may be any algorithm. As an example, a case where the machine learning unit 12 uses reinforcement learning will be described. Reinforcement learning is a learning algorithm in which an agent, which is the subject of action in a certain environment, observes the current state and decides the action to be taken. The agent is rewarded by the environment for choosing an action and learns, through a series of actions, how to obtain the most reward. Q-learning and TD-learning are known as typical methods of reinforcement learning. For example, in the case of Q-learning, the general update equation of the action value function Q(s_t, a_t) is expressed by the following equation (1).
  • In equation (1), s_t represents the environment (state) at time t, and a_t represents the action at time t.
  • By the action a_t, the environment changes to s_{t+1}.
  • r_{t+1} denotes the reward given in accordance with the change of the environment as a result of the action a_t, γ represents the discount rate, and α represents the learning coefficient.
  • The update formula represented by equation (1) increases the action value Q if the action value Q of the best action a at time t+1 is larger than the action value Q of the action a executed at time t, and decreases the action value Q in the opposite case. In other words, the action value function Q(s_t, a_t) is updated so that the action value Q of the action a at time t approaches the best action value at time t+1. By repeating such updates, the best action value in a certain environment is sequentially propagated to the action values in the preceding environments.
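  • The body of equation (1) is not reproduced in this text; the following is a minimal sketch, assuming the standard Q-learning update that matches the description above (reward r_{t+1}, discount rate γ, learning coefficient α). The state and action encodings are illustrative placeholders.

```python
# Minimal sketch (assumption): a standard Q-learning update consistent with the
# description of equation (1); states and actions are illustrative placeholders.
from collections import defaultdict

alpha = 0.1   # learning coefficient (alpha in the text)
gamma = 0.9   # discount rate (gamma in the text)
Q = defaultdict(float)  # action value function Q(s, a), default 0

def update_q(state, action, reward, next_state, candidate_actions):
    """Move Q(s_t, a_t) toward r_{t+1} + gamma * max_a Q(s_{t+1}, a)."""
    best_next = max(Q[(next_state, a)] for a in candidate_actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

# Example: increase the value of choosing parameter set "p1" in state "s0"
# after receiving a reward of 1 (similarity above the threshold).
update_q("s0", "p1", 1.0, "s1", candidate_actions=["p1", "p2"])
```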
  • the machine learning unit 12 has a reward calculation unit 121 and a function update unit 122.
  • the reward calculation unit 121 calculates the reward based on the state variable.
  • the reward calculation unit 121 calculates the reward r based on the similarity included in the state variable.
  • the degree of similarity increases as the converted image reproduces the characteristics of the target image group. For example, if the similarity is higher than a predetermined threshold, the reward calculation unit 121 increases the reward r.
  • the reward calculation unit 121 can increase the reward r by giving a reward of "1", for example.
  • If the similarity is lower than the predetermined threshold value, the reward calculation unit 121 reduces the reward r.
  • the reward calculation unit 121 can, for example, give a reward of "-1" to reduce the reward r.
  • the similarity is calculated according to a known method according to the type of features of the target image group.
  • the function update unit 122 updates the function for determining the image conversion parameter according to the reward r calculated by the reward calculation unit 121.
  • For example, the action value function Q(s_t, a_t) represented by equation (1) is used as the function for determining the image conversion parameter.
  • FIG. 4 is a flowchart for explaining an operation example of the first learning unit 105 shown in FIG.
  • the operation shown in FIG. 4 is performed before the operation of the object recognition device 10 is started.
  • the state observation unit 11 of the first learning unit 105 acquires the sensor image group using the image acquisition unit 101 (step S101).
  • the state observation unit 11 selects one target image group from a plurality of predetermined target image groups (step S102).
  • the first learning unit 105 sets the image conversion parameters for the selected target image group (step S103).
  • the first learning unit 105 causes the image conversion unit 102 to perform image conversion of the sensor image using the set image conversion parameters (step S104).
  • the state observation unit 11 of the first learning unit 105 acquires the image conversion parameter, which is a state variable, the target image group, and the similarity between the converted image and the features of the target image group (step S105).
  • the state observation unit 11 outputs the acquired state variables to the machine learning unit 12.
  • the reward calculation unit 121 of the machine learning unit 12 determines whether or not the similarity is higher than the threshold value (step S106).
  • When the similarity is higher than the threshold value (step S106: Yes), the reward calculation unit 121 increases the reward r (step S107). When the similarity is lower than the threshold value (step S106: No), the reward calculation unit 121 reduces the reward r (step S108). The reward calculation unit 121 outputs the calculated reward r to the function update unit 122, and the function update unit 122 updates the function for determining the image conversion parameter according to the reward r (step S109).
  • The first learning unit 105 determines whether or not a predetermined learning end condition is satisfied (step S110). It is desirable that the learning end condition be a condition for determining that the learning accuracy of the image conversion parameters is equal to or higher than a standard. Examples of the learning end condition are that the number of times the processing of steps S103 to S109 has been repeated exceeds a predetermined number, and that the elapsed time from the start of learning the image conversion parameters for the same target image group exceeds a predetermined time.
  • When the learning end condition is not satisfied (step S110: No), the first learning unit 105 repeats the process from step S103. When the learning end condition is satisfied (step S110: Yes), the first learning unit 105 outputs the learning result of the image conversion parameters for the target image group (step S111).
  • the first learning unit 105 determines whether or not the learning for all the target image groups has been completed (step S112). When the learning for all the target image groups is not completed, that is, when there is a target image group for which the learning has not been completed (step S112: No), the first learning unit 105 repeats the process from step S102. When the learning for all the target image groups is completed (step S112: Yes), the first learning unit 105 ends the image conversion parameter learning process.
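  • A minimal sketch of the learning flow of FIG. 4 is shown below; convert_image, compute_similarity, and propose_parameters are hypothetical placeholders for the image conversion unit 102, the state observation unit 11, and the function update of the machine learning unit 12, and the fixed iteration budget stands in for the learning end condition of step S110.

```python
# Minimal sketch (assumption) of the learning flow of FIG. 4; the callables
# passed in are hypothetical placeholders for the units described in the text.
def learn_image_conversion_parameters(sensor_images, target_image_groups,
                                      threshold, max_iterations,
                                      propose_parameters, convert_image,
                                      compute_similarity):
    results = {}
    for group_id, target_group in target_image_groups.items():               # step S102
        params = propose_parameters(group_id, reward=None)                    # step S103
        for _ in range(max_iterations):                                       # end condition (step S110)
            converted = [convert_image(img, params) for img in sensor_images] # step S104
            similarity = compute_similarity(converted, target_group)          # step S105
            reward = 1.0 if similarity > threshold else -1.0                  # steps S106 to S108
            params = propose_parameters(group_id, reward=reward)              # step S109 (function update)
        results[group_id] = params                                            # step S111
    return results                                                            # all groups done (step S112)
```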
  • Although the case where the first learning unit 105 performs machine learning using reinforcement learning has been described, the first learning unit 105 may perform machine learning according to other known methods, for example, neural networks, genetic programming, functional logic programming, support vector machines, or the like.
  • FIG. 5 is a diagram for explaining an operation example when the first learning unit 105 shown in FIG. 1 uses CycleGAN (Generative Adversarial Networks).
  • the first learning unit 105 learns the image conversion parameters using CycleGAN.
  • As shown in FIG. 5, the first learning unit 105 uses a first generator G, a second generator F, a first discriminator D_X, and a second discriminator D_Y to learn the image conversion parameters.
  • the first learning unit 105 learns the image conversion parameters between the image groups X and Y using the training data of the two types of image groups X and Y.
  • the image included in the training data of the image group X is referred to as an image x
  • the image included in the training data of the image group Y is referred to as an image y.
  • the first generator G generates an image having the characteristics of the image group Y from the image x.
  • Let G(x) denote the output when the image x is input to the first generator G.
  • the second generator F generates an image having the characteristics of the image group X from the image y.
  • Let F(y) denote the output when the image y is input to the second generator F.
  • The first discriminator D_X distinguishes between x and F(y).
  • The second discriminator D_Y distinguishes between y and G(x).
  • On the basis of these losses, the first learning unit 105 learns so that the image conversion accuracy of the first generator G and the second generator F increases and so that the identification accuracy of the first discriminator D_X and the second discriminator D_Y increases. Specifically, the first learning unit 105 learns so that the total loss L(G, F, D_X, D_Y) shown in the following equation (2) satisfies the objective function represented by the following equation (3).
  • The first loss L_GAN(G, D_Y, X, Y) included in equation (2) is the loss that occurs when the first generator G generates the image G(x) having the characteristics of the image group Y from the image x.
  • The second loss L_GAN(F, D_X, Y, X) included in equation (2) is the loss that occurs when the second generator F generates the image F(y) having the characteristics of the image group X from the image y.
  • The third loss L_cyc(G, F) included in equation (2) is the cycle-consistency loss that occurs when the image x is input to the first generator G to generate the image G(x) and the generated image G(x) is further input to the second generator F, the resulting image F(G(x)) being compared with the original image x (and likewise for the cycle starting from the image y).
  • On the basis of the following four assumptions, the first learning unit 105 learns the first generator G and the second generator F so that the total loss L(G, F, D_X, D_Y) becomes smaller, and learns the first discriminator D_X and the second discriminator D_Y so that the total loss L(G, F, D_X, D_Y) becomes larger.
  • 1. The image G(x) converted by inputting the image x into the first generator G should be similar to the image group Y.
  • 2. The image F(y) converted by inputting the image y into the second generator F should be similar to the image group X.
  • 3. The image F(G(x)) converted by inputting the image G(x) into the second generator F should be similar to the image group X.
  • 4. The image G(F(y)) converted by inputting the image F(y) into the first generator G should be similar to the image group Y.
  • The first learning unit 105 performs the above learning with the sensor image group as the image group X and the target image group as the image group Y, learns the image conversion parameters used in the first generator G that generates images of the target image group from the sensor image group, and outputs the learning result to the storage unit 106.
  • the first learning unit 105 performs the above learning for each of the plurality of types of target image groups, and learns the image conversion parameters for each target image group.
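  • The bodies of equations (2) and (3) are not reproduced in this text; the following is a minimal sketch of the CycleGAN-style objective described above, assuming PyTorch modules for G, F, D_X, and D_Y. The least-squares adversarial form and the cycle-consistency weight lambda_cyc are illustrative assumptions, not values fixed by the present disclosure.

```python
# Minimal sketch (assumption) of the CycleGAN-style objective described above.
# G, F, D_X, and D_Y are assumed to be torch.nn.Module instances.
import torch
import torch.nn.functional as nnf

lambda_cyc = 10.0  # weight of the cycle-consistency term (assumed value)

def generator_loss(G, F, D_X, D_Y, x, y):
    g_x, f_y = G(x), F(y)
    # Adversarial terms: the generators try to make the discriminators output "real" (1).
    adv = nnf.mse_loss(D_Y(g_x), torch.ones_like(D_Y(g_x))) \
        + nnf.mse_loss(D_X(f_y), torch.ones_like(D_X(f_y)))
    # Cycle-consistency term L_cyc(G, F): F(G(x)) should reproduce x, G(F(y)) should reproduce y.
    cyc = nnf.l1_loss(F(g_x), x) + nnf.l1_loss(G(f_y), y)
    return adv + lambda_cyc * cyc

def discriminator_loss(D, real, fake):
    # Each discriminator tries to score real images as 1 and generated images as 0.
    return nnf.mse_loss(D(real), torch.ones_like(D(real))) \
         + nnf.mse_loss(D(fake.detach()), torch.zeros_like(D(fake.detach())))
```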
  • the storage unit 106 stores the image conversion parameters for each target image group, which is the learning result of the first learning unit 105.
  • Before the start of operation, the image conversion parameter determination unit 107 determines, from among the plurality of image conversion parameters, the image conversion parameter to be used by the image conversion unit 102 during operation, based on the evaluation results of the evaluation unit 108 described later.
  • the image conversion parameter determination unit 107 notifies the image conversion unit 102 of the determined image conversion parameter.
  • The image conversion parameter determination unit 107 may, for example, set the image conversion parameter having the maximum evaluation value E_c as the image conversion parameter used by the image conversion unit 102. Alternatively, the evaluation unit 108 may cause the output unit 104 to output the evaluation results, and the image conversion parameter selected by the user after confirming the output evaluation results may be set as the image conversion parameter used by the image conversion unit 102.
  • For example, it is conceivable that the output unit 104 outputs, in addition to the evaluation results, the converted image obtained when each image conversion parameter is used. In this case, the user can check the converted images and select an image conversion parameter capable of performing a conversion that suppresses, for example, light reflection.
  • the output unit 104 may output the evaluation value of the image conversion parameter whose evaluation value is equal to or more than the threshold value and the converted image, and may not output the image conversion parameter whose evaluation value is less than the threshold value.
  • The evaluation unit 108 evaluates each of the plurality of image conversion parameters based on the recognition result of the recognition unit 103 when each of the plurality of image conversion parameters is used. Specifically, the evaluation unit 108 calculates the evaluation value E_c and outputs the calculated evaluation value E_c, which is the evaluation result, to each of the image conversion parameter determination unit 107 and the output unit 104.
  • The evaluation value E_c calculated by the evaluation unit 108 is represented by, for example, the following formula (4).
  • The evaluation value E_c is the sum of the value obtained by multiplying the recognition accuracy p_r by the weighting coefficient w_pr and the value obtained by multiplying the reciprocal of the recognition processing time t_r by the weighting coefficient w_tr, that is, E_c = w_pr × p_r + w_tr × (1 / t_r).
  • The values of the weighting coefficients w_pr and w_tr may be determined depending on what the user attaches importance to. For example, if it is desired to emphasize the speed of the recognition process even if the recognition accuracy is slightly lowered, the value of the weighting coefficient w_pr may be reduced and the value of the weighting coefficient w_tr may be increased. Conversely, when the recognition accuracy is emphasized even if it takes time, the value of the weighting coefficient w_pr may be increased and the value of the weighting coefficient w_tr may be decreased.
  • The recognition accuracy p_r is the degree to which the target objects in the sensor image could be recognized, or the error of the state of the target object, specifically, the error of the position and orientation.
  • For example, the recognition accuracy p_r is expressed by the following formula (5).
  • In formula (5), n_r indicates the number of recognized target objects, and N_w indicates the number of target objects in the sensor image.
  • The recognition accuracy p_r represented by formula (5) is the number n_r of recognized target objects divided by the number N_w of target objects in the sensor image, that is, p_r = n_r / N_w. Recognition may be determined to be successful if the error between the position and orientation of the target object in the sensor image and the recognized position and orientation is within a threshold value, or the user may visually determine whether or not the recognition is successful.
  • Alternatively, the recognition accuracy p_r may be expressed by the following formula (6), where x_w indicates the actual position and orientation of the target object and x_r indicates the recognized position and orientation.
  • The recognition accuracy p_r represented by formula (6) is the reciprocal of the value obtained by adding 1 to the absolute value of the difference between the actual position and orientation x_w and the recognized position and orientation x_r of the target object, that is, p_r = 1 / (1 + |x_w − x_r|).
  • the actual position / orientation and the recognized position / orientation of the target object may be the position / orientation in the image space or the position / orientation in the real space.
  • the recognition accuracy pr is not limited to the above example.
  • the above examples may be combined.
  • Further, the evaluation value E_c may be calculated using the following formula (7).
  • In formula (7), T_r indicates a recognition processing time threshold. That is, when formula (7) is used, the evaluation value E_c is the value obtained by multiplying the recognition accuracy p_r by the weighting coefficient w_pr if the recognition process is completed within the recognition processing time threshold T_r, and the evaluation value E_c is 0 if the recognition process is not completed within the threshold T_r. By setting the evaluation value E_c of an image conversion parameter for which the recognition process is not completed within the threshold T_r to 0, it becomes possible to confirm and select image conversion parameters that can complete the recognition process within the time required by the user.
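  • The bodies of formulas (4) to (7) are not reproduced in this text; the following is a minimal sketch that follows their verbal definitions above. The function names are illustrative, and the choice between formula (5) and formula (6) for the recognition accuracy depends on the user's purpose, as described.

```python
# Minimal sketch (assumption) following the verbal definitions of formulas (4) to (7);
# variable names mirror the symbols used in the text.
def recognition_accuracy_count(n_r: int, N_w: int) -> float:
    """Formula (5): number of recognized objects n_r divided by the number N_w in the sensor image."""
    return n_r / N_w

def recognition_accuracy_pose(x_w: float, x_r: float) -> float:
    """Formula (6): reciprocal of 1 plus the absolute error between actual and recognized pose."""
    return 1.0 / (1.0 + abs(x_w - x_r))

def evaluation_value(p_r: float, t_r: float, w_pr: float, w_tr: float) -> float:
    """Formula (4): E_c = w_pr * p_r + w_tr * (1 / t_r)."""
    return w_pr * p_r + w_tr / t_r

def evaluation_value_with_deadline(p_r: float, t_r: float, w_pr: float, T_r: float) -> float:
    """Formula (7): w_pr * p_r if recognition finishes within T_r, otherwise 0."""
    return w_pr * p_r if t_r <= T_r else 0.0
```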
  • The method for calculating the evaluation value E_c is not limited to the above.
  • the input receiving unit 109 receives the input of the evaluation parameter, which is a parameter used by the evaluation unit 108 to evaluate the image conversion parameter.
  • the input receiving unit 109 may accept evaluation parameters input by the user using an input device or the like, may receive evaluation parameters from a functional unit in the object recognition device 10, or may receive evaluation parameters from an external device of the object recognition device 10. Evaluation parameters may be accepted from.
  • The evaluation parameters received by the input receiving unit 109 are, for example, the weighting coefficients w_pr and w_tr included in formula (4), that is, weighting coefficients for changing the influence that each of a plurality of elements affecting the magnitude of the evaluation value has on the evaluation value.
  • FIG. 6 is a flowchart for explaining the process performed by the object recognition device 10 shown in FIG. 1 before the start of operation.
  • the first learning unit 105 of the object recognition device 10 performs the image conversion parameter learning process (step S121). Since the image conversion parameter learning process shown in step S121 is the process described with reference to FIG. 4 or the process described with reference to FIG. 5, detailed description thereof will be omitted here.
  • the input receiving unit 109 acquires the evaluation parameters and outputs the acquired evaluation parameters to the evaluation unit 108 (step S122).
  • the image acquisition unit 101 acquires a sensor image and outputs the acquired sensor image to the image conversion unit 102 (step S123).
  • the image conversion unit 102 selects one image conversion parameter for which the evaluation value has not yet been calculated from the plurality of learned image conversion parameters stored in the storage unit 106 (step S124).
  • the image conversion unit 102 performs an image conversion process of converting the sensor image acquired by the image acquisition unit 101 into an image after conversion using the selected image conversion parameter (step S125).
  • the image conversion unit 102 outputs the converted image to the recognition unit 103.
  • the recognition unit 103 performs recognition processing using the converted image and outputs the recognition result to the evaluation unit 108 (step S126). When outputting the recognition result, the recognition unit 103 may output the recognition result to the output unit 104.
  • The evaluation unit 108 calculates the evaluation value E_c based on the recognition result and outputs the calculated evaluation value E_c to the image conversion parameter determination unit 107 (step S127).
  • The image conversion unit 102 determines whether or not the evaluation values E_c of all the image conversion parameters have been calculated (step S128). When the evaluation values E_c of all the image conversion parameters have not been calculated (step S128: No), the image conversion unit 102 repeats the process from step S124.
  • When the evaluation values E_c of all the image conversion parameters have been calculated (step S128: Yes), the image conversion parameter determination unit 107 determines, from among the plurality of image conversion parameters, the image conversion parameter to be used by the image conversion unit 102 during operation, based on the evaluation values that are the evaluation results of the evaluation unit 108 (step S129).
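  • A minimal sketch of this pre-operation selection flow of FIG. 6 is shown below; convert_image, recognize, and evaluate are hypothetical placeholders for the image conversion unit 102, the recognition unit 103, and the evaluation unit 108, and the sketch implements the maximum-E_c policy mentioned above rather than selection by the user.

```python
# Minimal sketch (assumption) of the pre-operation flow of FIG. 6; the callables
# passed in are hypothetical placeholders for the units described in the text.
def select_image_conversion_parameter(sensor_image, learned_parameters,
                                      convert_image, recognize, evaluate):
    best_param, best_score = None, float("-inf")
    for params in learned_parameters:                      # step S124
        converted = convert_image(sensor_image, params)     # step S125
        recognition_result = recognize(converted)           # step S126
        score = evaluate(recognition_result)                # step S127 (evaluation value E_c)
        if score > best_score:
            best_param, best_score = params, score
    return best_param                                       # step S129
```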
  • FIG. 7 is a flowchart for explaining the operation of the object recognition device 10 shown in FIG. 1 during operation. Before operation, the processing shown in FIG. 6 has been performed: the image conversion parameters have been learned for each target image group, and the image conversion parameter to be used by the image conversion unit 102 has been selected from the learned image conversion parameters.
  • the image acquisition unit 101 acquires a sensor image and outputs the acquired sensor image to the image conversion unit 102 (step S131).
  • the image conversion unit 102 acquires the selected image conversion parameter (step S132).
  • the image conversion unit 102 performs an image conversion process for converting the sensor image into a converted image using the acquired image conversion parameters, and outputs the converted image to the recognition unit 103 (step S133).
  • the recognition unit 103 uses the converted image to perform a recognition process for recognizing the state of the target object included in the converted image, and outputs the recognition result to the output unit 104 (step S134).
  • the output unit 104 determines whether or not the target object exists based on the recognition result (step S135). When the target object exists (step S135: Yes), the output unit 104 outputs the recognition result (step S136). After outputting the recognition result, the image acquisition unit 101 repeats the process from step S131. When the target object does not exist (step S135: No), the object recognition device 10 ends the process.
  • the image conversion unit 102 converts the sensor image into a converted image by a one-step image conversion process, but the present embodiment is not limited to such an example.
  • the image conversion unit 102 may perform image conversion in a plurality of stages to convert the sensor image into an image after conversion. For example, when two-step image conversion is performed, the image conversion unit 102 converts the sensor image into a first intermediate image and converts the first intermediate image into a converted image. When three-step image conversion is performed, the image conversion unit 102 converts the sensor image into a first intermediate image, converts the first intermediate image into a second intermediate image, and converts the second intermediate image. Convert to a later image.
  • the first learning unit 105 learns each of the plurality of types of image conversion parameters used in each stage of the image conversion. Specifically, the first learning unit 105 sets a first image conversion parameter for converting the sensor image into an intermediate image and a second image conversion parameter for converting the intermediate image into a converted image. learn. Further, when three or more steps of image conversion are performed, the first learning unit 105 learns a third image conversion parameter for converting an intermediate image into an intermediate image. For example, when two-step image conversion is performed, the first learning unit 105 converts the first image conversion parameter for converting the sensor image into the first intermediate image and the converted image of the first intermediate image. Learn with a second image conversion parameter for conversion to.
  • When three-step image conversion is performed, the first learning unit 105 learns the first image conversion parameter for converting the sensor image into the first intermediate image, the third image conversion parameter for converting the first intermediate image into the second intermediate image, and the second image conversion parameter for converting the second intermediate image into the converted image.
  • the intermediate image is an image that is different from both the sensor image and the converted image.
  • For example, the converted image can be a distance image generated using CG (Computer Graphics) without noise or omission, and the intermediate image can be a reproduced image in which noise, measurement errors, omissions due to sensor blind spots, and the like are reproduced by simulation.
  • In this case, the first learning unit 105 learns the first image conversion parameter for converting the sensor image into the intermediate image, which is a reproduced image, and the second image conversion parameter for converting the intermediate image into the converted image, which is a distance image. By performing the image conversion step by step, it becomes possible to improve the convergence of learning and to improve the recognition performance.
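  • A minimal sketch of such a two-step conversion is shown below; stage_one and stage_two are hypothetical stand-ins for converters parameterized by the first and second image conversion parameters.

```python
# Minimal sketch (assumption): two-step conversion in which the sensor image is
# first mapped to an intermediate (reproduced) image and then to the converted
# (noise-free distance) image.
def two_stage_conversion(sensor_image, stage_one, stage_two):
    intermediate_image = stage_one(sensor_image)       # first image conversion parameter
    converted_image = stage_two(intermediate_image)    # second image conversion parameter
    return converted_image
```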
  • the converted image may be obtained by dividing the converted image into a plurality of types of component images, converting the sensor image into a plurality of component images, and then synthesizing the images.
  • The first learning unit 105 learns a plurality of types of image conversion parameters for converting the sensor image into each component image. For example, it is conceivable that a texture image, which is a component image having the characteristics of the texture component of the converted image, and a color image, which is a component image having the characteristics of the global color component of the converted image, are generated from one sensor image, and that the texture image and the color image are combined to obtain the converted image.
  • the first learning unit 105 learns an image conversion parameter for converting the sensor image into a texture image and an image conversion parameter for converting the sensor image into a color image.
  • a converted image can also be obtained by using three or more component images.
  • By dividing the conversion in this way, the problem to be solved is simplified, so that the convergence of learning can be improved and the recognition performance can be improved.
  • Further, by synthesizing a plurality of component images to obtain the converted image, it becomes possible to obtain a converted image having characteristics closer to those of the target image group than when the converted image is obtained from the sensor image using one type of image conversion parameter.
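  • A minimal sketch of this component-wise conversion is shown below; to_texture and to_color are hypothetical stand-ins for converters with separately learned image conversion parameters, and the additive combination is an illustrative choice, not a synthesis method fixed by the present disclosure.

```python
# Minimal sketch (assumption): the sensor image is converted into a texture
# component image and a global color component image, which are then combined.
def component_synthesis(sensor_image, to_texture, to_color):
    texture_image = to_texture(sensor_image)   # conversion parameters for the texture component
    color_image = to_color(sensor_image)       # conversion parameters for the color component
    return texture_image + color_image         # combine component images into the converted image
```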
  • Each image processing step performed in the recognition process has features, properties, and the like that its input image should have. Therefore, instead of converting the image used for recognition only once, an image conversion that facilitates each image processing step in the recognition process may be executed each time as preprocessing for that image processing step.
  • In this case, the first learning unit 105 only needs to learn image conversion parameters for the number of image processing steps for which preprocessing is desired, and the ideal processing result image group obtained when each image processing step is executed can be used as the target image group.
  • As described above, the image conversion parameter can be evaluated based on the recognition processing result and the evaluation result can be obtained, so it is possible to confirm the influence of the image conversion parameter on the recognition process. It is therefore possible to select the image conversion parameter according to the environment in which the recognition process is executed, and to improve the recognition performance even when that environment changes.
  • the image conversion parameter is a parameter for image conversion of the sensor image into an image having predetermined features.
  • The object recognition device 10 has the first learning unit 105 that learns image conversion parameters for each set of predetermined features, and the image conversion unit 102 converts the sensor image using an image conversion parameter that is a learning result of the first learning unit 105.
  • Accordingly, the output unit 104 can output the evaluation result of the image conversion parameter, which is the learning result for each set of predetermined features, and it is possible to grasp into what kind of image the sensor image should be converted in order to improve the recognition performance.
  • the image conversion unit 102 performs image conversion in a plurality of stages to convert the sensor image into an image after conversion, and the first learning unit 105 is used for each of the plurality of image conversion stages. Learn each of the types of image conversion parameters. By performing the image conversion step by step, it becomes possible to improve the convergence of learning, and it is possible to improve the recognition performance.
  • the image conversion unit 102 can convert the sensor image into a plurality of component images and then synthesize the plurality of component images to acquire the converted image.
  • the first learning unit 105 learns a plurality of types of image conversion parameters for converting the sensor image into each of the plurality of component images.
  • The object recognition device 10 has the image conversion parameter determination unit 107 that determines the image conversion parameter to be used by the image conversion unit 102 based on the evaluation results of the evaluation unit 108 when each of the plurality of image conversion parameters is used.
  • the object recognition device 10 has an input receiving unit 109 that receives input of evaluation parameters, which are parameters used by the evaluation unit 108 to evaluate image conversion parameters.
  • the evaluation unit 108 evaluates the image conversion parameter using the evaluation parameter received by the input reception unit 109.
  • the evaluation parameter is, for example, a weighting coefficient for changing the influence of each of the plurality of elements affecting the magnitude of the evaluation value on the evaluation value.
  • the recognition result output by the recognition unit 103 of the object recognition device 10 includes at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103.
  • The evaluation unit 108 calculates the evaluation value of the image conversion parameter based on at least one of the recognition processing time of the recognition unit 103 and the number of target objects recognized by the recognition unit 103.
  • The recognition accuracy p_r can be calculated using the number n_r of target objects recognized by the recognition unit 103 and the actual number N_w of target objects. Therefore, the object recognition device 10 can evaluate the image conversion parameters in consideration of the recognition processing time, the recognition accuracy p_r, and the like.
  • FIG. 8 is a diagram showing a functional configuration of the object recognition device 20 according to the second embodiment.
  • The object recognition device 20 has the image acquisition unit 101, the image conversion unit 102, the recognition unit 103, the output unit 104, the first learning unit 105, the storage unit 106, the image conversion parameter determination unit 107, the evaluation unit 108, the input receiving unit 109, and a robot 110. Since the object recognition device 20 includes the robot 110 and has a function of picking the target object, it can also be called an object extraction device. Since the object recognition device 20 includes the robot 110, the image conversion parameters can be evaluated based on the operation results of the robot 110.
  • the object recognition device 20 has a robot 110 in addition to the functional configuration of the object recognition device 10 according to the first embodiment.
  • The same functional configurations as those of the first embodiment are denoted by the same reference numerals and detailed description thereof is omitted; the parts different from the first embodiment will be mainly described.
  • the output unit 104 outputs the recognition result of the recognition unit 103 to the robot 110.
  • the robot 110 grips the target object based on the recognition result output by the output unit 104.
  • the robot 110 outputs the operation result of the operation of gripping the target object to the evaluation unit 108.
  • the evaluation unit 108 evaluates the image conversion parameter based on the operation result of the robot 110 in addition to the recognition result of the recognition unit 103.
  • the operation result of the robot 110 includes at least one of the probability that the robot 110 succeeds in gripping the target object, the gripping operation time, and the cause of the grip failure.
  • the robot 110 has a tool capable of grasping an object and performing an object operation necessary for executing a task.
  • a suction pad can be used as a tool.
  • the tool may be a gripper hand that grips the target object by sandwiching it with two claws.
  • The condition for determining that the robot 110 has successfully gripped the target object can be, for example, when the tool is a gripper hand, that the opening width when the gripper hand is inserted onto the target object and then closed is within a predetermined range.
  • Alternatively, the condition for determining that the robot 110 has succeeded in gripping the target object may be that the target object is still held immediately before the gripper hand releases it at the transport destination.
  • the conditions for determining that the robot 110 has succeeded in grasping the target object are not limited to the above examples, and can be appropriately defined depending on the type of tool possessed by the robot 110, the work content to be performed by the robot 110, and the like.
  • Whether or not the target object can be held can be determined by using the detection result, for example, when the tool being used is equipped with a function of detecting the holding state of the target object. Alternatively, it may be determined whether or not the target object can be held by using the information of an external sensor such as a camera. For example, when the tool possessed by the robot 110 is an electric hand, there is a product having a function of determining whether or not the target object can be held by measuring the current value when operating the electric hand.
  • Alternatively, there is a method of storing in advance an image of the tool when no target object is grasped, taking the difference from an image of the tool captured after the gripping operation, and determining whether or not the target object is held based on the difference.
  • the operation result of the robot 110 can also include the gripping operation time.
  • the gripping operation time can be the time from closing the gripper hand to opening the gripper hand at the transport destination.
  • Causes of grip failure of the robot 110 include, for example, failure to grip, dropping during transportation, and multiple grips.
  • When the cause of grip failure is included in the operation result, the evaluation unit 108 evaluates the image conversion parameters based on the cause of failure, so the image conversion unit 102 can use image conversion parameters that reduce a specific cause of failure. For example, even if gripping fails inside the supply box that stores the target objects before supply, the target object is likely to fall back into the supply box and the gripping operation can be performed again, so the risk is low. On the other hand, if the target object is dropped during transportation, it may fall and be scattered in the surroundings, and complicated control of the robot 110 may be required to return to the original state.
  • In such a case, the image conversion unit 102 can use image conversion parameters with less risk of scattering the target objects in the surroundings.
  • FIG. 9 is a flowchart for explaining the processing performed by the object recognition device 20 shown in FIG. 8 before the start of operation.
  • the same parts as those of the object recognition device 10 are designated by the same reference numerals as those in FIG. 6, and detailed description thereof will be omitted.
  • the parts different from FIG. 6 will be mainly described.
  • The operation from step S121 to step S126 is the same as in FIG. 6.
  • the robot 110 performs picking based on the recognition result (step S201).
  • the robot 110 outputs the picking operation result to the evaluation unit 108.
  • The evaluation unit 108 calculates an evaluation value based on the operation result of the robot 110 in addition to the recognition result (step S202). Specifically, the evaluation unit 108 can calculate the evaluation value E_c by using, for example, the following formula (8).
  • In formula (8), p_g denotes the gripping success rate, t_g the gripping operation time, p_r the recognition accuracy, t_r the recognition processing time, and n_f1, n_f2, ... values relating to each type of grip failure cause. Further, w_pg, w_tg, w_pr, w_tr, w_f1, w_f2, ... denote weighting coefficients.
  • The evaluation parameters received by the input receiving unit 109 include the weighting coefficients w_pg, w_tg, w_pr, w_tr, w_f1, w_f2, and so on.
  • The above method for calculating the evaluation value E_c is an example, and the method for calculating the evaluation value E_c used by the evaluation unit 108 is not limited to the above method.
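  • The body of formula (8) is not reproduced in this text; the following is a minimal sketch following its verbal description above. How the grip-failure terms n_f1, n_f2, ... and the gripping time t_g enter the formula is an assumption (here failure counts are subtracted as weighted penalties and a shorter gripping time raises the score, by analogy with formula (4)).

```python
# Minimal sketch (assumption) of a robot-aware evaluation value in the spirit of
# formula (8); the sign and form of the penalty terms are illustrative choices.
def evaluation_value_with_robot(p_g, t_g, p_r, t_r,
                                w_pg, w_tg, w_pr, w_tr,
                                failure_counts, failure_weights):
    score = w_pg * p_g + w_tg / t_g + w_pr * p_r + w_tr / t_r
    for cause, count in failure_counts.items():          # e.g. "drop_during_transport"
        score -= failure_weights.get(cause, 0.0) * count
    return score
```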
  • Steps S128 and S129 are the same as those in FIG. 6. That is, the process shown in FIG. 9 differs from the process shown in FIG. 6 in that the picking process is additionally performed between the recognition process and the process of calculating the evaluation value, and in the specific contents of the process of calculating the evaluation value.
  • FIG. 10 is a flowchart for explaining the processing performed by the object recognition device 20 shown in FIG. 8 during operation.
  • the same parts as those of the object recognition device 10 are designated by the same reference numerals as those in FIG. 7, and detailed description thereof will be omitted.
  • the parts different from those in FIG. 7 will be mainly described.
  • When the object recognition device 10 determines that the target object exists as a result of the recognition process, it outputs the recognition result, whereas the object recognition device 20 causes the robot 110 to perform picking based on the recognition result instead of outputting the recognition result (step S203). After the robot 110 picks, the object recognition device 20 repeats the process from step S131.
  • In the above description, the recognition unit 103 recognizes the state of the target object based on the converted image, but the recognition unit 103 of the object recognition device 20 having the robot 110 may recognize the state of the target object using a search-based method that uses a hand model of the robot 110 to search for a location where the target object can be gripped.
  • When the recognition result is position and orientation information of the target object, it is desirable that the position and orientation information of the target object can be converted into position and orientation information of the robot 110 for when the robot 110 grips the target object.
  • the object recognition device 20 further includes a robot 110 that grips the target object based on the recognition result of the recognition unit 103.
  • the evaluation unit 108 of the object recognition device 20 evaluates the image conversion parameters based on the operation result of the robot 110.
  • the object recognition device 20 can select an image conversion parameter that can improve the gripping performance, and can improve the gripping success rate of the robot 110.
  • the operation result of the robot 110 includes at least one of the probability that the robot 110 succeeds in grasping the target object, the gripping operation time, and the cause of the grip failure.
  • When the gripping success rate is included in the operation result, the image conversion parameters are evaluated based on the gripping success rate, so an image conversion parameter that can improve the gripping success rate is selected. This makes it possible to improve the gripping success rate of the robot 110.
  • When the gripping operation time is included in the operation result, the image conversion parameters are evaluated based on the gripping operation time, so the gripping operation time can be shortened.
  • When the cause of grip failure is included in the operation result, the image conversion parameters are evaluated based on the cause of grip failure, so it is possible to reduce a specific cause of grip failure.
  • FIG. 11 is a diagram showing a functional configuration of the object recognition device 30 according to the third embodiment.
  • The object recognition device 30 has the image acquisition unit 101, the image conversion unit 102, the recognition unit 103, the output unit 104, the first learning unit 105, the storage unit 106, the image conversion parameter determination unit 107, the evaluation unit 108, the input receiving unit 109, the robot 110, a simulation unit 111, an image conversion data set generation unit 114, and an image conversion data set selection unit 115.
  • the simulation unit 111 has a first generation unit 112 and a second generation unit 113.
  • the object recognition device 30 includes a simulation unit 111, an image conversion data set generation unit 114, and an image conversion data set selection unit 115, in addition to the configuration of the object recognition device 20 according to the second embodiment.
  • The same functional configurations as those of the second embodiment are denoted by the same reference numerals and detailed description thereof is omitted; the parts different from the second embodiment will be mainly described.
  • The simulation unit 111 creates target images using simulation. Specifically, the simulation unit 111 has the first generation unit 112, which generates arrangement information indicating the arrangement state of the target objects based on simulation conditions, and the second generation unit 113, which generates a target image by arranging the target objects based on the arrangement information.
  • The simulation conditions used by the first generation unit 112 include, for example, sensor information, target object information, and environmental information. It is desirable that the sensor information include information such as the focal length, angle of view, and aperture value of the sensor that acquires the sensor image, since these values change the state of the generated space. Further, when the sensor performs stereo measurement, the sensor information may include a convergence angle, a baseline length, and the like.
  • the target object information is a CAD model of the target object, information indicating the material of the target object, and the like.
  • the target object information may include the texture information of each surface of the target object. It is desirable that the target object information includes information to the extent that the state of the target object in the space is uniquely determined when the target object is placed in the space by using simulation.
  • Environmental information can include measurement distance, measurement depth, position / orientation of an object other than the target object, type and intensity of ambient light, and the like.
  • Objects other than the target object are, for example, a box, a measuring table, and the like.
  • the simulation unit 111 can perform the simulation under detailed conditions and can generate various types of target images.
  • the arrangement information generated by the first generation unit 112 indicates the arrangement state of at least one target object.
  • the plurality of target objects may be arranged in an aligned manner or may be in a bulk state.
  • the processing time can be shortened by arranging the target objects at the calculated simple model positions after performing the simulation using the simple model of the target objects.
  • the target image generated by the second generation unit 113 may be an RGB image or a distance image.
  • in the case of an RGB image, it is desirable to set the color or texture of the target object and of objects other than the target object.
  • the simulation unit 111 stores the generated target image in the storage unit 106. Further, the simulation unit 111 may store the simulation conditions used when the first generation unit 112 generates the arrangement information and the arrangement information generated by the first generation unit 112 in the storage unit 106. At this time, it is desirable that the simulation unit 111 stores the arrangement information in association with the target image constituting the image conversion data set.
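The sketch below is a minimal structural illustration of the simulation unit 111 (first generation of arrangement information, then second generation of a target image). The condition keys, workspace bounds, and the blank-image stand-in for a renderer are assumptions; the patent does not prescribe a specific simulator or renderer.

```python
import random
import numpy as np

def generate_arrangement(sim_conditions: dict, num_objects: int) -> list[dict]:
    """First generation unit 112: arrangement information (position/orientation) per target object."""
    x_max, y_max, z_max = sim_conditions["environment"]["workspace_size"]  # assumed key
    return [{"position": (random.uniform(0, x_max),
                          random.uniform(0, y_max),
                          random.uniform(0, z_max)),
             "orientation_rpy": tuple(random.uniform(-3.1416, 3.1416) for _ in range(3))}
            for _ in range(num_objects)]

def render_target_image(arrangement: list[dict], sim_conditions: dict) -> np.ndarray:
    """Second generation unit 113: place the CAD models per the arrangement and render an
    RGB or distance image. A blank image stands in for the output of a real renderer."""
    height, width = sim_conditions["sensor"]["resolution"]  # assumed key
    return np.zeros((height, width, 3), dtype=np.uint8)

# Illustrative conditions only:
conditions = {"sensor": {"resolution": (480, 640), "focal_length_mm": 8.0},
              "environment": {"workspace_size": (0.4, 0.3, 0.2)}}
arrangement = generate_arrangement(conditions, num_objects=5)
target_image = render_target_image(arrangement, conditions)
```

Storing the returned arrangement alongside the rendered target image mirrors the association with the storage unit 106 described above.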
  • the image conversion data set generation unit 114 generates an image conversion data set including the sensor image acquired by the image acquisition unit 101 and the target image generated by the simulation unit 111.
  • the image conversion data set generation unit 114 stores the generated image conversion data set in the storage unit 106.
  • the image conversion dataset includes one or more sensor images and one or more target images. There is no limit to the number of images of the sensor image and the target image. If the number of images is too small, the learning of the image conversion parameters may not converge, and if the number of images is too large, the learning time may become long. Therefore, it is preferable to determine the number of images according to the intended use of the user, the installation status of the sensor, and the like. Further, the number of images of the target image and the number of images of the sensor image are preferably about the same, but there may be a bias.
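As a sketch of how such a data set could be represented, the container below simply holds the two unpaired image collections plus optional metadata; the class name, fields, and storage format are assumptions, not the patent's specification.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ImageConversionDataset:
    sensor_images: list[np.ndarray] = field(default_factory=list)  # acquired by unit 101
    target_images: list[np.ndarray] = field(default_factory=list)  # generated by unit 111
    measurement_distance: float | None = None  # optional metadata usable for selection

def generate_image_conversion_dataset(sensor_images, target_images, distance=None):
    # The sensor images and target images need not be paired one to one; roughly
    # balanced image counts are preferable, as noted above.
    return ImageConversionDataset(list(sensor_images), list(target_images), distance)
```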
  • the image conversion data set selection unit 115 selects the image conversion data set used for learning by the first learning unit 105 from the image conversion data sets stored in the storage unit 106, based on the sensor image. Specifically, the image conversion data set selection unit 115 calculates, based on the sensor image, a selection evaluation value E_p used as a criterion for selecting the image conversion data set, and selects the image conversion data set based on the calculated selection evaluation value E_p. For example, the image conversion data set selection unit 115 can select only the image conversion data sets whose selection evaluation value E_p is equal to or less than a predetermined threshold value. The image conversion data set selection unit 115 can select one or a plurality of image conversion data sets.
  • the image conversion data set selection unit 115 outputs the selected image conversion data set to the first learning unit 105.
  • the first learning unit 105 learns the image conversion parameters using the image conversion data set selected by the image conversion data set selection unit 115. Therefore, the first learning unit 105 learns the image conversion parameters using the target image generated by the simulation unit 111.
  • the selection evaluation value E_p is calculated using, for example, the following mathematical formula (9).
  • in formula (9), I_t represents the sensor image, I_s represents the target image group constituting the image conversion data set, and N_s represents the number of target images included in the target image group.
  • F_I(I) represents an arbitrary function that calculates a scalar value from an image I, for example a function that calculates the average pixel value of the image or a function that counts the number of edges.
  • the image conversion data set selection unit 115 may calculate the selection evaluation value E_p using the following formula (10).
  • in formula (10), l_s indicates the measurement distance of the sensor that acquires the sensor image, l_t indicates the measurement distance of the target images constituting the target image group, and w_I and w_l indicate weighting coefficients. If the measurement distance of the sensor is not exactly known, an approximate distance may be used.
  • the method for calculating the selection evaluation value E_p described above is an example, and the calculation method is not limited to it; an assumed-form example is sketched below.
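Formulas (9) and (10) themselves are not reproduced in this text, so the sketch below only assumes a form consistent with the symbol definitions above: F_I is a scalar image feature (here, mean intensity), and the measurement-distance gap is weighted by w_l. The function names and default weights are hypothetical.

```python
import numpy as np

def f_mean(image: np.ndarray) -> float:
    """Example F_I: average pixel value of the image."""
    return float(image.mean())

def selection_evaluation_value(sensor_image, target_images, w_I=1.0, w_l=0.0,
                               l_s=None, l_t=None, f=f_mean) -> float:
    """Assumed E_p: gap between the sensor-image feature and the mean feature of the
    target image group, optionally plus a weighted measurement-distance gap."""
    n_s = len(target_images)
    feature_gap = abs(f(sensor_image) - sum(f(img) for img in target_images) / n_s)
    distance_gap = abs(l_t - l_s) if (l_s is not None and l_t is not None) else 0.0
    return w_I * feature_gap + w_l * distance_gap

# Data sets whose E_p is at or below a chosen threshold would then be selected for learning.
```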
  • FIG. 12 is a flowchart for explaining the operation of the simulation unit 111 shown in FIG.
  • the first generation unit 112 of the simulation unit 111 acquires the simulation conditions (step S301).
  • the simulation conditions are acquired from, for example, a storage area provided in the simulation unit 111.
  • the first generation unit 112 generates placement information indicating the placement state of the target object based on the simulation conditions (step S302).
  • the first generation unit 112 outputs the generated arrangement information to the second generation unit 113 of the simulation unit 111.
  • the second generation unit 113 arranges the target object based on the arrangement information generated by the first generation unit 112 and generates a target image (step S303).
  • the second generation unit 113 outputs the generated target image and stores it in the storage unit 106 (step S304).
  • FIG. 13 is a flowchart for explaining the process performed by the object recognition device 30 shown in FIG. 11 before the start of operation.
  • the same parts as those of the object recognition device 10 or the object recognition device 20 are designated by the same reference numerals as those in FIG. 6 or 9, and detailed description thereof will be omitted.
  • the parts different from those in FIG. 6 or 9 will be mainly described.
  • the simulation unit 111 of the object recognition device 30 first performs a simulation process (step S311).
  • the simulation process of step S311 is the process shown in steps S301 to S304 of FIG.
  • the image conversion data set generation unit 114 generates an image conversion data set using the sensor image acquired by the image acquisition unit 101 and the target image generated by the simulation unit 111 (step S312).
  • the image conversion data set generation unit 114 stores the generated image conversion data set in the storage unit 106.
  • the image conversion data set selection unit 115 selects the image conversion data set used by the first learning unit 105 from the image conversion data sets stored in the storage unit 106 (step S313).
  • the image conversion data set selection unit 115 outputs the selected image conversion data set to the first learning unit 105.
  • in step S121, the image conversion parameter learning process is executed using the image conversion data set selected in step S313.
  • the object recognition device 30 according to the third embodiment creates a target image using simulation and learns the image conversion parameters using the created target image. Further, the object recognition device 30 generates an image conversion data set including the target image created using simulation and the sensor image acquired by the image acquisition unit 101, and learns the image conversion parameters using the generated image conversion data set. Such a configuration makes it possible to easily generate the target images and image conversion data sets necessary for learning the image conversion parameters. Further, the target image is generated based on the simulation conditions and on the arrangement information indicating the arrangement state of the target object, so various target images can be generated by adjusting the simulation conditions.
  • the object recognition device 30 has an image conversion data set selection unit 115 that selects, based on the sensor image, the image conversion data set to be used by the first learning unit 105 from the image conversion data sets generated by the image conversion data set generation unit 114. With such a configuration, the image conversion parameters are learned only on the image conversion data sets suited to the surrounding environment, which improves learning efficiency.
  • FIG. 14 is a diagram showing a functional configuration of the object recognition device 40 according to the fourth embodiment.
  • the object recognition device 40 has the image acquisition unit 101, the image conversion unit 102, the recognition unit 103, the output unit 104, the first learning unit 105, the storage unit 106, the image conversion parameter determination unit 107, the evaluation unit 108, the input receiving unit 109, the robot 110, the simulation unit 111, the image conversion data set generation unit 114, and the image conversion data set selection unit 115.
  • the object recognition device 40 has a recognition data set generation unit 116, a second learning unit 117, and a recognition parameter determination unit 118, in addition to the configuration of the object recognition device 30 according to the third embodiment.
  • the same functional configurations as in the third embodiment are denoted by the same reference numerals as in the third embodiment and detailed description thereof is omitted; the parts different from the third embodiment will be mainly described.
  • the recognition data set generation unit 116 generates annotation data to be used when the recognition unit 103 performs recognition processing, based on the recognition method used by the recognition unit 103, and generates a recognition data set including the generated annotation data and a target image.
  • the recognition data set generation unit 116 stores the generated recognition data set in the storage unit 106.
  • the annotation data differs depending on the recognition method used by the recognition unit 103. For example, when the recognition method is a neural network that outputs the position and size of the target object on the image, the annotation data is the position and size of the target object on the image.
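The sketch below illustrates one possible shape of such a recognition data set entry, assuming, as in the example above, a recognition method that outputs the position and size of each target object on the image. The field names are illustrative, not the patent's own data format.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    x_center: float   # target object position on the image (pixels)
    y_center: float
    width: float      # target object size on the image (pixels)
    height: float

@dataclass
class RecognitionSample:
    target_image_id: str           # target image generated by the simulation unit 111
    annotations: list[Annotation]  # one entry per arranged target object
```

Because the simulation unit stores the arrangement information in association with each target image, annotations of this kind could in principle be derived by projecting the arranged objects into the image.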
  • the second learning unit 117 learns the recognition parameter, which is a parameter used by the recognition unit 103, based on the recognition data set generated by the recognition data set generation unit 116.
  • the second learning unit 117 can be realized, for example, by the same configuration as the first learning unit 105 shown in FIG.
  • the second learning unit 117 includes a state observation unit 11 and a machine learning unit 12.
  • the machine learning unit 12 includes a reward calculation unit 121 and a function update unit 122.
  • the example shown in FIG. 3 performs machine learning using reinforcement learning, but the second learning unit 117 may perform machine learning according to other known methods, such as neural networks, genetic programming, functional logic programming, or support vector machines.
  • the second learning unit 117 stores the learning result of the recognition parameter in the storage unit 106.
  • for example, when the recognition method uses a neural network, the recognition parameters include the weighting coefficients between the units constituting the neural network.
  • the recognition parameter determination unit 118 determines the recognition parameter used by the recognition unit 103 based on the evaluation result of the evaluation unit 108 when each of the plurality of recognition parameters is used.
  • the recognition parameter determination unit 118 outputs the determined recognition parameter to the recognition unit 103.
  • the recognition parameter determination unit 118 can, for example, set the recognition parameter having the largest evaluation value as the recognition parameter used by the recognition unit 103. Further, when the output unit 104 outputs the evaluation result of the evaluation unit 108 for each recognition parameter and the input reception unit 109 accepts an input selecting a recognition parameter, the recognition parameter determination unit 118 can output the recognition parameter selected by the user to the recognition unit 103. Further, since the evaluation value of a recognition parameter is considered to change depending on the image conversion parameter, a plurality of evaluation values may be calculated for one learned recognition parameter by changing the image conversion parameter used by the image conversion unit 102. In this case, the image conversion parameter determination unit 107 can determine the image conversion parameter based on the combinations of the calculated evaluation values and the image conversion parameters, as sketched below.
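A minimal sketch of selecting parameters from evaluated combinations follows; the patent leaves the data structures open, so a dictionary keyed by parameter identifiers is an assumption made here for illustration.

```python
def select_best_combination(evaluations: dict[tuple[str, str], float]) -> tuple[str, str]:
    """evaluations maps (image_conversion_parameter_id, recognition_parameter_id) to the
    evaluation value computed by the evaluation unit 108; the combination with the
    largest evaluation value is returned."""
    return max(evaluations, key=evaluations.get)

# Example (illustrative values):
evaluations = {("conv_A", "rec_1"): 0.82,
               ("conv_A", "rec_2"): 0.74,
               ("conv_B", "rec_1"): 0.91}
best = select_best_combination(evaluations)  # -> ("conv_B", "rec_1")
```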
  • FIG. 15 is a flowchart for explaining the processing performed by the object recognition device 40 shown in FIG. 14 before the start of operation.
  • the same parts as those of the object recognition device 30 are designated by the same reference numerals as those in FIG. 13, and detailed description thereof will be omitted.
  • the parts different from FIG. 13 will be mainly described.
  • after performing the simulation process of step S311, the object recognition device 40 generates a recognition data set in parallel with the processes of steps S312, S313, and S121 (step S401), and performs a recognition parameter learning process that learns the recognition parameters using the generated recognition data set (step S402).
  • the object recognition device 40 selects the image conversion parameter and the recognition parameter after the processing of steps S122 and S123 (step S403).
  • the processing of steps S125, S126, S201, and S202 is the same as that of the object recognition device 30.
  • the image conversion unit 102 of the object recognition device 40 determines whether or not the evaluation values have been calculated for all combinations of the image conversion parameters and the recognition parameters (step S404).
  • when the evaluation values have been calculated for all combinations, the object recognition device 40 performs the process of step S129 and determines the recognition parameters (step S405).
  • otherwise, the object recognition device 40 returns to the process of step S403.
  • as described above, the object recognition device 40 according to the fourth embodiment generates the annotation data used by the recognition unit 103 based on the recognition method used by the recognition unit 103, and learns the recognition parameters using a recognition data set including the generated annotation data and the target image. With such a configuration, the object recognition device 40 can easily generate recognition data sets for various situations.
  • the object recognition device 40 determines the recognition parameter used by the recognition unit 103 based on the evaluation result of the evaluation unit 108 when each of the plurality of recognition parameters is used.
  • the object recognition device 40 can thereby perform recognition processing using recognition parameters suited to the target object, the surrounding environment, and the like, and can improve the recognition success rate and the gripping success rate.
  • Each component of the object recognition device 10, 20, 30, and 40 is realized by a processing circuit.
  • the processing circuit may be realized by dedicated hardware, or may be a control circuit using a CPU (Central Processing Unit).
  • FIG. 16 is a diagram showing dedicated hardware for realizing the functions of the object recognition devices 10, 20, 30, and 40 according to the first to fourth embodiments.
  • the processing circuit 90 is a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.
  • FIG. 17 is a diagram showing a configuration of a control circuit 91 for realizing the functions of the object recognition devices 10, 20, 30, and 40 according to the first to fourth embodiments.
  • the control circuit 91 includes a processor 92 and a memory 93.
  • the processor 92 is a CPU, and is also called a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like.
  • the memory 93 is, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable ROM), or an EEPROM (registered trademark) (Electrically Erasable Programmable ROM), or a magnetic disc, a flexible disc, an optical disc, a compact disc, a mini disc, a DVD (Digital Versatile Disc), or the like.
  • when the above processing circuit is realized by the control circuit 91, the functions are realized by the processor 92 reading and executing the programs corresponding to the processing of each component stored in the memory 93.
  • the memory 93 is also used as a temporary memory in each process executed by the processor 92.
  • the computer program executed by the processor 92 may be provided via a communication network, or may be provided in a state of being stored in a storage medium.
  • the configurations shown in the above embodiments are examples; they can be combined with other known techniques, the embodiments can be combined with each other, and part of the configurations can be omitted or changed without departing from the gist.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Object recognition device (10) characterized in that it comprises: an image acquisition unit (101) that acquires images of a target object; an image conversion unit (102) that uses image conversion parameters to subject sensor images, which are the images acquired by the image acquisition unit (101), to image conversion and outputs the converted images; a recognition unit (103) that recognizes the state of the target object on the basis of the converted images; an evaluation unit (108) that, on the basis of the recognition results of the recognition unit (103), evaluates the image conversion parameters used to generate the converted images; and an output unit (104) that outputs the recognition results and the evaluation results of the evaluation unit (108).
PCT/JP2020/002577 2020-01-24 2020-01-24 Dispositif de reconnaissance d'objet et procédé de reconnaissance d'objet WO2021149251A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202080092120.2A CN114981837A (zh) 2020-01-24 2020-01-24 物体识别装置及物体识别方法
JP2021572241A JP7361800B2 (ja) 2020-01-24 2020-01-24 物体認識装置および物体認識方法
PCT/JP2020/002577 WO2021149251A1 (fr) 2020-01-24 2020-01-24 Dispositif de reconnaissance d'objet et procédé de reconnaissance d'objet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/002577 WO2021149251A1 (fr) 2020-01-24 2020-01-24 Dispositif de reconnaissance d'objet et procédé de reconnaissance d'objet

Publications (1)

Publication Number Publication Date
WO2021149251A1 true WO2021149251A1 (fr) 2021-07-29

Family

ID=76993210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/002577 WO2021149251A1 (fr) 2020-01-24 2020-01-24 Dispositif de reconnaissance d'objet et procédé de reconnaissance d'objet

Country Status (3)

Country Link
JP (1) JP7361800B2 (fr)
CN (1) CN114981837A (fr)
WO (1) WO2021149251A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018060466A (ja) * 2016-10-07 2018-04-12 パナソニックIpマネジメント株式会社 画像処理装置、検知装置、学習装置、画像処理方法、および画像処理プログラム
WO2019064599A1 (fr) * 2017-09-29 2019-04-04 日本電気株式会社 Dispositif et procédé de détection d'anomalie, et support d'enregistrement lisible par ordinateur


Also Published As

Publication number Publication date
CN114981837A (zh) 2022-08-30
JP7361800B2 (ja) 2023-10-16
JPWO2021149251A1 (fr) 2021-07-29

Similar Documents

Publication Publication Date Title
CN109934115B (zh) 人脸识别模型的构建方法、人脸识别方法及电子设备
Johns et al. Deep learning a grasp function for grasping under gripper pose uncertainty
US11325252B2 (en) Action prediction networks for robotic grasping
Sadeghi et al. Sim2real viewpoint invariant visual servoing by recurrent control
CN107273936B (zh) 一种gan图像处理方法及系统
CN108108764B (zh) 一种基于随机森林的视觉slam回环检测方法
KR20210104777A (ko) 딥 러닝을 이용한 비 유클리드 3d 데이터 세트의 자동 의미론적 분할
JP5160235B2 (ja) 画像中の物体の検出及び追跡
KR101919831B1 (ko) 오브젝트 인식 장치, 분류 트리 학습 장치 및 그 동작 방법
CN108780508A (zh) 用于归一化图像的系统和方法
JP2021522591A (ja) 三次元実物体を実物体の二次元のスプーフと区別するための方法
CN111819568A (zh) 人脸旋转图像的生成方法及装置
CN113370217B (zh) 基于深度学习的物体姿态识别和抓取的智能机器人的方法
JP6675691B1 (ja) 学習用データ生成方法、プログラム、学習用データ生成装置、および、推論処理方法
CN110463376B (zh) 一种插机方法及插机设备
CN105426901A (zh) 用于对摄像头视野中的已知物体进行分类的方法
CN115816460B (zh) 一种基于深度学习目标检测与图像分割的机械手抓取方法
CN114387513A (zh) 机器人抓取方法、装置、电子设备及存储介质
CN116343012B (zh) 基于深度马尔可夫模型的全景图像扫视路径预测方法
CN110705564B (zh) 图像识别的方法和装置
CN115761905A (zh) 一种基于骨骼关节点的潜水员动作识别方法
Liu et al. Robotic picking in dense clutter via domain invariant learning from synthetic dense cluttered rendering
WO2021149251A1 (fr) Dispositif de reconnaissance d'objet et procédé de reconnaissance d'objet
CN116984269A (zh) 一种基于图像识别的煤矸石抓取方法及系统
Makihara et al. Grasp pose detection for deformable daily items by pix2stiffness estimation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20915796

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021572241

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20915796

Country of ref document: EP

Kind code of ref document: A1