WO2022118442A1 - Parallax learning device, merge data generation device, parallax learning method, merge data generation method, and parallax learning program - Google Patents

Parallax learning device, merge data generation device, parallax learning method, merge data generation method, and parallax learning program Download PDF

Info

Publication number
WO2022118442A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
learning
parallax
label
patch
Prior art date
Application number
PCT/JP2020/045106
Other languages
French (fr)
Japanese (ja)
Inventor
智彦 長田
慎吾 安藤
潤 島村
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 filed Critical 日本電信電話株式会社
Priority to PCT/JP2020/045106 priority Critical patent/WO2022118442A1/en
Publication of WO2022118442A1 publication Critical patent/WO2022118442A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/593 Depth or shape recovery from multiple images from stereo images

Definitions

  • the disclosed techniques relate to a parallax learning device, a fusion data generator, a parallax learning method, a fusion data generation method, and a parallax learning program.
  • Conventionally, there is a stereo matching technique that estimates depth after obtaining the parallax amount between two images acquired using a stereo camera as a sensor (see Non-Patent Document 1 and Non-Patent Document 2).
  • Parallax generally refers to the difference in how an image appears that is caused by the difference between the viewpoints of two sensors.
  • the parallax estimation method refers to a method of estimating the amount of deviation between images as the parallax amount.
  • Among stereo-matching-based parallax estimation methods, a method using deep learning has been proposed in recent years.
  • In the conventional technique, the purpose is generally to estimate the distance to an object by depth estimation after obtaining the parallax amount.
  • However, the prior art has been premised on estimating the amount of parallax between images acquired by two sensors that acquire channels of the same type. Since images acquired by sensors that acquire different types of channels have different characteristics, there is a problem that it is difficult to estimate the amount of parallax between them.
  • The disclosed technique was made in view of the above points, and its object is to provide a parallax learning device, a fusion data generation device, a parallax learning method, a fusion data generation method, and a parallax learning program that enable the parallax amount to be accurately estimated and utilized even between images acquired by different sensors.
  • The first aspect of the present disclosure is a parallax learning device including: a filter unit that outputs a filtered label obtained by applying a filter to a correct label indicating the relationship between the amount of horizontal positional deviation between a first image for learning and a second image for learning and the correct answer for each position; and a similarity learning unit that learns the parameters of a model for outputting the similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with the coordinate information of the correct label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting in the horizontal direction with the coordinate information as a reference, and the filtered label.
  • The second aspect of the present disclosure is a parallax learning method in which a computer executes processing to: output a filtered label obtained by applying a filter to a correct label indicating the relationship between the amount of horizontal positional deviation between a first image for learning and a second image for learning and the correct answer for each position; and learn the parameters of a model for outputting the similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with the coordinate information of the correct label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting in the horizontal direction with the coordinate information as a reference, and the filtered label.
  • The third aspect of the present disclosure is a parallax learning program that causes a computer to: output a filtered label obtained by applying a filter to a correct label indicating the relationship between the amount of horizontal positional deviation between a first image for learning and a second image for learning and the correct answer for each position; and learn the parameters of a model for outputting the similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with the coordinate information of the correct label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting in the horizontal direction with the coordinate information as a reference, and the filtered label.
  • FIG. 1 is a schematic diagram showing that fusion data is generated from a visible image and an infrared image in which parallax occurs.
  • the parallax estimation method can also be used to link the corresponding points of two images to generate fusion data.
  • the fusion data refers to the data obtained by fusing the channels of the images acquired by the two sensors.
  • The task assumed in the present disclosure is to generate fusion data in which corresponding points are matched from two images: a visible image (RGB) taken by a visible light camera and an infrared image taken by an infrared camera.
  • the infrared image is an image in which temperature data is recorded for each pixel.
  • By using fusion data, it is possible to learn with richer information in the field of object recognition by machine learning or the field of semantic segmentation, and an improvement in performance can be expected.
  • As an example of a specific use case, a scene in which inspection is automated by deep learning is assumed in equipment-inspection applications where infrared images are used. By utilizing this fusion data, learning can draw on rich information, so a performance improvement can be expected.
  • However, as shown in FIG. 1, since parallax occurs between the visible image and the infrared image, how to estimate the parallax becomes a problem.
  • Visible images and infrared images have different properties. Visible images are used for object recognition because they have abundant appearance information such as texture. The infrared image, on the other hand, has a lower resolution than the visible image but can acquire temperature information of the object, and is used in the security field or for equipment inspection and the like. Further, although devices capable of simultaneously acquiring a visible image and an infrared image are commercially available, the positions of the visible light camera and the infrared camera differ, so parallax occurs between the two images. Because of the parallax, the position where the object is projected onto the image sensor shifts, so a shift occurs in the image data. In the following, the magnitude of this parallax shift in the image data is taken as the parallax amount to be estimated.
  • FIG. 2 is a diagram showing an example of searching for corresponding points in parallel stereo. The image taken by the left camera of the stereo camera is taken as the left image, and the image taken by the right camera of the stereo camera is taken as the right image.
  • The arrangement of the stereo camera is not limited to left and right, but in the following examples, the left image used as the reference is the first image 51 and the other, right image is the second image 52. As shown in FIG. 2, a corresponding point is searched for on the epipolar line 53 of the second image 52 with respect to the first image 51.
  • FIG. 3 is a diagram showing an example of a case where a patch image is created and a search is performed on an epipolar line. An N ⁇ N patch centered on the pixels of the first image is created, and this is used as a reference patch image (first patch image 54A). The central pixel of the reference first patch image is represented by (u, v).
  • Then, while shifting an N × N patch (second patch image 54B) along the epipolar line of the second image, each candidate patch is compared with the reference patch image, and the position where the two patch images overlap is found.
  • The deviation amount x from the reference position at that point is obtained as the parallax amount.
  • the overlapping position is the position where the same part of the object is projected. Therefore, the pixel of the second patch image corresponding to the central pixel (u, v) of the reference first patch image is obtained as (u, v + x).
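  • As a concrete illustration of this search, the following sketch implements the basic patch comparison along an epipolar line using a sum-of-squared-differences (SSD) cost. The function name, the SSD cost, and the bounds handling are illustrative assumptions; the present disclosure replaces such a hand-crafted cost with a learned similarity, as described later. Following the notation above, the patch center is (u, v) and the shift x is applied along the v (epipolar) direction.

```python
import numpy as np

def estimate_parallax_ssd(first_image, second_image, u, v, patch=11, d_min=0, d_max=64):
    """Search along the epipolar line of the second image for the patch that best
    matches the first-image patch centered at (u, v); the deviation x with the
    lowest SSD cost is returned as the parallax amount, so that the corresponding
    pixel is (u, v + x) as in the description above."""
    r = patch // 2
    ref = first_image[u - r:u + r + 1, v - r:v + r + 1].astype(np.float32)
    best_x, best_cost = d_min, np.inf
    for x in range(d_min, d_max + 1):
        cand = second_image[u - r:u + r + 1, v + x - r:v + x + r + 1].astype(np.float32)
        if cand.shape != ref.shape:          # candidate window would leave the image
            continue
        cost = np.sum((ref - cand) ** 2)     # SSD matching cost
        if cost < best_cost:
            best_cost, best_x = cost, x
    return best_x
```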
  • the visible image is 3-channel data obtained by converting visible light into RGB data
  • the infrared image is 1-channel data having temperature information in pixel units.
  • an N ⁇ N patch image is cut out centering on a certain pixel of one image.
  • a plurality of N ⁇ N patch images are cut out from the other image centering on a plurality of pixels, and the coordinates having the highest similarity of the pixel groups in the patch image are calculated to obtain the parallax amount.
  • FIG. 4 is a diagram showing an example of the relationship between the value of the correct parallax label and the amount of deviation. In the graph of FIG. 4, the vertical axis represents the value of the correct label, and the horizontal axis represents the deviation amount x at coordinates on the epipolar line 53. The correct label is represented by a two-dimensional graph of these values.
  • the deviation amount x corresponds to the horizontal position between the first image for learning and the second image for learning.
  • The correct label is represented by a graph showing the relationship between the correct or incorrect value and the position of the deviation amount x. Assuming that the amount of deviation on the epipolar line is x and the amount of parallax is d, the correct label f(x) is given by the following equation (1):

    f(x) = 1 (x = d), f(x) = 0 (otherwise)   ... (1)
  • If the correct label is denoted f(x), it takes the values f(x) ∈ {0, 1}. The deviation amount x at the position where the images overlap is the correct deviation amount, which equals the correct parallax amount d. Therefore, the value of the correct label is f(x) = 1 when x = d, and f(x) = 0 at positions of deviation x where the images do not overlap. As described above, the correct label f(x) is represented as a graph showing the relationship between the deviation amount x and the value of the correct label.
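  • For reference, the binary correct label of equation (1) can be written out over the search range as in the following sketch; the array representation and the range bounds d_min and d_max are assumptions made for illustration.

```python
import numpy as np

def binary_correct_label(d, d_min=0, d_max=64):
    """Equation (1): f(x) = 1 where the deviation x equals the correct parallax d,
    and f(x) = 0 elsewhere, evaluated over the search range d_min..d_max."""
    xs = np.arange(d_min, d_max + 1)
    return xs, (xs == d).astype(np.float32)
```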
  • the filtered label obtained by filtering the correct label in the above graph is used as the input for deep learning to deal with noisy data.
  • Applying a filter to the correct label means applying a filter to the values of the correct data plotted on the graph.
  • By applying the filter, the correct-answer value that was expressed as a binary value comes to be expressed as a distribution.
  • Here, an example is shown in which a LoG (Laplacian of Gaussian) filter is applied, that is, a Gaussian filter is applied for smoothing and then a Laplacian filter is applied.
  • The LoG filter is a filter that extracts edges by applying a Laplacian filter after removing noise.
  • FIG. 5 is a diagram showing an example of the relationship between the value of the correct parallax label and the deviation amount x when the LoG filter is applied. As shown in FIG. 5, in the filtered correct label, a distribution is formed with the position of the correct answer at its peak.
  • the filter is not limited to the LoG filter, and a filter capable of converting correct or incorrect values into a distribution, such as a Gaussian filter, may be applied.
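  • A minimal sketch of producing the filtered label of FIG. 5 from the binary label is shown below, using SciPy's Gaussian-Laplace operator. The kernel width, the sign flip that keeps the correct position at the peak, and the normalization are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def filtered_label(f, sigma=2.0, use_log=True):
    """Convert the binary correct label f(x) into a distribution-shaped label.
    With use_log=True a LoG filter (Gaussian smoothing followed by a Laplacian)
    is applied and the result is negated so the correct position remains the peak;
    with use_log=False a plain Gaussian filter is used instead."""
    f = f.astype(np.float32)
    out = -gaussian_laplace(f, sigma=sigma) if use_log else gaussian_filter(f, sigma=sigma)
    return out / (np.abs(out).max() + 1e-12)   # normalize the peak magnitude to 1
```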
  • the parallax amount can be obtained by applying an inverse filter to the output of the similarity and obtaining the peak.
  • FIG. 7 is a block diagram showing the hardware configuration of the parallax learning device 100 and the fusion data generation device 200. Since the parallax learning device 100 and the fusion data generation device 200 have the same hardware configuration, the parallax learning device 100 will be described below.
  • The parallax learning device 100 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a storage 14, an input unit 15, a display unit 16, and a communication interface (I/F) 17.
  • the configurations are connected to each other via a bus 19 so as to be communicable with each other.
  • the CPU 11 is a central arithmetic processing unit that executes various programs and controls each part. That is, the CPU 11 reads the program from the ROM 12 or the storage 14, and executes the program using the RAM 13 as a work area. The CPU 11 controls each of the above configurations and performs various arithmetic processes according to the program stored in the ROM 12 or the storage 14. In the present embodiment, the parallax learning program is stored in the ROM 12 or the storage 14.
  • the ROM 12 stores various programs and various data.
  • the RAM 13 temporarily stores a program or data as a work area.
  • the storage 14 is composed of a storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores various programs including an operating system and various data.
  • the input unit 15 includes a pointing device such as a mouse and a keyboard, and is used for performing various inputs.
  • the display unit 16 is, for example, a liquid crystal display and displays various information.
  • the display unit 16 may adopt a touch panel method and function as an input unit 15.
  • the communication interface 17 is an interface for communicating with other devices such as terminals.
  • a wired communication standard such as Ethernet (registered trademark) or FDDI, or a wireless communication standard such as 4G, 5G, or Wi-Fi (registered trademark) is used.
  • the fusion data generation device 200 also has a CPU 21, a ROM 22, a RAM 23, a storage 24, an input unit 25, a display unit 26, and a communication I / F 27.
  • the configurations are connected to each other via a bus 29 so as to be communicable with each other.
  • the fusion data generation program is stored in the ROM 22 or the storage 24. Since the description of each part of the hardware configuration is the same as that of the parallax learning device 100, the description thereof will be omitted.
  • FIG. 8 is a block diagram showing the configuration of the parallax learning device 100 of the present embodiment.
  • Each functional configuration is realized by the CPU 11 reading the parallax learning program stored in the ROM 12 or the storage 14, expanding it into the RAM 13, and executing the program.
  • the parallax learning device 100 has learning processing units for processing the first image and the second image corresponding to the two cameras.
  • The learning processing units include a first image input unit 101, a first image preprocessing unit 102, a first patch image generation unit 103, a second image input unit 104, a second image preprocessing unit 105, and a second patch image generation unit 106.
  • the parallax learning device 100 includes a label input unit 107, a filter unit 108, a similarity learning unit 109, and a model storage unit 110.
  • The first image input unit 101 and the second image input unit 104 have the stereo camera configuration shown in FIG. 1; the first image input unit 101 corresponds to the left camera and the second image input unit 104 corresponds to the right camera. Either the visible image or the infrared image can be assigned to the left and right inputs.
  • each processing unit for learning may be used as an external device, and the parallax learning device 100 may receive the output.
  • the first image input unit 101 inputs the image taken by the left camera as digital data, and outputs the first image for learning to the first image preprocessing unit 102.
  • The first image preprocessing unit 102 receives the first image for learning from the first image input unit 101, performs preprocessing such as contour extraction, parallelization, and distortion correction on the first image, and outputs the preprocessed first image to the first patch image generation unit 103.
  • Coordinates (u, v) indicating the correct value of the correct label are input from the label input unit 107 to the first patch image generation unit 103 and the second patch image generation unit 106.
  • The first patch image generation unit 103 cuts out N × N pixels centered on the coordinates (u, v) of the input correct label from the preprocessed first image input from the first image preprocessing unit 102, and thereby generates the first patch image.
  • The first patch image generation unit 103 outputs the generated first patch image to the similarity learning unit 109. In this way, the first patch image is generated from the first image for learning with the coordinate information of the correct label as a reference.
  • the second image input unit 104 inputs the image taken by the right camera as digital data, and outputs the second image for learning to the second image preprocessing unit 105.
  • The second image preprocessing unit 105 receives the second image for learning from the second image input unit 104, performs preprocessing such as contour extraction, parallelization, and distortion correction, and outputs the preprocessed second image to the second patch image generation unit 106.
  • For the preprocessed second image input from the second image preprocessing unit 105, the second patch image generation unit 106 generates a plurality of second patch images by cutting out N × N pixels over the range d min to d max, with the coordinates (u, v) of the input correct label as a reference.
  • The range over which the second patch images are generated from the reference may be d min to d max, where the range of values that the parallax amount d can take may be determined in advance from information such as the stereo camera configuration.
  • a plurality of second patch images are generated by shifting the center of the patch one pixel at a time in the horizontal direction along the epipolar line and cutting out N ⁇ N pixels.
  • the second patch image generation unit 106 outputs the generated second patch image to the similarity learning unit 109. In this way, a plurality of second patch images are generated by shifting horizontally from the second image for learning with reference to the coordinate information of the correct answer label.
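  • For concreteness, a small sketch of this patch generation is given below; the coordinate convention (the shift x applied along the v direction, matching (u, v + x) above) follows the description, while the bounds handling and parameter defaults are illustrative assumptions.

```python
import numpy as np

def generate_second_patches(second_image, u, v, patch=11, d_min=0, d_max=64):
    """Cut out N x N patches from the second image centered on (u, v + x)
    for every shift x in d_min..d_max along the epipolar line."""
    r = patch // 2
    patches, shifts = [], []
    for x in range(d_min, d_max + 1):
        win = second_image[u - r:u + r + 1, v + x - r:v + x + r + 1]
        if win.shape[:2] == (patch, patch):   # skip shifts whose window leaves the image
            patches.append(win)
            shifts.append(x)
    return np.stack(patches), np.asarray(shifts)
```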
  • the label input unit 107 is provided with the coordinate information of the deviation amount x and the parallax amount d for each position as inputs.
  • the correct label for parallax is input as f (x) shown in Eq. (1).
  • the label input unit 107 outputs the coordinates (u, v) of the correct answer value of the correct answer label to the first patch image generation unit 103 and the second patch image generation unit 106. Further, the label input unit 107 outputs the correct parallax label to the filter unit 108.
  • the filter unit 108 applies a filter to the correct answer label of the parallax represented by the equation (1) input from the label input unit 107, and outputs the filtered correct answer label to which the filter is applied to the similarity learning unit 109.
  • That is, f_filter(x) represented by equation (2) is output as the filtered correct label.
  • the similarity learning unit 109 receives the first patch image from the first patch image generation unit 103, a plurality of second patch images from the second patch image generation unit 106, and the filtered correct label from the filter unit 108.
  • the similarity learning unit 109 learns the parameters of the model based on the first patch image, the plurality of second patch images, and the filtered correct answer label.
  • the similarity learning unit 109 stores the parameters of the learned model in the model storage unit 110.
  • the model is a model for outputting the similarity corresponding to the patch image pair in estimation.
  • a generally known method such as MC-CNN, which is one aspect of the deep learning method, can be used as a method for learning a model for estimating the similarity from two input images.
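  • As one possible realization of such a model, the sketch below defines a small MC-CNN-style siamese network in PyTorch that scores a (first patch, second patch) pair and is trained to regress the filtered-label value for each shift. The architecture, the channel counts (a 3-channel visible patch and a 1-channel infrared patch), and the MSE loss are assumptions made for illustration, not the specific network of the disclosure.

```python
import torch
import torch.nn as nn

class PatchSimilarityNet(nn.Module):
    """Scores the similarity of an N x N patch pair (e.g. visible vs. infrared)."""
    def __init__(self, ch1=3, ch2=1, feat=64):
        super().__init__()
        def branch(c_in):
            return nn.Sequential(
                nn.Conv2d(c_in, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, feat, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.branch1, self.branch2 = branch(ch1), branch(ch2)
        self.head = nn.Sequential(nn.Linear(2 * feat, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, p1, p2):
        feats = torch.cat([self.branch1(p1), self.branch2(p2)], dim=1)
        return self.head(feats).squeeze(1)       # one similarity score per pair

def train_step(model, optimizer, first_patch, second_patches, filtered_label):
    """first_patch: (1, C1, N, N) reference patch; second_patches: (K, C2, N, N)
    patches for the K shifts; filtered_label: (K,) target values taken from the
    filtered correct label. One gradient step of MSE regression."""
    optimizer.zero_grad()
    scores = model(first_patch.expand(second_patches.size(0), -1, -1, -1), second_patches)
    loss = nn.functional.mse_loss(scores, filtered_label)
    loss.backward()
    optimizer.step()
    return loss.item()
```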
  • the model storage unit 110 stores the parameters of the model learned by the similarity learning unit 109.
  • FIG. 9 is a flowchart showing the flow of the parallax learning process by the parallax learning device 100.
  • the parallax learning process is performed by the CPU 11 reading the parallax learning program from the ROM 12 or the storage 14, expanding it into the RAM 13, and executing the program.
  • the CPU 11 executes the following processing as each part of the parallax learning device 100.
  • In step S100, the CPU 11 receives the input of the first image for learning from the first image input unit 101, preprocesses the first image, and outputs the preprocessed first image to the first patch image generation unit 103.
  • In step S102, the CPU 11 receives the input of the second image for learning from the second image input unit 104, preprocesses the second image, and outputs the preprocessed second image to the second patch image generation unit 106.
  • step S104 the CPU 11 receives the input of the correct parallax label from the label input unit 107.
  • the coordinates (u, v) of the correct answer value of the correct answer label are output to the first patch image generation unit 103 and the second patch image generation unit 106, and the correct answer label itself is output to the filter unit 108.
  • step S106 the CPU 11 generates a first patch image for learning and a plurality of second patch images for learning. The details of the patch image generation process for learning in this step will be described later.
  • step S108 the CPU 11 applies a filter to the correct answer label of the parallax, and outputs the filtered correct answer label to which the filter is applied to the similarity learning unit 109.
  • step S110 the CPU 11 learns the model parameters based on the first patch image for learning, the second patch image for learning, and the filtered correct answer label.
  • step S112 the CPU 11 stores the learned model parameters in the model storage unit 110.
  • In step S130, the CPU 11 cuts out N × N pixels from the input preprocessed first image, centered on the coordinates (u, v) of the input correct label, to generate the first patch image for learning.
  • step S134 the CPU 11 generates a second patch image for learning centered on the coordinates (u, v + x).
  • In step S136, the CPU 11 determines whether or not x ≤ d max. If the condition is satisfied, the process proceeds to step S138, and if the condition is not satisfied, the process ends.
  • According to the parallax learning device 100 of the present embodiment, it is possible to learn the parameters of a model that enables the parallax amount to be accurately estimated and utilized even between images acquired by different sensors.
  • FIG. 11 is a block diagram showing the configuration of the fusion data generation device 200 of the present embodiment.
  • Each functional configuration is realized by the CPU 21 reading the fusion data generation program stored in the ROM 22 or the storage 24, deploying it in the RAM 23, and executing it.
  • the fusion data generation device 200 has each processing unit for estimation for processing the first image and the second image corresponding to the two cameras.
  • The estimation processing units include a first image input unit 201, a first image preprocessing unit 202, a first feature point extraction unit 203, a first patch image generation unit 204, a second image input unit 205, a second image preprocessing unit 206, and a second patch image generation unit 207.
  • The fusion data generation device 200 further includes a similarity calculation unit 208, a similarity totaling unit 209, an inverse filter unit 210, a parallax calculation unit 211, a parallax interpolation unit 212, a fusion data generation unit 213, and a model storage unit 230.
  • The first image input unit 201 and the second image input unit 205 have the stereo camera configuration shown in FIG. 1; the first image input unit 201 corresponds to the left camera and the second image input unit 205 corresponds to the right camera. Either the visible image or the infrared image can be assigned to the left and right inputs. Further, each estimation processing unit may be provided as an external device, and the fusion data generation device 200 may receive its output.
  • the first image input unit 201 inputs the image taken by the left camera as digital data, and outputs the first image to be estimated to the first image preprocessing unit 202.
  • The first image preprocessing unit 202 receives the first image to be estimated from the first image input unit 201, performs preprocessing such as contour extraction, parallelization, and distortion correction on the first image, and outputs the preprocessed first image to the first feature point extraction unit 203.
  • The first feature point extraction unit 203 extracts feature points from the input preprocessed first image and outputs the preprocessed first image and the coordinates of each extracted feature point to the first patch image generation unit 204. Further, the first feature point extraction unit 203 outputs the coordinates of each extracted feature point to the second patch image generation unit 207.
  • The first patch image generation unit 204 receives the preprocessed first image and the feature point coordinates from the first feature point extraction unit 203 and, for each input feature point, cuts out N × N pixels centered on the feature point coordinates and outputs the resulting first patch image to the similarity calculation unit 208.
  • In this way, a first patch image is generated from the first image with each feature point as a reference.
  • the second image input unit 205 inputs the image taken by the right camera as digital data, and outputs the second image to be estimated to the second image preprocessing unit 206.
  • The second image preprocessing unit 206 receives the second image to be estimated from the second image input unit 205, performs preprocessing such as contour extraction, parallelization, and distortion correction, and outputs the preprocessed second image to the second patch image generation unit 207.
  • For the preprocessed second image input from the second image preprocessing unit 206, the second patch image generation unit 207 generates, for each input feature point, a plurality of second patch images by cutting out N × N pixels over the range d min to d max, with the coordinates of the feature point as a reference.
  • the method of generating a plurality of images is the same as that of the parallax learning device 100, and a plurality of second patch images on the epipolar line are generated.
  • In this way, a plurality of second patch images are generated from the second image with each feature point as a reference.
  • the model storage unit 230 stores the parameters of the model for outputting the similarity corresponding to the patch image pair learned by the parallax learning device 100.
  • the similarity calculation unit 208 receives each of the feature points and the first patch image from the first patch image generation unit 204, and receives a plurality of second patch images from the second patch image generation unit 207.
  • the similarity calculation unit 208 processes a pair of the first patch image and each of the plurality of second patch images as a combination.
  • the similarity calculation unit 208 inputs each of the combinations into the model of the model storage unit 230 for each feature point, and outputs the similarity indicating the similarity of each combination as the output of the model using the parameters.
  • That is, with the first patch image as a reference, the matching cost for each combination with the plurality of second patch images is evaluated by the trained model and calculated as the degree of similarity.
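  • A small sketch of this scoring loop is shown below, reusing a trained patch-similarity model; the tensor shapes and the model interface are assumptions consistent with the training sketch given earlier.

```python
import torch

@torch.no_grad()
def compute_similarities(model, first_patch, second_patches):
    """first_patch: (1, C1, N, N) patch at a feature point; second_patches: (K, C2, N, N)
    patches cut out along the epipolar line. Returns a length-K vector with one
    matching cost (similarity) per candidate shift."""
    model.eval()
    scores = model(first_patch.expand(second_patches.size(0), -1, -1, -1), second_patches)
    return scores.cpu().numpy()
```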
  • FIG. 6 is a diagram showing an example of the similarity before the inverse filter and the similarity after the inverse filter. The degree of similarity for each combination corresponds to a point in FIG. 6.
  • the similarity which is the matching cost of each of the calculated combinations, is output to the similarity totaling unit 209.
  • the similarity totaling unit 209 totals the similarity of each of the output combinations for each feature point, and outputs the totaled result to the inverse filter unit 210. For each feature point, the aggregated result as shown in FIG. 6 is obtained.
  • The inverse filter unit 210 applies the inverse filter to the aggregated similarity result of the combinations for each feature point, and outputs the resulting post-inverse-filter estimation result to the parallax calculation unit 211.
  • the aggregated result is converted into the frequency domain by the Fourier transform, the inverse filter is applied in the frequency domain, and the inverse transform is performed to return to the spatial domain.
  • The parallax calculation unit 211 calculates parallax information indicating the amount of parallax for each feature point from the post-inverse-filter estimation result for each feature point, and outputs the parallax information to the parallax interpolation unit 212. From the post-inverse-filter estimation result for each feature point input from the inverse filter unit 210, the peak point is searched for, the deviation on the epipolar line at the peak is obtained as the parallax amount d, and the parallax amount d for each feature point is taken as the parallax information. By the processing up to this point, the parallax amount d for each feature point in the image can be calculated.
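  • A rough sketch of this inverse filtering and peak search is given below. The regularized (Wiener-style) spectral division and the use of the same negated LoG kernel as in the label filtering are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def inverse_filter_peak(similarity, sigma=2.0, eps=1e-3, d_min=0):
    """similarity: 1-D aggregated similarity over the shifts x = d_min..d_max.
    Its spectrum is divided by the spectrum of the label-filtering kernel in the
    frequency domain, transformed back to the spatial domain, and the position of
    the peak is returned as the parallax amount d."""
    n = similarity.shape[0]
    impulse = np.zeros(n, dtype=np.float32)
    impulse[n // 2] = 1.0
    kernel = -gaussian_laplace(impulse, sigma=sigma)   # same filter shape as used on the labels
    S = np.fft.fft(similarity)
    K = np.fft.fft(np.fft.ifftshift(kernel))
    restored = np.real(np.fft.ifft(S * np.conj(K) / (np.abs(K) ** 2 + eps)))  # regularized inverse
    return d_min + int(np.argmax(restored))
```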
  • the parallax interpolation unit 212 calculates the parallax interpolation information obtained by interpolating the parallax amount for each pixel from the parallax information for each feature point, and outputs it to the fusion data generation unit 213.
  • the parallax interpolation unit 212 obtains the parallax amount of the entire image based on the parallax amount d of the feature points.
  • the parallax between the feature points can be interpolated by approximating the parallax amount for each feature point by the least squares method or the like.
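  • One simple way to realize this interpolation is a least-squares fit of a low-order surface to the feature-point parallax values; the planar model below is an illustrative assumption, and other approximations could be substituted.

```python
import numpy as np

def interpolate_parallax(points, disparities, height, width):
    """points: (M, 2) feature-point coordinates (u, v); disparities: (M,) parallax d
    at each feature point. Fits d ~ a*u + b*v + c by least squares and returns a
    dense (height, width) parallax map covering every pixel."""
    u = points[:, 0].astype(np.float32)
    v = points[:, 1].astype(np.float32)
    A = np.stack([u, v, np.ones_like(u)], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, disparities.astype(np.float32), rcond=None)
    uu, vv = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    return coeffs[0] * uu + coeffs[1] * vv + coeffs[2]
```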
  • the fusion data generation unit 213 generates and outputs fusion data of the first image to be estimated and the second image to be estimated based on the parallax interpolation information.
  • the fusion data is generated by aligning the first image and the second image based on the parallax interpolation information.
  • In this example, the fusion data generation unit 213 generates 4-channel fusion data in which the 3 RGB channels of the visible image and the 1 temperature-data channel of the infrared image are superimposed.
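  • A minimal sketch of assembling the 4-channel fusion data from the interpolated parallax is shown below; the nearest-neighbor resampling of the infrared channel and the choice of which image is warped are illustrative assumptions.

```python
import numpy as np

def generate_fusion_data(visible_rgb, infrared, parallax_map):
    """visible_rgb: (H, W, 3) visible image; infrared: (H, W) per-pixel temperature data;
    parallax_map: (H, W) parallax from the interpolation step. Pixel (u, v) of the
    visible image is paired with pixel (u, v + x) of the infrared image and the
    result is stacked into 4-channel fusion data."""
    h, w, _ = visible_rgb.shape
    fused = np.zeros((h, w, 4), dtype=np.float32)
    fused[..., :3] = visible_rgb
    for u in range(h):
        for v in range(w):
            src = v + int(round(parallax_map[u, v]))
            if 0 <= src < w:
                fused[u, v, 3] = infrared[u, src]
    return fused
```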
  • FIG. 12 is a flowchart showing the flow of the fusion data generation process by the fusion data generation device 200.
  • the fusion data generation process is performed by the CPU 21 reading the fusion data generation program from the ROM 22 or the storage 24, expanding it into the RAM 23, and executing the fusion data generation program.
  • the CPU 21 executes the following processing as each part of the fusion data generation device 200.
  • In step S200, the CPU 21 receives the input of the first image to be estimated from the first image input unit 201, preprocesses the first image, and outputs the preprocessed first image to the first feature point extraction unit 203.
  • In step S202, the CPU 21 receives the input of the second image to be estimated from the second image input unit 205, preprocesses the second image, and outputs the preprocessed second image to the second patch image generation unit 207.
  • In step S204, the CPU 21 extracts feature points from the input preprocessed first image and outputs the preprocessed first image and the coordinates of each extracted feature point to the first patch image generation unit 204.
  • the total number of feature points extracted in step S204 is N.
  • step S208 the CPU 21 executes processing up to the calculation of parallax information for the selected feature point i.
  • the details of the feature point calculation process in this step will be described later.
  • step S210 the CPU 21 determines whether or not i ⁇ N. If the condition is satisfied, the process proceeds to step S212, and if the condition is not satisfied, the process proceeds to step S214.
  • step S214 the CPU 21 calculates parallax interpolation information obtained by interpolating the parallax amount for each pixel from the parallax information for each feature point obtained in the process of step S210.
  • step S216 the CPU 21 generates and outputs fusion data of the first image to be estimated and the second image to be estimated based on the parallax interpolation information.
  • Next, the feature point calculation process in step S208 will be described with reference to the flowchart of FIG. 13. The following is the processing for the selected feature point i.
  • step S230 the CPU 21 cuts out N ⁇ N pixels with respect to the input preprocessed first image centered on the coordinates (u, v) of the selected feature point i, and the first patch to be estimated. Generate an image.
  • step S234 the CPU 21 generates a second patch image to be estimated centered on the coordinates (u, v + x).
  • In step S236, the CPU 21 inputs, into the model of the model storage unit 230, the pair combination of the first patch image to be estimated generated in step S230 and the second patch image to be estimated generated in step S234. Then, from the output of the model using the learned parameters, the matching cost indicating the similarity of the combination is calculated as the degree of similarity.
  • In step S238, the CPU 21 determines whether or not x ≤ d max. If the condition is satisfied, the process proceeds to step S240, and if the condition is not satisfied, the process proceeds to step S242.
  • step S242 the CPU 21 aggregates the similarity of each of the output combinations and outputs the aggregated result to the inverse filter unit 210.
  • step S244 the CPU 21 outputs to the parallax calculation unit 211 the estimation result after the inverse filter to which the inverse filter is applied to the aggregation result of each estimation of the output combination.
  • step S246 the CPU 21 calculates the parallax information indicating the parallax amount of the feature point i selected from the estimation result after the filter, and outputs the parallax information to the parallax interpolation unit 212. As a result, parallax information for each feature point i is obtained.
  • According to the fusion data generation device 200 of the present embodiment, it is possible to generate fusion data by accurately estimating and utilizing the parallax amount even between images acquired by different sensors.
  • The parallax learning process or the fusion data generation process, which in the above embodiment is executed by the CPU reading software (a program), may be executed by various processors other than the CPU.
  • Examples of such processors include a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed exclusively for executing specific processing, such as an ASIC (Application Specific Integrated Circuit).
  • The parallax learning process or the fusion data generation process may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA).
  • the hardware-like structure of these various processors is, more specifically, an electric circuit in which circuit elements such as semiconductor elements are combined.
  • the parallax learning program is stored (installed) in the storage 14 in advance, but the present invention is not limited to this.
  • The program may be provided in a form stored in a non-transitory storage medium such as a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), or a USB (Universal Serial Bus) memory. Further, the program may be downloaded from an external device via a network. The same applies to the fusion data generation program.
  • A parallax learning device configured to: output a filtered label obtained by applying a filter to a correct label indicating the relationship between the amount of horizontal positional deviation between a first image for learning and a second image for learning and the correct answer for each position; and learn the parameters of a model for outputting the similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with the coordinate information of the correct label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting in the horizontal direction with the coordinate information as a reference, and the filtered label.
  • A non-transitory storage medium storing a program executable by a computer to perform parallax learning processing, the processing comprising: outputting a filtered label obtained by applying a filter to a correct label indicating the relationship between the amount of horizontal positional deviation between a first image for learning and a second image for learning and the correct answer for each position; and learning the parameters of a model for outputting the similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with the coordinate information of the correct label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting in the horizontal direction with the coordinate information as a reference, and the filtered label.
  • 100 Parallax learning device, 101, 201 First image input unit, 102, 202 First image preprocessing unit, 103, 204 First patch image generation unit, 104, 205 Second image input unit, 105, 206 Second image preprocessing unit, 106, 207 Second patch image generation unit, 107 Label input unit, 108 Filter unit, 109 Similarity learning unit, 110, 230 Model storage unit, 200 Fusion data generation device, 203 First feature point extraction unit, 208 Similarity calculation unit, 209 Similarity totaling unit, 210 Inverse filter unit, 211 Parallax calculation unit, 212 Parallax interpolation unit, 213 Fusion data generation unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The present invention makes it possible to accurately estimate and utilize a parallax amount, even between images acquired by different sensors. This parallax learning device outputs a filtered label obtained by applying a filter to a correct-answer label that indicates a relationship between the amount of misalignment of a horizontal-direction position between a first image for learning and a second image for learning and a correct answer to the position. The parallax learning device furthermore learns parameters for a model for outputting a degree of similarity that corresponds to a patch image pair, the learning being conducted on the basis of: a first patch image for learning that is generated from the first image for learning, with the coordinate information of the correct-answer label used as a reference; a plurality of second patch images for learning that are generated from the second image for learning by being shifted in the horizontal direction, with said coordinate information used as a reference; and the filtered label.

Description

Parallax learning device, fusion data generation device, parallax learning method, fusion data generation method, and parallax learning program
 The disclosed technique relates to a parallax learning device, a fusion data generation device, a parallax learning method, a fusion data generation method, and a parallax learning program.
 Conventionally, there is a stereo matching technique that estimates depth after obtaining the parallax amount between two images acquired using a stereo camera as a sensor (see Non-Patent Document 1 and Non-Patent Document 2). Parallax generally refers to the difference in how an image appears that is caused by the difference between the viewpoints of two sensors. The parallax estimation method refers to a method of estimating the amount of deviation between images as the parallax amount. Among stereo-matching-based parallax estimation methods, methods using deep learning have also been proposed in recent years.
 In the conventional technique, the purpose is generally to estimate the distance to an object by depth estimation after obtaining the parallax amount. However, the prior art has been premised on estimating the amount of parallax between images acquired by two sensors that acquire channels of the same type. Since images acquired by sensors that acquire different types of channels have different characteristics, there is a problem that it is difficult to estimate the amount of parallax between them.
 The disclosed technique was made in view of the above points, and its object is to provide a parallax learning device, a fusion data generation device, a parallax learning method, a fusion data generation method, and a parallax learning program that enable the parallax amount to be accurately estimated and utilized even between images acquired by different sensors.
 The first aspect of the present disclosure is a parallax learning device including: a filter unit that outputs a filtered label obtained by applying a filter to a correct label indicating the relationship between the amount of horizontal positional deviation between a first image for learning and a second image for learning and the correct answer for each position; and a similarity learning unit that learns the parameters of a model for outputting the similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with the coordinate information of the correct label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting in the horizontal direction with the coordinate information as a reference, and the filtered label.
 The second aspect of the present disclosure is a parallax learning method in which a computer executes processing to: output a filtered label obtained by applying a filter to a correct label indicating the relationship between the amount of horizontal positional deviation between a first image for learning and a second image for learning and the correct answer for each position; and learn the parameters of a model for outputting the similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with the coordinate information of the correct label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting in the horizontal direction with the coordinate information as a reference, and the filtered label.
 The third aspect of the present disclosure is a parallax learning program that causes a computer to: output a filtered label obtained by applying a filter to a correct label indicating the relationship between the amount of horizontal positional deviation between a first image for learning and a second image for learning and the correct answer for each position; and learn the parameters of a model for outputting the similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with the coordinate information of the correct label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting in the horizontal direction with the coordinate information as a reference, and the filtered label.
 According to the disclosed technique, it is possible to accurately estimate and utilize the parallax amount even between images acquired by different sensors.
FIG. 1 is a schematic diagram showing that fusion data is generated from a visible image and an infrared image in which parallax occurs.
FIG. 2 is a diagram showing an example of searching for corresponding points in parallel stereo.
FIG. 3 is a diagram showing an example of creating a patch image and searching on an epipolar line.
FIG. 4 is a diagram showing an example of the relationship between the value of the correct parallax label and the amount of deviation.
FIG. 5 is a diagram showing an example of the relationship between the value of the correct parallax label and the deviation amount x when the LoG filter is applied.
FIG. 6 is a diagram showing an example of the similarity before the inverse filter and the similarity after the inverse filter.
FIG. 7 is a block diagram showing the hardware configuration of the parallax learning device and the fusion data generation device.
FIG. 8 is a block diagram showing the configuration of the parallax learning device of the present embodiment.
FIG. 9 is a flowchart showing the flow of the parallax learning process by the parallax learning device.
FIG. 10 is a flowchart showing the flow of the patch image generation process for learning.
FIG. 11 is a block diagram showing the configuration of the fusion data generation device of the present embodiment.
FIG. 12 is a flowchart showing the flow of the fusion data generation process by the fusion data generation device.
FIG. 13 is a flowchart showing the flow of the feature point calculation process.
 Hereinafter, an example of an embodiment of the disclosed technique will be described with reference to the drawings. In each drawing, the same or equivalent components and parts are given the same reference numerals. The dimensional ratios in the drawings are exaggerated for convenience of explanation and may differ from the actual ratios.
 First, the outline of the present disclosure will be described. FIG. 1 is a schematic diagram showing that fusion data is generated from a visible image and an infrared image in which parallax occurs. As shown in FIG. 1, the parallax estimation method can also be used to link the corresponding points of two images and generate fusion data. Fusion data refers to data obtained by fusing the channels of images acquired by two sensors. The task assumed in the present disclosure is to generate fusion data in which corresponding points are matched from two images: a visible image (RGB) taken by a visible light camera and an infrared image taken by an infrared camera. The infrared image is an image in which temperature data is recorded for each pixel. If fusion data is used, it is possible to learn with richer information in the field of object recognition by machine learning or the field of semantic segmentation, and an improvement in performance can be expected. As an example of a specific use case, a scene in which inspection is automated by deep learning is assumed in equipment-inspection applications where infrared images are used. By utilizing this fusion data, learning can draw on rich information, so a performance improvement can be expected. However, as shown in FIG. 1, since parallax occurs between the visible image and the infrared image, how to estimate the parallax becomes a problem.
 Visible images and infrared images have different properties. Visible images are used for object recognition because they have abundant appearance information such as texture. The infrared image, on the other hand, has a lower resolution than the visible image but can acquire temperature information of the object, and is used in the security field or for equipment inspection and the like. Further, although devices capable of simultaneously acquiring a visible image and an infrared image are commercially available, the positions of the visible light camera and the infrared camera differ, so parallax occurs between the two images. Because of the parallax, the position where the object is projected onto the image sensor shifts, so a shift occurs in the image data. In the following, the magnitude of this parallax shift in the image data is taken as the parallax amount to be estimated.
 When performing object recognition or semantic segmentation using the information of the visible image and the infrared image, it is necessary to correct the shift due to parallax so that the corresponding points of the visible image and the infrared image overlap. As a conventional technique, there is a stereo matching technique in which depth is estimated by obtaining the parallax amount of two images from a stereo camera. In a stereo camera in a parallel stereo arrangement, it is known that for a pixel in one image, the corresponding pixel in the other image lies on a horizontal epipolar line. FIG. 2 is a diagram showing an example of searching for corresponding points in parallel stereo. The image taken by the left camera of the stereo camera is the left image, and the image taken by the right camera is the right image. The arrangement of the stereo camera is not limited to left and right, but in the following examples, the left image used as the reference is the first image 51 and the other, right image is the second image 52. As shown in FIG. 2, a corresponding point is searched for on the epipolar line 53 of the second image 52 with respect to the first image 51.
 The principle of the parallax estimation method will now be described. To obtain, with a basic stereo matching method using the above characteristics, the parallax amount of the pixel in the other image that corresponds to a certain pixel in the reference image, a search is performed using the patch image 54 (hereinafter, the reference numeral is omitted except where necessary for convenience of explanation). FIG. 3 is a diagram showing an example of creating a patch image and searching on an epipolar line. An N × N patch centered on a pixel of the first image is created, and this is used as the reference patch image (first patch image 54A). The central pixel of the reference first patch image is represented by (u, v). Then, while shifting an N × N patch (second patch image 54B) along the epipolar line of the second image, it is compared with the reference patch image, and the position where the two patch images overlap is found; the deviation amount x from the reference position is thereby obtained as the parallax amount. The overlapping position is the position where the same part of the object is projected. Therefore, the pixel of the second patch image corresponding to the central pixel (u, v) of the reference first patch image is obtained as (u, v + x).
 Here, the case where the conventional parallax estimation method described above is applied between different types of images, a visible image and an infrared image, is examined. The visible image is 3-channel data obtained by converting visible light into RGB data, and the infrared image is 1-channel data having temperature information for each pixel. In the parallax estimation method, an N × N patch image is cut out centered on a certain pixel of one image. Then, a plurality of N × N patch images are cut out from the other image centered on a plurality of pixels, and the coordinates at which the pixel groups in the patch images are most similar are found to obtain the parallax amount. However, the visible image and the infrared image record different kinds of information, and it is difficult to compute their similarity, so corresponding points cannot be obtained accurately. Therefore, it is difficult to obtain the parallax amount with the conventional parallax estimation method that presupposes the same kind of sensor. In addition, among parallax estimation methods, a method using deep learning may be able to learn flexibly, but since the infrared image has a lower resolution than the visible image, the corresponding points may shift, which can become noise at learning time or at estimation time.
 The present disclosure therefore proposes a data fusion method that finds per-pixel corresponding points between a visible image and an infrared image captured with different sensors, namely a visible-light camera and an infrared camera. Deep learning is used to compute the matching cost in the stereo matching method, but the way the correct label is given at training time differs from the conventional method. In the conventional deep-learning approach, the correct label is given as a binary correct/incorrect value: Positive (= 1) when the patches coincide and Negative (= 0) when they do not. FIG. 4 shows an example of the relationship between the value of the correct parallax label and the amount of shift. In the graph of FIG. 4, the vertical axis is the label value and the horizontal axis is the shift x along the epipolar line 53; the correct label is represented as a two-dimensional graph of these values. The shift x corresponds to the horizontal position between the first image for learning and the second image for learning, and the correct label is the graph relating the correct/incorrect value to that position. With x the shift along the epipolar line and d the parallax amount, the correct label f(x) is given by equation (1):

$$f(x) = \begin{cases} 1 & (x = d) \\ 0 & (x \neq d) \end{cases} \qquad \cdots (1)$$
 The correct label f(x) thus takes values in {0, 1}. The shift x at which the images coincide is the correct shift and is also the correct parallax amount d; the correct parallax d is given by the length of the correct shift x. Accordingly, the label value is f(x) = 1 when x = d, while f(x) = 0 at every shift x where the images do not coincide. The correct label f(x) is thus represented as a graph relating the shift x to the label value.
 However, as noted in the problem statement above, the position of the correct parallax may be displaced between the different sensors, the visible-light camera and the infrared camera, which can introduce noise during training. In the present method, a filtered label, obtained by applying a filter to the correct label of the above graph, is therefore used as the input to deep learning so as to cope with noisy data. Applying a filter to the correct label means applying the filter to the values of the correct-answer data plotted on the graph; as a result, the binary correct-answer values are expressed as a distribution. Here, an example is shown in which a LoG (Laplacian of Gaussian) filter is applied, i.e., a Gaussian filter is applied for smoothing and then a Laplacian filter is applied. The LoG filter extracts edges by applying a Laplacian filter after removing noise. FIG. 5 shows an example of the relationship between the value of the correct parallax label and the shift x when the LoG filter is applied. As shown in FIG. 5, the filtered correct label forms a distribution with its peak at the correct position. The filter is not limited to the LoG filter; any filter that can convert the correct/incorrect values into a distribution, such as a Gaussian filter, may be used.
 With g(x) a Gaussian filter and l(x) a Laplacian filter, the filtered correct label f_filt(x) is given by equation (2):

$$f_{\mathrm{filt}}(x) = \bigl(l * g * f\bigr)(x) \qquad \cdots (2)$$

where $*$ denotes convolution.
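 A minimal sketch of how such a filtered label could be produced, assuming SciPy is available: the binary label of equation (1) is smoothed with a Gaussian and then a discrete Laplacian is applied, which corresponds to equation (2) up to sign and normalization conventions; sigma and the search range are assumed values.

import numpy as np
from scipy.ndimage import gaussian_filter1d

def filtered_label(d, d_min=0, d_max=64, sigma=2.0):
    xs = np.arange(d_min, d_max + 1)
    f = (xs == d).astype(np.float32)         # eq. (1): 1 at the true disparity, 0 elsewhere
    g = gaussian_filter1d(f, sigma)           # Gaussian smoothing of the label
    log = np.convolve(g, [1.0, -2.0, 1.0], mode="same")  # discrete Laplacian of the smoothed label
    return xs, f, log                         # log plays the role of f_filt(x) in eq. (2)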
 By filtering the correct label in this way, noise at training time can be expected to be suppressed. Note, however, that the similarity output of this method is expected to look like the upper graph of FIG. 6: because training is performed with the filtered correct label, the similarity output follows the same kind of distribution. To obtain the parallax amount from this similarity output, an inverse filter is applied to the output and the peak is located; the peak position gives the parallax amount.
 With the present disclosure, fusion data in which a visible image and an infrared image are accurately fused into a single data set can be generated. Furthermore, by performing deep learning that exploits both the visible-image and infrared-image information in the fusion data, improved recognition accuracy can be expected in object recognition and semantic segmentation.
 The configuration of the present embodiment is described below, separately for the parallax learning device and the fusion data generation device.
 FIG. 7 is a block diagram showing the hardware configuration of the parallax learning device 100 and the fusion data generation device 200. Since the two devices have the same hardware configuration, the parallax learning device 100 is described below.
 As shown in FIG. 7, the parallax learning device 100 has a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a storage 14, an input unit 15, a display unit 16, and a communication interface (I/F) 17. These components are connected to one another via a bus 19 so that they can communicate with each other.
 The CPU 11 is a central processing unit that executes various programs and controls each unit. That is, the CPU 11 reads a program from the ROM 12 or the storage 14 and executes it using the RAM 13 as a working area. The CPU 11 controls the above components and performs various arithmetic operations according to the program stored in the ROM 12 or the storage 14. In the present embodiment, the parallax learning program is stored in the ROM 12 or the storage 14.
 The ROM 12 stores various programs and data. The RAM 13 temporarily stores programs or data as a working area. The storage 14 is a storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive) and stores various programs, including the operating system, and various data.
 The input unit 15 includes a pointing device such as a mouse and a keyboard, and is used for various inputs.
 The display unit 16 is, for example, a liquid crystal display and displays various information. The display unit 16 may employ a touch-panel scheme and also function as the input unit 15.
 The communication interface 17 is an interface for communicating with other equipment such as terminals. For this communication, a wired standard such as Ethernet (registered trademark) or FDDI, or a wireless standard such as 4G, 5G, or Wi-Fi (registered trademark), is used, for example. Similarly, the fusion data generation device 200 has a CPU 21, a ROM 22, a RAM 23, a storage 24, an input unit 25, a display unit 26, and a communication I/F 27, connected to one another via a bus 29 so that they can communicate with each other. The fusion data generation program is stored in the ROM 22 or the storage 24. The description of each hardware component is the same as for the parallax learning device 100 and is therefore omitted.
 Next, the functional configuration of the parallax learning device 100 is described. FIG. 8 is a block diagram showing the configuration of the parallax learning device 100 of the present embodiment. Each functional block is realized by the CPU 11 reading the parallax learning program stored in the ROM 12 or the storage 14, loading it into the RAM 13, and executing it.
 As shown in FIG. 8, the parallax learning device 100 has processing units for learning that process the first image and the second image corresponding to the two cameras: a first image input unit 101, a first image preprocessing unit 102, a first patch image generation unit 103, a second image input unit 104, a second image preprocessing unit 105, and a second patch image generation unit 106. The parallax learning device 100 further includes a label input unit 107, a filter unit 108, a similarity learning unit 109, and a model storage unit 110. The first image input unit 101 and the second image input unit 104 form the stereo camera configuration shown in FIG. 1, with the first image input unit 101 as the left camera and the second image input unit 104 as the right camera. Either the visible image or the infrared image can be assigned to the left or right input. Each learning processing unit may also be an external device whose output is received by the parallax learning device 100.
 The first image input unit 101 takes the image captured by the left camera as digital data and outputs the first image for learning to the first image preprocessing unit 102.
 The first image preprocessing unit 102 receives the first image for learning from the first image input unit 101, applies preprocessing such as contour extraction, rectification, and distortion correction, and outputs the preprocessed first image to the first patch image generation unit 103.
 The coordinates (u, v) indicating the correct value of the correct label are input from the label input unit 107 to the first patch image generation unit 103 and the second patch image generation unit 106.
 The first patch image generation unit 103 generates, from the preprocessed first image input from the first image preprocessing unit 102, a first patch image of N × N pixels cut out around the input correct-label coordinates (u, v), and outputs the generated first patch image to the similarity learning unit 109. The first patch image is thus generated from the first image for learning with the coordinate information of the correct label as the reference.
 The second image input unit 104 takes the image captured by the right camera as digital data and outputs the second image for learning to the second image preprocessing unit 105.
 The second image preprocessing unit 105 receives the second image for learning from the second image input unit 104, applies preprocessing such as contour extraction, rectification, and distortion correction, and outputs the preprocessed second image to the second patch image generation unit 106.
 The second patch image generation unit 106 generates, from the preprocessed second image input from the second image preprocessing unit 105, a plurality of second patch images of N × N pixels cut out over the range d_min to d_max with the input correct-label coordinates (u, v) as the reference. The range d_min to d_max over which second patch images are generated may be determined in advance, from information such as the stereo camera parameters, as the range the parallax amount d can take. The second patch images are generated by shifting the patch center one pixel at a time horizontally along the epipolar line and cutting out N × N pixels at each position. The second patch image generation unit 106 outputs the generated second patch images to the similarity learning unit 109. The second patch images are thus generated from the second image for learning by shifting horizontally with the coordinate information of the correct label as the reference.
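 The patch generation described here could look roughly as follows; crop() is a hypothetical helper, interior pixels are assumed, and n, d_min, and d_max are illustrative values, not values fixed by the disclosure.

import numpy as np

def crop(img, u, v, n):
    # n x n window centered at (row u, column v); interior pixels assumed
    h = n // 2
    return img[u - h:u + h + 1, v - h:v + h + 1]

def training_patches(img1, img2, u, v, n=9, d_min=0, d_max=64):
    # First patch at the correct-label coordinate of image 1, plus second patches
    # shifted one pixel at a time along the epipolar line (same row) of image 2.
    first = crop(img1, u, v, n)
    seconds = [crop(img2, u, v + x, n) for x in range(d_min, d_max + 1)]
    return first, seconds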
 The label input unit 107 is given as input the coordinate information of the shift x and the parallax amount d for each position. The correct parallax label is input as f(x) shown in equation (1). The label input unit 107 outputs the coordinates (u, v) of the correct value of the correct label to the first patch image generation unit 103 and the second patch image generation unit 106, and outputs the correct parallax label to the filter unit 108.
 The filter unit 108 applies a filter to the correct parallax label expressed by equation (1) and input from the label input unit 107, and outputs the resulting filtered correct label to the similarity learning unit 109. When the LoG filter is applied, f_filt(x) expressed by equation (2) is output as the filtered correct label.
 The similarity learning unit 109 receives the first patch image from the first patch image generation unit 103, the plurality of second patch images from the second patch image generation unit 106, and the filtered correct label from the filter unit 108. The similarity learning unit 109 learns the parameters of the model based on the first patch image, the plurality of second patch images, and the filtered correct label, and stores the learned parameters in the model storage unit 110. The model outputs the similarity corresponding to a patch image pair at estimation time. For learning the model parameters, a generally known technique for learning a model that estimates the similarity of two input images, such as MC-CNN, which is one form of deep learning, can be used.
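 The disclosure does not fix a network architecture; as one hedged sketch of a learnable similarity model in the spirit of MC-CNN, the following PyTorch fragment uses two small convolutional towers (separate towers, an assumption made here because the visible and infrared patches have different channel counts) and regresses the filtered label value with an MSE loss, which is also an assumed choice.

import torch
import torch.nn as nn

class PatchSimilarityNet(nn.Module):
    # Rough sketch: one tower per patch type, concatenated features, scalar similarity head.
    def __init__(self, in_ch1=3, in_ch2=1, feat=64):
        super().__init__()
        def tower(c):
            return nn.Sequential(
                nn.Conv2d(c, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, feat, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.t1, self.t2 = tower(in_ch1), tower(in_ch2)
        self.head = nn.Sequential(nn.Linear(2 * feat, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, p1, p2):
        return self.head(torch.cat([self.t1(p1), self.t2(p2)], dim=1)).squeeze(1)

# One illustrative training step against filtered-label values (dummy tensors).
model = PatchSimilarityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
p1, p2, y = torch.rand(8, 3, 9, 9), torch.rand(8, 1, 9, 9), torch.rand(8)
loss = nn.functional.mse_loss(model(p1, p2), y)
opt.zero_grad(); loss.backward(); opt.step()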
 The model storage unit 110 stores the parameters of the model learned by the similarity learning unit 109.
 Next, the operation of the parallax learning device 100 is described.
 FIG. 9 is a flowchart showing the flow of the parallax learning process performed by the parallax learning device 100. The parallax learning process is performed by the CPU 11 reading the parallax learning program from the ROM 12 or the storage 14, loading it into the RAM 13, and executing it. The CPU 11 executes the following processing as each unit of the parallax learning device 100.
 In step S100, the CPU 11 receives the input of the first image for learning from the first image input unit 101, preprocesses the first image, and outputs the preprocessed first image to the first patch image generation unit 103.
 In step S102, the CPU 11 receives the input of the second image for learning from the second image input unit 104, preprocesses the second image, and outputs the preprocessed second image to the second patch image generation unit 106.
 In step S104, the CPU 11 receives the input of the correct parallax label from the label input unit 107, outputs the coordinates (u, v) of the correct value of the correct label to the first patch image generation unit 103 and the second patch image generation unit 106, and outputs the correct label itself to the filter unit 108.
 In step S106, the CPU 11 generates a first patch image for learning and a plurality of second patch images for learning. The details of the patch image generation process for learning in this step are described later.
 In step S108, the CPU 11 applies the filter to the correct parallax label and outputs the filtered correct label to the similarity learning unit 109.
 In step S110, the CPU 11 learns the model parameters based on the first patch image for learning, the plurality of second patch images for learning, and the filtered correct label.
 In step S112, the CPU 11 stores the learned model parameters in the model storage unit 110.
 Next, the patch image generation process for learning in step S106 is described with reference to the flowchart of FIG. 10.
 In step S130, the CPU 11 generates, from the input preprocessed first image, a first patch image for learning by cutting out N × N pixels centered on the input correct-label coordinates (u, v).
 In step S132, the CPU 11 sets x = d_min for the position to be used as the patch center, with the correct-label coordinates (u, v) as the reference.
 In step S134, the CPU 11 generates a second patch image for learning centered on the coordinates (u, v + x).
 In step S136, the CPU 11 determines whether x ≤ d_max. If the condition is satisfied, the process proceeds to step S138; otherwise, the process ends.
 In step S138, the CPU 11 increments x as x = x + 1, returns to step S134, and repeats the generation of second patch images for learning. Through the above, the first patch image for learning and the plurality of second patch images are generated.
 As described above, the parallax learning device 100 of the present embodiment can learn the parameters of a model that makes it possible to estimate and use the parallax amount accurately even between images acquired by different sensors.
 Next, the functional configuration of the fusion data generation device 200 is described. FIG. 11 is a block diagram showing the configuration of the fusion data generation device 200 of the present embodiment. Each functional block is realized by the CPU 21 reading the fusion data generation program stored in the ROM 22 or the storage 24, loading it into the RAM 23, and executing it.
 As shown in FIG. 11, the fusion data generation device 200 has processing units for estimation that process the first image and the second image corresponding to the two cameras: a first image input unit 201, a first image preprocessing unit 202, a first feature point extraction unit 203, a first patch image generation unit 204, a second image input unit 205, a second image preprocessing unit 206, and a second patch image generation unit 207. The fusion data generation device 200 further includes a similarity calculation unit 208, a similarity aggregation unit 209, an inverse filter unit 210, a parallax calculation unit 211, a parallax interpolation unit 212, a fusion data generation unit 213, and a model storage unit 230. The first image input unit 201 and the second image input unit 205 form the stereo camera configuration shown in FIG. 1, with the first image input unit 201 as the left camera and the second image input unit 205 as the right camera. Either the visible image or the infrared image can be assigned to the left or right input. Each estimation processing unit may also be an external device whose output is received by the fusion data generation device 200.
 The first image input unit 201 takes the image captured by the left camera as digital data and outputs the first image to be estimated to the first image preprocessing unit 202.
 The first image preprocessing unit 202 receives the first image to be estimated from the first image input unit 201, applies preprocessing such as contour extraction, rectification, and distortion correction, and outputs the preprocessed first image to the first feature point extraction unit 203.
 The first feature point extraction unit 203 extracts feature points from the input preprocessed first image, outputs the preprocessed first image and the coordinates of the extracted feature points to the first patch image generation unit 204, and outputs the coordinates of the extracted feature points to the second patch image generation unit 207.
 The first patch image generation unit 204 receives the preprocessed first image and the coordinates of the feature points from the first feature point extraction unit 203, and for each input feature point outputs to the similarity calculation unit 208 a first patch image of N × N pixels cut out around the coordinates of that feature point. In this way, for each feature point of the first image, a first patch image of the first image is generated with that feature point as the reference.
 The second image input unit 205 takes the image captured by the right camera as digital data and outputs the second image to be estimated to the second image preprocessing unit 206.
 The second image preprocessing unit 206 receives the second image to be estimated from the second image input unit 205, applies preprocessing such as contour extraction, rectification, and distortion correction, and outputs the preprocessed second image to the second patch image generation unit 207.
 The second patch image generation unit 207 generates, from the preprocessed second image input from the second image preprocessing unit 206 and for each input feature point, a plurality of second patch images of N × N pixels cut out over the range d_min to d_max with the coordinates of that feature point as the reference. The procedure for generating the plurality of patches is the same as in the parallax learning device 100, and a plurality of second patch images along the epipolar line are generated. In this way, for each feature point of the first image, a plurality of second patch images are generated from the second image with that feature point as the reference.
 The model storage unit 230 stores the parameters of the model for outputting the similarity corresponding to a patch image pair, learned by the parallax learning device 100.
 The similarity calculation unit 208 receives the feature points and the first patch image from the first patch image generation unit 204 and the plurality of second patch images from the second patch image generation unit 207, and processes each pair consisting of the first patch image and one of the second patch images as a combination. For each feature point, the similarity calculation unit 208 inputs each combination into the model of the model storage unit 230 and outputs, as the model output using the learned parameters, a similarity indicating how similar the combination is. With the first patch image as the reference, the matching cost of each combination with the plurality of second patch images is evaluated by the learned model and computed as the similarity. FIG. 6 shows an example of the similarity before the inverse filter and the similarity after the inverse filter; the similarity of each combination corresponds to one point in FIG. 6. The similarity (matching cost) computed for each combination is output to the similarity aggregation unit 209.
 The similarity aggregation unit 209 aggregates, for each feature point, the similarities of the output combinations and outputs the aggregation result to the inverse filter unit 210. For each feature point, an aggregation result like that shown in FIG. 6 is obtained.
 The inverse filter unit 210 applies, for each feature point, an inverse filter to the aggregated similarities of the output combinations and outputs the post-inverse-filter estimation result to the parallax calculation unit 211. To reduce the amount of computation, the inverse filter converts the aggregation result to the frequency domain by Fourier transform, applies the inverse filter in the frequency domain, and returns the result to the spatial domain by inverse Fourier transform.
 The parallax calculation unit 211 calculates, from the post-inverse-filter estimation result for each feature point, parallax information indicating the parallax amount of that feature point and outputs it to the parallax interpolation unit 212. From the post-inverse-filter estimation result input from the inverse filter unit 210 for each feature point, the peak point is searched for, and the shift on the epipolar line at the peak is obtained as the parallax amount d; this parallax amount d for each feature point is the parallax information. Through the processing up to this point, the parallax amount d can be computed for each feature point in the image.
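 One way to realize the inverse filtering and peak search is sketched below, assuming the LoG kernel used for the labels is known and already padded/aligned to the same length as the similarity curve; the Wiener-style regularization (eps) is an assumption, since the disclosure only states that an inverse filter is applied in the frequency domain.

import numpy as np

def deconvolve_and_peak(similarity, log_kernel, d_min=0, eps=1e-3):
    # Undo the LoG filtering of the similarity curve in the frequency domain
    # (regularized division), then take the peak position as the disparity.
    n = len(similarity)
    S = np.fft.rfft(similarity, n)
    K = np.fft.rfft(log_kernel, n)
    restored = np.fft.irfft(S * np.conj(K) / (np.abs(K) ** 2 + eps), n)
    return d_min + int(np.argmax(restored))  # peak offset on the epipolar line = parallax d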
 The parallax interpolation unit 212 calculates parallax interpolation information in which the parallax amount of each pixel is interpolated from the parallax information of the feature points, and outputs it to the fusion data generation unit 213. The parallax interpolation unit 212 obtains the parallax of the entire image based on the parallax amounts d of the feature points. As a concrete example, in the structure inspection use case the object is often planar, so a plane is assumed; under this assumption, the parallax between feature points can be interpolated by approximating the per-feature-point parallax by the least squares method or the like. A robust estimation technique can also be incorporated into the interpolation to suppress the influence of outliers. Through this processing, the parallax of pixels other than the feature points is estimated, and the per-pixel parallax of the entire image is output as parallax interpolation information.
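 Under the planar-object assumption described above, the interpolation could be sketched as a least-squares plane fit over the feature-point disparities; robust estimation is omitted here, and the parameterization d = a·u + b·v + c is an illustrative choice.

import numpy as np

def interpolate_disparity(points, h, w):
    # points: rows of (u, v, d) for the feature points; returns an h x w disparity map.
    pts = np.asarray(points, dtype=np.float64)
    A = np.c_[pts[:, 0], pts[:, 1], np.ones(len(pts))]
    coef, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)  # fit plane d = a*u + b*v + c
    uu, vv = np.mgrid[0:h, 0:w]
    return coef[0] * uu + coef[1] * vv + coef[2]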
 The fusion data generation unit 213 generates and outputs fusion data of the first image to be estimated and the second image to be estimated based on the parallax interpolation information. The fusion data are generated by aligning the first image and the second image based on the parallax interpolation information. The fusion data generation unit 213 generates four-channel fusion data in which the three RGB channels of the visible image and the one temperature-data channel of the infrared image are superimposed.
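 A minimal sketch of the four-channel fusion, assuming the visible image is the reference (first) image and using a nearest-pixel horizontal warp of the thermal channel; occlusions are not handled, and the sign of the disparity shift depends on the camera arrangement.

import numpy as np

def fuse(rgb, thermal, disparity):
    # rgb: (h, w, 3), thermal: (h, w), disparity: (h, w) per-pixel parallax.
    h, w, _ = rgb.shape
    fused = np.zeros((h, w, 4), dtype=np.float32)
    fused[..., :3] = rgb
    uu, vv = np.mgrid[0:h, 0:w]
    src_v = np.clip((vv + np.rint(disparity)).astype(int), 0, w - 1)  # (u, v) -> (u, v + d)
    fused[..., 3] = thermal[uu, src_v]  # temperature stacked as the fourth channel
    return fused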
 Next, the operation of the fusion data generation device 200 is described.
 FIG. 12 is a flowchart showing the flow of the fusion data generation process performed by the fusion data generation device 200. The fusion data generation process is performed by the CPU 21 reading the fusion data generation program from the ROM 22 or the storage 24, loading it into the RAM 23, and executing it. The CPU 21 executes the following processing as each unit of the fusion data generation device 200.
 In step S200, the CPU 21 receives the input of the first image to be estimated from the first image input unit 201, preprocesses the first image, and outputs the preprocessed first image to the first feature point extraction unit 203.
 In step S202, the CPU 21 receives the input of the second image to be estimated from the second image input unit 205, preprocesses the second image, and outputs the preprocessed second image to the second patch image generation unit 207.
 In step S204, the CPU 21 extracts feature points from the input preprocessed first image and outputs the preprocessed first image and the coordinates of the extracted feature points to the first patch image generation unit 204.
 In step S206, the CPU 21 sets the feature point selection to i = 1. Let N be the total number of feature points extracted in step S204.
 In step S208, the CPU 21 executes, for the selected feature point i, the processing up to the calculation of the parallax information. The details of the feature point calculation process in this step are described later.
 In step S210, the CPU 21 determines whether i ≤ N. If the condition is satisfied, the process proceeds to step S212; otherwise, the process proceeds to step S214.
 In step S212, the CPU 21 increments i as i = i + 1, selects the next feature point, and repeats the feature point calculation process.
 In step S214, the CPU 21 calculates parallax interpolation information in which the parallax amount of each pixel is interpolated from the per-feature-point parallax information obtained in the processing of step S210.
 In step S216, the CPU 21 generates and outputs fusion data of the first image to be estimated and the second image to be estimated based on the parallax interpolation information.
 Next, the feature point calculation process of step S208 is described with reference to the flowchart of FIG. 13. The following is the processing for the selected feature point i.
 In step S230, the CPU 21 generates, from the input preprocessed first image, a first patch image to be estimated by cutting out N × N pixels centered on the coordinates (u, v) of the selected feature point i.
 In step S232, the CPU 21 sets x = d_min for the position to be used as the patch center, with the coordinates of the feature point i as the reference.
 In step S234, the CPU 21 generates a second patch image to be estimated centered on the coordinates (u, v + x).
 In step S236, the CPU 21 treats the first patch image to be estimated generated in step S230 and the second patch image to be estimated generated in step S234 as a pair, inputs the pair into the model of the model storage unit 230, and, from the model output using the learned parameters, computes the matching cost indicating the similarity of the combination as the similarity.
 In step S238, the CPU 21 determines whether x ≤ d_max. If the condition is satisfied, the process proceeds to step S240; otherwise, the process proceeds to step S242.
 In step S240, the CPU 21 increments x as x = x + 1, returns to step S234, and repeats the generation of a second patch image to be estimated and the calculation of the similarity.
 In step S242, the CPU 21 aggregates the similarities of the output combinations and outputs the aggregation result to the inverse filter unit 210.
 In step S244, the CPU 21 applies an inverse filter to the aggregated estimates of the output combinations and outputs the post-inverse-filter estimation result to the parallax calculation unit 211.
 In step S246, the CPU 21 calculates, from the post-inverse-filter estimation result, the parallax information indicating the parallax amount of the selected feature point i and outputs it to the parallax interpolation unit 212. Parallax information is thereby obtained for each feature point i.
 As described above, the fusion data generation device 200 of the present embodiment can generate fusion data that make it possible to estimate and use the parallax amount accurately even between images acquired by different sensors.
 Note that the parallax learning process or the fusion data generation process, executed in the above embodiment by the CPU reading software (a program), may instead be executed by various processors other than a CPU. Examples of such processors include a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed exclusively for executing specific processing, such as an ASIC (Application Specific Integrated Circuit). The parallax learning process or the fusion data generation process may be executed by one of these various processors, or by a combination of two or more processors of the same type or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these various processors is an electric circuit combining circuit elements such as semiconductor elements.
 In the above embodiment, the parallax learning program is described as being stored (installed) in advance in the storage 14, but the present disclosure is not limited to this. The program may be provided in a form stored on a non-transitory storage medium such as a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), or a USB (Universal Serial Bus) memory, or may be downloaded from an external device via a network. The same applies to the fusion data generation program.
 Regarding the above embodiments, the following supplementary notes are further disclosed.
 (Appendix 1)
 A parallax learning device comprising:
 a memory; and
 at least one processor connected to the memory, wherein the processor is configured to:
 output a filtered label obtained by applying a filter to a correct answer label indicating the amount of horizontal positional deviation between a first image for learning and a second image for learning and the relationship of the correct answer to the position; and
 learn parameters of a model for outputting a similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with coordinate information of the correct answer label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting horizontally with the coordinate information as a reference, and the filtered label.
 (Appendix 2)
 A non-transitory storage medium storing a program executable by a computer so as to perform a parallax learning process, the parallax learning process comprising:
 outputting a filtered label obtained by applying a filter to a correct answer label indicating the amount of horizontal positional deviation between a first image for learning and a second image for learning and the relationship of the correct answer to the position; and
 learning parameters of a model for outputting a similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with coordinate information of the correct answer label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting horizontally with the coordinate information as a reference, and the filtered label.
100 Parallax learning device
101, 201 First image input unit
102, 202 First image preprocessing unit
103, 204 First patch image generation unit
104, 205 Second image input unit
105, 206 Second image preprocessing unit
106, 207 Second patch image generation unit
107 Label input unit
108 Filter unit
109 Similarity learning unit
110, 230 Model storage unit
200 Fusion data generation device
203 First feature point extraction unit
208 Similarity calculation unit
209 Similarity aggregation unit
210 Inverse filter unit
211 Parallax calculation unit
212 Parallax interpolation unit
213 Fusion data generation unit

Claims (7)

  1.  A parallax learning device comprising:
     a filter unit that outputs a filtered label obtained by applying a filter to a correct answer label indicating the amount of horizontal positional deviation between a first image for learning and a second image for learning and the relationship of the correct answer to the position; and
     a similarity learning unit that learns parameters of a model for outputting a similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with coordinate information of the correct answer label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting horizontally with the coordinate information as a reference, and the filtered label.
  2.  The parallax learning device according to claim 1, wherein
     the correct or incorrect values of the correct answer label are given as binary values, and
     the filter unit uses a filter capable of converting the correct or incorrect values into a distribution.
  3.  A fusion data generation device comprising:
     a similarity calculation unit that receives, for each feature point of a first image to be estimated, each combination of a first patch image to be estimated generated from the first image to be estimated with the feature point as a reference and each of a plurality of second patch images to be estimated generated from a second image to be estimated with the feature point as a reference, together with the parameters of the model learned by the parallax learning device according to claim 1 or 2, inputs each combination into the model for each feature point, and outputs, as an output of the model using the parameters, an estimate indicating the similarity of each combination;
     an inverse filter unit that outputs, for each feature point, a post-inverse-filter estimation result obtained by applying an inverse filter to an aggregation result of the output estimates of the combinations;
     a parallax calculation unit that calculates parallax information indicating a parallax amount for each feature point from the post-inverse-filter estimation result for each feature point;
     a parallax interpolation unit that calculates parallax interpolation information in which a parallax amount for each pixel is interpolated from the parallax information for each feature point; and
     a fusion data generation unit that generates fusion data of the first image and the second image based on the parallax interpolation information.
  4.  A parallax learning method that causes a computer to execute processing comprising:
     outputting a filtered label obtained by applying a filter to a correct answer label indicating the amount of horizontal positional deviation between a first image for learning and a second image for learning and the relationship of the correct answer to the position; and
     learning parameters of a model for outputting a similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with coordinate information of the correct answer label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting horizontally with the coordinate information as a reference, and the filtered label.
  5.  The parallax learning method according to claim 4, wherein
     the correct or incorrect values of the correct answer label are given as binary values, and
     a filter capable of converting the correct or incorrect values into a distribution is used as the filter.
  6.  A fusion data generation method that causes a computer to execute processing comprising:
     receiving, for each feature point of a first image to be estimated, each combination of a first patch image to be estimated generated from the first image to be estimated with the feature point as a reference and each of a plurality of second patch images to be estimated generated from a second image to be estimated with the feature point as a reference, together with the parameters of the model learned by the parallax learning device according to claim 1 or 2;
     inputting, for each feature point, each combination into the model and outputting, as an output of the model using the parameters, an estimate indicating the similarity of each combination;
     outputting, for each feature point, a post-inverse-filter estimation result obtained by applying an inverse filter to an aggregation result of the output estimates of the combinations;
     calculating parallax information indicating a parallax amount for each feature point from the post-inverse-filter estimation result for each feature point;
     calculating parallax interpolation information in which a parallax amount for each pixel is interpolated from the parallax information for each feature point; and
     generating fusion data of the first image and the second image based on the parallax interpolation information.
  7.  A parallax learning program that causes a computer to execute processing comprising:
     outputting a filtered label obtained by applying a filter to a correct answer label indicating the amount of horizontal positional deviation between a first image for learning and a second image for learning and the relationship of the correct answer to the position; and
     learning parameters of a model for outputting a similarity corresponding to a patch image pair, based on a first patch image for learning generated from the first image for learning with coordinate information of the correct answer label as a reference, a plurality of second patch images for learning generated from the second image for learning by shifting horizontally with the coordinate information as a reference, and the filtered label.
PCT/JP2020/045106 2020-12-03 2020-12-03 Parallax learning device, merge data generation device, parallax learning method, merge data generation method, and parallax learning program WO2022118442A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/045106 WO2022118442A1 (en) 2020-12-03 2020-12-03 Parallax learning device, merge data generation device, parallax learning method, merge data generation method, and parallax learning program


Publications (1)

Publication Number Publication Date
WO2022118442A1 true WO2022118442A1 (en) 2022-06-09

Family

ID=81852701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/045106 WO2022118442A1 (en) 2020-12-03 2020-12-03 Parallax learning device, merge data generation device, parallax learning method, merge data generation method, and parallax learning program

Country Status (1)

Country Link
WO (1) WO2022118442A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018133064A (en) * 2017-02-17 2018-08-23 キヤノン株式会社 Image processing apparatus, imaging apparatus, image processing method, and image processing program
WO2020188120A1 (en) * 2019-03-21 2020-09-24 Five AI Limited Depth extraction

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018133064A (en) * 2017-02-17 2018-08-23 キヤノン株式会社 Image processing apparatus, imaging apparatus, image processing method, and image processing program
WO2020188120A1 (en) * 2019-03-21 2020-09-24 Five AI Limited Depth extraction

Similar Documents

Publication Publication Date Title
US11132809B2 (en) Stereo matching method and apparatus, image processing apparatus, and training method therefor
US9754377B2 (en) Multi-resolution depth estimation using modified census transform for advanced driver assistance systems
EP3201881B1 (en) 3-dimensional model generation using edges
US8755630B2 (en) Object pose recognition apparatus and object pose recognition method using the same
US20120127275A1 (en) Image processing method for determining depth information from at least two input images recorded with the aid of a stereo camera system
Warren et al. Online calibration of stereo rigs for long-term autonomy
US20180091798A1 (en) System and Method for Generating a Depth Map Using Differential Patterns
EP3182369B1 (en) Stereo matching method, controller and system
Ma et al. A modified census transform based on the neighborhood information for stereo matching algorithm
Kumari et al. A survey on stereo matching techniques for 3D vision in image processing
Hua et al. Extended guided filtering for depth map upsampling
EP3309743B1 (en) Registration of multiple laser scans
JP2005037378A (en) Depth measurement method and depth measurement device
EP3293700A1 (en) 3d reconstruction for vehicle
US11042986B2 (en) Method for thinning and connection in linear object extraction from an image
US20230206594A1 (en) System and method for correspondence map determination
CN102447917A (en) Three-dimensional image matching method and equipment thereof
CN111105452A (en) High-low resolution fusion stereo matching method based on binocular vision
CN111739071B (en) Initial value-based rapid iterative registration method, medium, terminal and device
Mei et al. Radial lens distortion correction using cascaded one-parameter division model
JP6359985B2 (en) Depth estimation model generation device and depth estimation device
WO2022118442A1 (en) Parallax learning device, merge data generation device, parallax learning method, merge data generation method, and parallax learning program
CN110610503B (en) Three-dimensional information recovery method for electric knife switch based on three-dimensional matching
CN109741389B (en) Local stereo matching method based on region base matching
Le et al. A new depth image quality metric using a pair of color and depth images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20964294

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20964294

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP