WO2019117393A1

WO2019117393A1 - Learning apparatus and method for depth information generation, depth information generation apparatus and method, and recording medium related thereto

Info

Publication number: WO2019117393A1
Application number: PCT/KR2018/001156
Authority: WO
Inventors: 손광훈; 박기홍
Original assignee: Industry-Academic Cooperation Foundation, Yonsei University (연세대학교 산학협력단)
Priority date: 2017-12-13
Filing date: 2018-01-26
Publication date: 2019-06-20
Also published as: KR101976290B1

Abstract

A learning apparatus and method for depth information generation, a depth information generation apparatus and method, and a recording medium related thereto are disclosed. The disclosed learning apparatus for depth information generation learns a second depth information generation method by merging stereo camera depth information generated by merging a left image and a right image acquired from a stereo camera device and LIDAR depth information acquired from a LIDAR device, and comprises: a first depth information generation unit for learning a first depth information generation method by merging the stereo camera depth information and the LIDAR depth information; and a second depth information generation unit for learning the second depth information generation method by merging first depth information and a standard image, having become the standard in stereo camera depth information merging between the left image and the right image, wherein the first depth information generation unit learns by using reference stereo camera depth information and reference LIDAR depth information as input values and using reference actual depth information as a label, the second depth information generation unit learns by using, as input values, a reference standard image and reference first depth information, which is generated by the first depth information generation unit by using the reference stereo camera depth information and the reference LIDAR depth information as the input values, and using the reference actual depth information as a label, and the first depth information generation unit learns by further considering, in an error backpropagation step during a learning process, an error value of the error backpropagation step of the second depth information generation unit. According to the disclosed apparatus, more accurate depth information can be quickly generated by using learning.

Description

Learning apparatus and method for generating depth information, apparatus and method for generating depth information, and recording medium therefor

The present invention relates to a learning apparatus and method for generating depth information, an apparatus and method for generating depth information, and a recording medium therefor.

Recently, with the development of the smart car and drone industries, the need for 3D maps has been growing. 3D maps can be used directly to determine autonomous driving routes for smart cars and drones, and also in other applications. For example, the 3D structure information used in augmented reality (AR) also requires a 3D map.

The core technology of 3D map production is 3D modeling based on depth information. However, the conventional method of acquiring depth information using a stereo camera suffers from large errors due to noise and algorithmic limitations, and the method of acquiring depth information with a measurement sensor suffers from low resolution.

In order to solve the problems of the related art described above, the present invention provides a learning apparatus and method for depth information generation capable of generating more accurate depth information by fusing LIDAR depth information and stereo camera depth information using learning, a depth information generation apparatus and method, and a recording medium therefor.

According to a first embodiment of the present invention, there is provided a learning apparatus for depth information generation that learns a method of generating second depth information by fusing stereo camera depth information, generated by fusing a left image and a right image acquired from a stereo camera device, with LIDAR depth information acquired from a LIDAR device, the learning apparatus comprising: a first depth information generator for learning a method of generating first depth information by fusing the stereo camera depth information and the LIDAR depth information; and a second depth information generator for learning a method of generating the second depth information by fusing the first depth information with a standard image, which is the image among the left image and the right image that served as the standard for the stereo camera depth information fusion, wherein the first depth information generator learns using reference stereo camera depth information and reference LIDAR depth information as input values and reference actual depth information as a label, the second depth information generator learns using, as input values, a reference standard image and reference first depth information generated by the first depth information generator from the reference stereo camera depth information and the reference LIDAR depth information, and using the reference actual depth information as a label, and the first depth information generator further considers, in its error back propagation process during learning, an error value of the error back propagation process of the second depth information generator.

The first depth information generator and the second depth information generator may be trained using a convolutional neural network algorithm.

The first depth information generator and the second depth information generator may use dilated convolution.

The first depth information generator may comprise: a first filter unit for performing dilated convolution on the LIDAR depth information; a second filter unit for performing dilated convolution on the stereo camera depth information; a first fusion unit for fusing the LIDAR depth information on which the dilated convolution has been performed and the stereo camera depth information on which the dilated convolution has been performed; and a third filter unit for performing dilated convolution on the depth information fused in the first fusion unit to generate the first depth information.

The second depth information generator may comprise: a fourth filter unit for performing dilated convolution on the standard image; a fifth filter unit for performing dilated convolution on the first depth information; a second fusion unit for fusing the standard image on which the dilated convolution has been performed and the first depth information on which the dilated convolution has been performed; and a sixth filter unit for performing dilated convolution on the depth information fused in the second fusion unit to generate the second depth information.

The first depth information generator uses an error value calculated using the following equation in an error back propagation process during a learning process.

In the above equation, the terms are, in order: the error value used in the error back propagation process by the first depth information generator, the error value of the first depth information, and the error value of the second depth information transmitted from the second depth information generator.

According to a second embodiment of the present invention, there is provided a depth information generation apparatus that generates second depth information by fusing stereo camera depth information, generated by fusing a left image and a right image acquired from a stereo camera device, with LIDAR depth information acquired from a LIDAR device, the apparatus comprising: a first depth information generator for generating first depth information by fusing the stereo camera depth information and the LIDAR depth information; and a second depth information generator for generating the second depth information by fusing the first depth information with a standard image, which is the image among the left image and the right image that served as the standard for the stereo camera depth information fusion, wherein the first depth information generator has been trained using reference stereo camera depth information and reference LIDAR depth information as input values and reference actual depth information as a label, the second depth information generator has been trained using, as input values, a reference standard image and reference first depth information generated by the first depth information generator from the reference stereo camera depth information and the reference LIDAR depth information, and using the reference actual depth information as a label, and the first depth information generator has been trained by further considering, in its error back propagation process during the prior learning process, an error value of the error back propagation process of the second depth information generator.

According to a third embodiment of the present invention, there is provided a learning method for depth information generation that learns a method of generating second depth information by fusing stereo camera depth information, generated by fusing a left image and a right image acquired from a stereo camera device, with LIDAR depth information acquired from a LIDAR device, the method comprising: (a) generating first depth information by fusing reference stereo camera depth information and reference LIDAR depth information; (b) generating second depth information by fusing the first depth information with a reference standard image, which is the image among a reference left image and a reference right image that served as the standard for the reference stereo camera depth information fusion; (c) learning step (b) through an error back propagation process using, as an error value, a second error value that is the difference between reference actual depth information and the second depth information; and (d) learning step (a) through an error back propagation process using, as error values, the second error value propagated in step (c) and a first error value that is the difference between the reference actual depth information and the first depth information.

Step (a) may include: (a1) performing dilated convolution on the reference LIDAR depth information; (a2) performing dilated convolution on the reference stereo camera depth information; (a3) fusing the LIDAR depth information on which the dilated convolution has been performed and the stereo camera depth information on which the dilated convolution has been performed; and (a4) generating the first depth information by performing dilated convolution on the depth information fused in step (a3).

According to a fourth embodiment of the present invention, there is provided a depth information generation method for generating second depth information by fusing stereo camera depth information, generated by fusing a left image and a right image acquired from a stereo camera device, with LIDAR depth information acquired from a LIDAR device, the method comprising: (a) generating first depth information by fusing the stereo camera depth information and the LIDAR depth information; and (b) generating the second depth information by fusing the first depth information with a standard image, which is the image among the left image and the right image that served as the standard for the stereo camera depth information fusion, wherein step (a) has been trained using reference stereo camera depth information and reference LIDAR depth information as input values and reference actual depth information as a label, step (b) has been trained using, as input values, a reference standard image and reference first depth information generated in step (a) from the reference stereo camera depth information and the reference LIDAR depth information, and using the reference actual depth information as a label, and step (a) has been trained by further considering, in its error back propagation process, an error value of the error back propagation process of step (b).

According to a fifth embodiment of the present invention, there is provided a computer-readable recording medium having recorded thereon a program for performing the learning method for depth information generation or the depth information generation method.

The present invention is advantageous in that more accurate depth information can be generated quickly using learning.

FIG. 1 is a diagram for explaining a convolutional neural network algorithm.

FIG. 2 is a diagram for explaining a convolution method of a convolutional neural network.

FIG. 3 is a diagram for explaining a downsampling method of a convolutional neural network.

FIG. 4 is a diagram for explaining dilated convolution.

FIG. 5 is a structural diagram of a learning apparatus for generating depth information according to a preferred embodiment of the present invention.

FIG. 6 is a diagram for explaining a learning process of a learning apparatus for generating depth information according to a preferred embodiment of the present invention.

FIG. 7 is a structural diagram of an apparatus for generating depth information according to an embodiment of the present invention.

FIG. 8 is a flowchart illustrating a learning method for generating depth information according to an exemplary embodiment of the present invention.

FIG. 9 is a flowchart illustrating a depth information generating method according to an exemplary embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.

The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

The learning apparatus for generating depth information according to the preferred embodiment of the present invention learns to generate more accurate depth information by fusing stereo camera depth information, generated by fusing left and right images acquired using a stereo camera device, with LIDAR depth information acquired using a LIDAR device. In order to generate the depth information accurately and quickly, the present invention can use a deep learning algorithm. For example, a convolutional neural network (CNN) can be used. In particular, the learning apparatus for generating depth information according to a preferred embodiment of the present invention uses dilated convolution and includes two learning networks, namely a first depth information generator and a second depth information generator, so that it can learn to generate depth information more quickly and accurately.

Although LIDAR depth information is accurate, its resolution is low, so it is not suitable for direct application to a real 3D map. Stereo camera depth information has a high resolution but a large depth error, so it is difficult to produce an accurate 3D map from it. The learning apparatus for generating depth information according to the preferred embodiment of the present invention learns a method of generating depth information by fusing the LIDAR depth information and the stereo camera depth information, so that high-resolution depth information with a relatively small error can be generated.

In particular, the learning apparatus for generating depth information according to a preferred embodiment of the present invention uses two learning networks, a first depth information generator and a second depth information generator, and the first depth information generator also considers, in its error back propagation process, the error value generated in the second depth information generator, so that learning is more accurate and efficient and the depth information generation method can be learned more accurately and quickly.

Hereinafter, the convolution and dilated convolution of the convolutional neural network applicable to the embodiments of the present invention will be briefly described, and then embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining the convolution of a convolutional neural network.

Referring to FIG. 1, the convolutional neural network algorithm performs convolution on an input image to extract a feature map for the input image, and identifies or classifies the input image through the feature map. The feature map includes feature information on the input image. For feature map extraction, the convolution can be repeated, and the number of iterations can be variously determined according to the embodiment.

Once the size of the filter (or kernel) 210 used for the convolution is determined, convolution is performed as a weighted sum of the pixel values of the input image 200 and the weights assigned to each pixel of the filter. That is, for the specific region of the input image that the filter overlaps, the pixel value 230 of the convolution layer is determined by multiplying each pixel value by the corresponding filter weight and summing the products.

As shown in the drawing, a weighted sum is computed between the weights of the filter 210 (4, 0, 0, 0, 0, 0, 0, 0, ...) and the pixel values (0, 0, 0, 0, 1, 1, 0, 1, 2) of the specific region of the input image 200 that the filter overlaps. When the size of the input image 200 is 7 × 7 and the size of the filter 210 is 3 × 3, a convolution layer of size 5 × 5 can be generated.

Since the pixel value according to the convolution becomes the pixel value 230 of the center pixel of the overlapped area, the size of the convolutional layer, i.e., the convoluted image, relative to the input image decreases. However, when padding an outer area of an input image with a specific pixel value, a convolution layer having a size of 7 × 7 equal to the size of the input image can be generated. The number of convolution layers may be determined according to the number of filters used.
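The size relationship described above can be checked with a minimal NumPy sketch. The function name `conv2d`, the averaging kernel, and the input values are illustrative assumptions and do not reproduce the values shown in the drawing.

```python
import numpy as np

def conv2d(image, kernel, pad=0, pad_value=0.0):
    """Stride-1 weighted-sum convolution; optional constant padding of the border."""
    if pad > 0:
        image = np.pad(image, pad, mode="constant", constant_values=pad_value)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the overlapped region by the filter weights and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)   # hypothetical 7 x 7 input
kernel = np.ones((3, 3)) / 9.0                     # hypothetical 3 x 3 filter

valid = conv2d(image, kernel)          # no padding: output shrinks to 5 x 5
same = conv2d(image, kernel, pad=1)    # one-pixel padding: output stays 7 x 7
print(valid.shape, same.shape)
```

With a 3 × 3 filter, one pixel of padding on each side is what keeps the output at the input size; larger filters would need proportionally more padding.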

FIG. 2 is a diagram for explaining dilated convolution.

Referring to FIG. 2, in the convolution layer generation process, dilated convolution considers points at nine positions, as shown in FIG. 2(a). In FIG. 2(a), k is set to 2, where k is the distance between neighboring points among the nine positions. As shown in FIG. 2(b), if dilated convolution is performed repeatedly while increasing the value of k, the range that can be considered at one position increases exponentially. Therefore, by repeating dilated convolution while changing the value of k, information about the whole image can be learned quickly even with a small number of convolution operations.
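The growth of the considered range can be illustrated with a short NumPy sketch. The helper `dilated_conv2d`, the all-ones filter, and the doubling schedule k = 1, 2, 4 are assumptions made for illustration; the source does not state which k values are used.

```python
import numpy as np

def dilated_conv2d(image, kernel, k=1):
    """Zero-padded 2D convolution whose kernel taps are spaced k pixels apart."""
    kh, kw = kernel.shape
    ph, pw = k * (kh // 2), k * (kw // 2)
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for a in range(kh):
        for b in range(kw):
            # Tap (a, b) reads the pixel offset by ((a - kh//2) * k, (b - kw//2) * k).
            out += kernel[a, b] * padded[a * k:a * k + h, b * k:b * k + w]
    return out

# Follow how far a single pixel's influence spreads after each layer.
x = np.zeros((31, 31))
x[15, 15] = 1.0
ones = np.ones((3, 3))
widths = []
for k in (1, 2, 4):                    # assumed doubling schedule
    x = dilated_conv2d(x, ones, k)
    widths.append(int(np.count_nonzero(x[15])))
print(widths)  # prints [3, 7, 15]
```

Three layers of nine-point filters cover a 15-pixel-wide range here, whereas three ordinary 3 × 3 convolutions (k = 1 each time) would cover only 7 pixels.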

Hereinafter, a structure of a learning apparatus for generating depth information according to a preferred embodiment of the present invention will be described in detail.

FIG. 3 is a structural diagram of a learning apparatus for generating depth information according to a preferred embodiment of the present invention, and FIG. 4 is a diagram for explaining a learning process of the learning apparatus for generating depth information according to a preferred embodiment of the present invention.

Referring to FIGS. 3 and 4, a learning apparatus for generating depth information according to an exemplary embodiment of the present invention may include a first depth information generator 110 and a second depth information generator 120.

The first depth information generator 110 may be trained, using the reference LIDAR depth information 10 and the reference stereo camera depth information 20 as input values and the reference actual depth information 30 as a label, to generate the first depth information 40 upon receiving LIDAR depth information and stereo camera depth information.

The first depth information generator 110 may generate the first depth information 40 using the following equation.

In Equation 1, D_F is the first depth information, D_L is the reference LIDAR depth information, D_S is the reference stereo camera depth information, and φ_F is an internal parameter of the first depth information generator.

The reference LIDAR depth information 10 may be obtained using a LIDAR device, and the reference stereo camera depth information 20 may be generated by fusing the left and right images obtained using the stereo camera device. A known 3D image generation method using a stereo camera can be used for the fusion of the left image and the right image. Meanwhile, the reference actual depth information 30 means actual depth information, and can be acquired using, for example, a high-cost, low-efficiency precision depth sensor. The first depth information generator 110 can be trained using the above-described convolutional neural network, and can use dilated convolution.

More specifically, the first depth information generator 110 may include a first filter 112, a second filter 114, a first fusion unit 116, and a third filter 118.

The first filter 112 may perform dilated convolution on the reference LIDAR depth information 10, and the second filter 114 may perform dilated convolution on the reference stereo camera depth information 20. The first fusion unit 116 fuses the reference LIDAR depth information 10 on which the dilated convolution has been performed and the reference stereo camera depth information 20 on which the dilated convolution has been performed. The third filter 118 performs dilated convolution on the depth information fused in the first fusion unit 116 to generate the first depth information 40. In one example, the third filter 118 may perform dilated convolution by applying, in reverse order, the k values of the dilated convolution filters used in the first filter 112 and the second filter 114.

Meanwhile, the second depth information generator 120 may be trained, using the reference standard image 26 and the first depth information 40 as input values and the reference actual depth information 30 as a label, to generate the second depth information 50.

The second depth information generator 120 may generate the second depth information 50 using the following equation.

In Equation 2, D_* is the second depth information, I_l is the reference standard image, and φ_R is an internal parameter of the second depth information generator.

The reference standard image 26 is the image, among the left image and the right image, that served as the standard when generating the reference stereo camera depth information 20. The second depth information generator 120 can also be trained using the above-described convolutional neural network, and can use dilated convolution.

The first depth information 40 generated by the first depth information generator 110 is comparatively accurate. However, if the depth values in fine edge areas are corrected, the error can be further reduced. Therefore, the second depth information generator 120 can be trained to generate more accurate depth information using the reference standard image 26 captured directly by the stereo camera.

Specifically, the second depth information generator 120 may include a fourth filter 122, a fifth filter 124, a second fusion unit 126, and a sixth filter 128.

The fourth filter 122 may perform dilated convolution on the reference standard image 26, and the fifth filter 124 may perform dilated convolution on the first depth information 40. The second fusion unit 126 fuses the reference standard image 26 on which the dilated convolution has been performed and the first depth information 40 on which the dilated convolution has been performed. The sixth filter 128 performs dilated convolution on the depth information fused in the second fusion unit 126 to generate the second depth information 50. For example, the sixth filter 128 may perform dilated convolution by applying, in reverse order, the k values of the dilated convolution filters used in the fourth filter 122 and the fifth filter 124.

In the learning process, the first depth information generator 110 and the second depth information generator 120 compare the generated depth information with the value input as a label and, through an error back propagation process, are trained to generate better depth information values.

The error back propagation process starts at the end of the second depth information generator 120, and a second error value, which is the difference between the second depth information 50 and the reference actual depth information 30, is propagated back in the reverse direction.

That is, the second error value is propagated from the sixth filter 128 of the second depth information generator 120 back through the convolution filters toward the fourth filter 122 and the fifth filter 124, and the second depth information generator 120 is trained to generate second depth information 50 closer to the reference actual depth information 30.

Meanwhile, the first depth information generator 110 also learns through an error back propagation process. In the first depth information generator 110, a first error value, which is the difference between the generated first depth information 40 and the reference actual depth information 30 input as a label, and the second error value are propagated back in the reverse direction.

The error value used in the error back propagation process in the first depth information generator 110 may be calculated using the following equation.

In Equation 3, the terms are, in order: the error value used in the error back propagation process in the first depth information generator, the first error value, and the second error value.

That is, in the first depth information generator 110 according to the preferred embodiment of the present invention, both the first error value and the second error value transmitted from the second depth information generator 120 are used in the error back propagation process, so that the first depth information generator 110 can be trained to generate first depth information 40 that allows the second depth information generator 120 to generate more accurate second depth information 50.

As described above, the learning apparatus for generating depth information according to the preferred embodiment of the present invention uses two learning networks, a first depth information generator and a second depth information generator, and lets the first depth information generator also take the error of the second depth information generator into account, so that it can be trained to generate more accurate depth information quickly.

FIG. 5 is a structural diagram of an apparatus for generating depth information according to an embodiment of the present invention.
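The idea of using both error values in the first generator can be sketched with a deliberately tiny stand-in model. Everything here is an assumption for illustration: the two generators are replaced by linear maps with scalar parameters a, b, c; the errors are gradients of mean squared losses; and the two error values are combined by a plain sum, since the exact combination used in the equation above is not reproduced in this text. The finite-difference check only confirms that, under these assumptions, the combined error is the gradient of the sum of both losses.

```python
import numpy as np

def forward(theta, d_l, d_s, img):
    a, b, c = theta
    d_f = a * d_l + b * d_s    # toy first generator: fuses LIDAR and stereo depth
    d_r = d_f + c * img        # toy second generator: refines with the image
    return d_f, d_r

def total_loss(theta, d_l, d_s, img, d_gt):
    d_f, d_r = forward(theta, d_l, d_s, img)
    return np.mean((d_f - d_gt) ** 2) + np.mean((d_r - d_gt) ** 2)

def backprop(theta, d_l, d_s, img, d_gt):
    d_f, d_r = forward(theta, d_l, d_s, img)
    n = d_gt.size
    e1 = 2.0 * (d_f - d_gt) / n    # first error value (at the first depth output)
    e2 = 2.0 * (d_r - d_gt) / n    # second error value (at the second depth output)
    # In this toy model d_r = d_f + c * img, so e2 reaches d_f unchanged.
    e_first = e1 + e2              # assumed combination: plain sum
    grad_a = np.sum(e_first * d_l)
    grad_b = np.sum(e_first * d_s)
    grad_c = np.sum(e2 * img)      # second generator sees only the second error
    return np.array([grad_a, grad_b, grad_c])

rng = np.random.default_rng(1)
d_gt = rng.uniform(0.0, 1.0, size=(6, 6))               # toy actual depth (label)
d_l = d_gt + rng.normal(scale=0.02, size=d_gt.shape)    # toy LIDAR depth
d_s = d_gt + rng.normal(scale=0.20, size=d_gt.shape)    # toy stereo depth
img = rng.uniform(0.0, 1.0, size=d_gt.shape)            # toy standard image
theta = np.array([0.4, 0.3, 0.1])

analytic = backprop(theta, d_l, d_s, img, d_gt)
numeric = np.zeros(3)
eps = 1e-6
for i in range(3):
    step = np.zeros(3)
    step[i] = eps
    numeric[i] = (total_loss(theta + step, d_l, d_s, img, d_gt)
                  - total_loss(theta - step, d_l, d_s, img, d_gt)) / (2 * eps)
print(np.allclose(analytic, numeric, atol=1e-6))
```

A weighted combination of the two error values would be handled the same way, with the weights appearing as factors in `e_first`.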

Referring to FIG. 5, the apparatus for generating depth information according to an exemplary embodiment of the present invention may include a first depth information generator 710 and a second depth information generator 720.

The depth information generating apparatus according to the preferred embodiment of the present invention may be trained in advance through the learning process of the learning apparatus for generating depth information according to the preferred embodiment of the present invention described above.

The first depth information generator 710 may receive LIDAR depth information and stereo camera depth information and generate the first depth information.

The second depth information generator 720 may receive the first depth information and the standard image and generate the second depth information.

FIG. 6 is a flowchart illustrating a learning method for generating depth information according to an exemplary embodiment of the present invention.

Referring to FIG. 6, a learning method for generating depth information according to an exemplary embodiment of the present invention includes a first depth information generation step (S810), a second depth information generation step (S820), a first error back propagation step (S830), and a second error back propagation step (S840).

The first depth information generation step (S810) is a step of generating the first depth information 40 in the first depth information generator 110.

The second depth information generation step (S820) is a step of generating the second depth information 50 in the second depth information generator 120.

The first error back propagation step (S830) is a step of training the convolutional neural network through the error back propagation process in the second depth information generator 120.

The second error back propagation step (S840) is a step of training the convolutional neural network through the error back propagation process in the first depth information generator 110.

FIG. 7 is a flowchart illustrating a depth information generating method according to an exemplary embodiment of the present invention.
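The order of steps S810 to S840 can be sketched as a small training loop. This is a toy under stated assumptions rather than the disclosed procedure: the generators are scalar-parameter linear maps, the errors are gradients of mean squared losses, the two error values are simply summed for the first generator, and the learning rate, step count, and synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_gt = rng.uniform(0.0, 1.0, size=(8, 8))               # reference actual depth (label)
d_l = d_gt + rng.normal(scale=0.02, size=d_gt.shape)    # toy reference LIDAR depth
d_s = d_gt + rng.normal(scale=0.20, size=d_gt.shape)    # toy reference stereo depth
img = rng.uniform(0.0, 1.0, size=d_gt.shape)            # toy reference standard image

a, b, c = 0.5, 0.5, 0.0    # a, b: first generator; c: second generator
lr, n = 0.1, d_gt.size     # hypothetical learning rate
history = []
for step in range(200):
    d_f = a * d_l + b * d_s                 # S810: generate first depth information
    d_r = d_f + c * img                     # S820: generate second depth information

    e2 = 2.0 * (d_r - d_gt) / n             # S830: second error, back-propagated
    grad_c = np.sum(e2 * img)               #        through the second generator
    e2_to_first = e2                        # d(d_r)/d(d_f) = 1 in this toy model

    e1 = 2.0 * (d_f - d_gt) / n             # S840: first error plus the error handed
    e_first = e1 + e2_to_first              #        over from S830 (assumed sum)
    grad_a = np.sum(e_first * d_l)
    grad_b = np.sum(e_first * d_s)

    history.append(np.mean((d_f - d_gt) ** 2) + np.mean((d_r - d_gt) ** 2))
    a, b, c = a - lr * grad_a, b - lr * grad_b, c - lr * grad_c

print(history[0] > history[-1])
```

The second generator's backward pass runs first because its error at the first depth output is an input to the first generator's backward pass.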

Referring to FIG. 7, a depth information generating method according to an exemplary embodiment of the present invention may include a first depth information generating step S910 and a second depth information generating step S920.

The first depth information generating step (S910) is a step of generating the first depth information in the first depth information generator 710.

The second depth information generating step (S920) is a step of generating the second depth information in the second depth information generator 720.

The above-described technical features may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and constructed for the embodiments or may be those known and available to those skilled in the art of computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine language code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

As described above, the present invention has been described with reference to specific details, such as specific elements, and to limited embodiments and drawings. However, these are provided only to aid overall understanding, and the present invention is not limited to the above embodiments. Those skilled in the art will appreciate that various modifications and changes may be made without departing from the scope of the present invention. Accordingly, the spirit of the present invention should not be construed as being limited to the embodiments described, and the following claims, as well as all equivalents of the claims, belong to the scope of the present invention.

Claims

A learning apparatus for depth information generation that learns a method of generating second depth information by fusing stereo camera depth information, generated by fusing a left image and a right image acquired from a stereo camera device, with LIDAR depth information acquired from a LIDAR device, the learning apparatus comprising:

a first depth information generator for learning a method of generating first depth information by fusing the stereo camera depth information and the LIDAR depth information; and

a second depth information generator for learning a method of generating the second depth information by fusing the first depth information with a standard image, which is the image among the left image and the right image that served as the standard for the stereo camera depth information fusion,

wherein the first depth information generator learns using reference stereo camera depth information and reference LIDAR depth information as input values and reference actual depth information as a label,

wherein the second depth information generator learns using, as input values, a reference standard image and reference first depth information generated by the first depth information generator from the reference stereo camera depth information and the reference LIDAR depth information, and using the reference actual depth information as a label, and

wherein the first depth information generator further considers, in its error back propagation process during learning, an error value of the error back propagation process of the second depth information generator.
2. The learning apparatus according to claim 1,

wherein the first depth information generation unit and the second depth information generation unit learn using a convolutional neural network algorithm.
3. The learning apparatus according to claim 2,

wherein the first depth information generation unit and the second depth information generation unit use dilated convolution.
4. The learning apparatus according to claim 1,

wherein the first depth information generation unit comprises:

a first filter unit for performing dilated convolution on the LIDAR depth information;

a second filter unit for performing dilated convolution on the stereo camera depth information;

a first fusion unit for fusing the LIDAR depth information on which the dilated convolution has been performed and the stereo camera depth information on which the dilated convolution has been performed; and

a third filter unit for performing dilated convolution on the depth information fused in the first fusion unit to generate the first depth information.
5. The learning apparatus according to claim 1,

wherein the second depth information generation unit comprises:

a fourth filter unit for performing dilated convolution on the reference image;

a fifth filter unit for performing dilated convolution on the first depth information;

a second fusion unit for fusing the reference image on which the dilated convolution has been performed and the first depth information on which the dilated convolution has been performed; and

a third filter unit for performing dilated convolution on the depth information fused in the second fusion unit to generate the second depth information.
6. The learning apparatus according to claim 1,

wherein the first depth information generation unit uses, in the error backpropagation process during learning, an error value calculated using the following equation.

In the above equation, the three terms are, respectively: the error value used by the first depth information generation unit in the error backpropagation process; the error value of the first depth information; and the error value of the second depth information transmitted from the second depth information generation unit.
7. A depth information generation apparatus for generating second depth information by fusing stereo camera depth information, generated by fusing a left image and a right image acquired from a stereo camera device, with LIDAR depth information acquired from a LIDAR device, the apparatus comprising:

a first depth information generation unit for generating first depth information by fusing the stereo camera depth information and the LIDAR depth information; and

a second depth information generation unit for generating the second depth information by fusing the first depth information with a reference image, the reference image being whichever of the left image and the right image served as the reference for fusion into the stereo camera depth information,

wherein the first depth information generation unit is trained in advance using reference stereo camera depth information and reference LIDAR depth information as input values and reference actual depth information as a label,

wherein the second depth information generation unit is trained in advance using, as input values, a reference standard image and reference first depth information, the reference first depth information being generated by the first depth information generation unit from the reference stereo camera depth information and the reference LIDAR depth information as input values, and using the reference actual depth information as a label, and

wherein the first depth information generation unit is trained by further considering, in an error backpropagation process during the advance training, an error value of the error backpropagation process of the second depth information generation unit.
8. The depth information generation apparatus according to claim 7,

wherein the first depth information generation unit and the second depth information generation unit are trained using a convolutional neural network algorithm.
9. The depth information generation apparatus according to claim 8,

wherein the first depth information generation unit and the second depth information generation unit use dilated convolution.
10. The depth information generation apparatus according to claim 7,

wherein the first depth information generation unit comprises:

a first filter unit for performing dilated convolution on the LIDAR depth information;

a second filter unit for performing dilated convolution on the stereo camera depth information;

a first fusion unit for fusing the LIDAR depth information on which the dilated convolution has been performed and the stereo camera depth information on which the dilated convolution has been performed; and

a third filter unit for performing dilated convolution on the depth information fused in the first fusion unit to generate the first depth information.
11. The depth information generation apparatus according to claim 7,

wherein the second depth information generation unit comprises:

a fourth filter unit for performing dilated convolution on the reference image;

a fifth filter unit for performing dilated convolution on the first depth information;

a second fusion unit for fusing the reference image on which the dilated convolution has been performed and the first depth information on which the dilated convolution has been performed; and

a third filter unit for performing dilated convolution on the depth information fused in the second fusion unit to generate the second depth information.
12. The depth information generation apparatus according to claim 7,

wherein the first depth information generation unit uses, in the error backpropagation process during the advance training, an error value calculated using the following equation.

In the above equation, the three terms are, respectively: the error value used by the first depth information generation unit in the error backpropagation process; the error value of the first depth information; and the error value of the second depth information transmitted from the second depth information generation unit.
13. A learning method for depth information generation, which learns a method of generating second depth information by fusing stereo camera depth information, generated by fusing a left image and a right image acquired from a stereo camera device, with LIDAR depth information acquired from a LIDAR device, the learning method comprising:

(a) generating first depth information by fusing reference stereo camera depth information and reference LIDAR depth information;

(b) generating second depth information by fusing the first depth information with a reference image, the reference image being whichever of a reference left image and a reference right image served as the reference for fusion into the reference stereo camera depth information;

(c) learning the step (b) through an error backpropagation process using, as an error value, a second error value that is the difference between reference actual depth information and the second depth information; and

(d) learning the step (a) through an error backpropagation process using, as an error value, the sum of the second error value and a first error value that is the difference between the reference actual depth information and the first depth information.
14. The learning method according to claim 13,

wherein the step (a) comprises:

(a1) performing dilated convolution on the reference LIDAR depth information;

(a2) performing dilated convolution on the reference stereo camera depth information;

(a3) fusing the reference LIDAR depth information on which the dilated convolution has been performed and the reference stereo camera depth information on which the dilated convolution has been performed; and

(a4) generating the first depth information by performing dilated convolution on the depth information fused in the step (a3).
15. The learning method according to claim 13,

wherein the step (b) comprises:

(b1) performing dilated convolution on the reference image;

(b2) performing dilated convolution on the first depth information;

(b3) fusing the reference image on which the dilated convolution has been performed and the first depth information on which the dilated convolution has been performed; and

(b4) generating the second depth information by performing dilated convolution on the depth information fused in the step (b3).
16. The learning method according to claim 13,

wherein the step (d) uses an error value calculated using the following equation.

In the above equation, the three terms are, respectively: the error value used in the step (d); the first error value; and the second error value delivered from the step (c).
17. A depth information generation method for generating second depth information by fusing stereo camera depth information, generated by fusing a left image and a right image acquired from a stereo camera device, with LIDAR depth information acquired from a LIDAR device, the method comprising:

(a) generating first depth information by fusing the stereo camera depth information and the LIDAR depth information; and

(b) generating the second depth information by fusing the first depth information with a reference image, the reference image being whichever of the left image and the right image served as the reference for fusion into the stereo camera depth information,

wherein the step (a) is trained in advance using reference stereo camera depth information and reference LIDAR depth information as input values and reference actual depth information as a label,

wherein the step (b) is trained in advance using, as input values, a reference standard image and reference first depth information generated in the step (a), and using the reference actual depth information as a label, and

wherein the step (a) is trained by further considering, in an error backpropagation process during the advance training, an error value of the error backpropagation process of the step (b).
18. The method according to claim 16,

wherein the step (a) comprises:

(a1) performing dilated convolution on the LIDAR depth information;

(a2) performing dilated convolution on the stereo camera depth information;

(a3) fusing the LIDAR depth information on which the dilated convolution has been performed and the stereo camera depth information on which the dilated convolution has been performed; and

(a4) generating the first depth information by performing dilated convolution on the depth information fused in the step (a3), and

wherein the step (b) comprises:

(b1) performing dilated convolution on the reference image;

(b2) performing dilated convolution on the first depth information;

(b3) fusing the reference image on which the dilated convolution has been performed and the first depth information on which the dilated convolution has been performed; and

(b4) generating the second depth information by performing dilated convolution on the depth information fused in the step (b3).
19. The method according to claim 17,

wherein the step (a) uses, in the error backpropagation process during the advance training, an error value calculated using the following equation.

In the above equation, the three terms are, respectively: the error value used in the step (a); the error value of the first depth information; and the error value of the second depth information transmitted in the error backpropagation process of the step (b).
20. A computer-readable recording medium on which is recorded a program for performing the learning method for depth information generation according to any one of claims 13 to 16 or the depth information generation method according to any one of claims 17 to 19.
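The error values recited for the two learning steps can be illustrated with a short Python sketch. It assumes, as the learning method claim reads in this translation, that the stage producing the second depth information learns from its own error alone, while the stage producing the first depth information learns from the sum of both errors. The equation itself is not reproduced in this text, so the unweighted sum, the use of mean absolute difference as the "difference", and all function and variable names are assumptions for illustration only.

```python
def mean_abs_error(predicted, actual):
    """Mean absolute difference between two equally sized depth maps."""
    total = 0.0
    count = 0
    for row_p, row_a in zip(predicted, actual):
        for p, a in zip(row_p, row_a):
            total += abs(p - a)
            count += 1
    return total / count


def stage_errors(first_depth, second_depth, actual_depth):
    """Return (error used for the first stage, error used for the second stage)."""
    first_error = mean_abs_error(first_depth, actual_depth)
    second_error = mean_abs_error(second_depth, actual_depth)
    # Second stage: learns from its own error only.
    # First stage: learns from the sum of both errors (assumed unweighted).
    return first_error + second_error, second_error


# Toy 2x2 depth maps (made-up values, for illustration only).
actual = [[10.0, 10.0], [10.0, 10.0]]
first = [[12.0, 12.0], [12.0, 12.0]]
second = [[10.5, 10.5], [10.5, 10.5]]

error_first_stage, error_second_stage = stage_errors(first, second, actual)
```

In this toy case the first stage is off by 2.0 and the second by 0.5, so the first stage receives 2.5 and the second stage 0.5. Passing the second error back to the first stage ties the first stage's updates to the quality of the final output rather than only to its own intermediate result.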