TW201520978A - Method for estimating two-dimensional image depth values and system thereof - Google Patents

Method for estimating two-dimensional image depth values and system thereof

Info

Publication number
TW201520978A
TW201520978A TW102142562A
Authority
TW
Taiwan
Prior art keywords
neural network
input vector
depth value
training
image
Prior art date
Application number
TW102142562A
Other languages
Chinese (zh)
Other versions
TWI524307B (en)
Inventor
Day-Fann Shen
Yung-Ming Chen
Original Assignee
Univ Nat Yunlin Sci & Tech
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Yunlin Sci & Tech filed Critical Univ Nat Yunlin Sci & Tech
Priority to TW102142562A priority Critical patent/TWI524307B/en
Publication of TW201520978A publication Critical patent/TW201520978A/en
Application granted granted Critical
Publication of TWI524307B publication Critical patent/TWI524307B/en

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a method and system for estimating two-dimensional image depth values. A neural network is used: during training, pixel blocks cut from single-resolution and multi-resolution texture images of a 2D sample image serve as input vectors, and the known depth values serve as the corresponding outputs, yielding a set of neural weights. The neural network is then used again, loaded with the trained weights, and newly defined input vectors taken from the 2D image are fed into it to obtain the corresponding depth values, thereby producing a depth map of the image that facilitates the subsequent conversion of the 2D image into a 3D image.

Description

Method and system for estimating two-dimensional image depth values

The present invention relates to a 2D-to-3D image processing technology, and more particularly to a technique that uses a neural network algorithm to estimate the depth values of a two-dimensional image.

In 2D-to-3D conversion, the depth map has always played an important role. A 2D image, together with a depth map estimated from it by some algorithm, is used to synthesize left-eye and right-eye images with depth-image-based rendering (DIBR); combined with 3D display hardware, a stereoscopic effect can then be presented, as in reference [12].

The earliest 2D image depth estimation techniques used left/right-eye image depth estimation, modeled on the human eyes, whose disparity baseline is about 6 cm. Through continual learning, the human eyes can judge how far away an object is, that is, its depth. Two cameras are placed on the same horizontal line, six centimeters apart, and photograph the same object simultaneously. The actual distance from the cameras to the object can then be computed from the stereo formula, as in reference [4], producing a depth map.

The present invention groups the subsequent development of 2D image depth estimation techniques into two categories. The first is estimation based on image classification, such as k-means, watershed, or edge information. The second is estimation based on depth cues, such as motion parallax, linear perspective, atmospheric perspective, texture gradient, and so on.

To date, only Stanford University has used Markov Random Field (MRF) training to estimate depth maps against range-sensor data, as in reference [13]. That work adopts a supervised learning approach with an MRF and trains separately on indoor and outdoor images. Because local image features alone are insufficient, global image content is added and a hierarchical structure is used to build the model; the resulting depth maps are close to the actual depths.

However, the conventional 2D image depth estimation techniques described above are laborious and time-consuming, and the estimated depth values are still not accurate enough, so there is room for improvement in current 2D image depth estimation technology.

Cited references: [4]: A. Klaus, M. Sormann, and K. Karner, "Segment-Based Stereo Matching Using Belief Propagation and a Self-Adapting Dissimilarity Measure," Pattern Recognition, ICPR 2006, 18th International Conference on, 2006, pp. 15-18. [12]: C.-C. Cheng, C.-T. Li, and L.-G. Chen, "A novel 2D-to-3D conversion system using edge information," IEEE Transactions on Consumer Electronics, vol. 56, pp. 1739-1745, 2010. [13]: A. Saxena, S. H. Chung, and A. Y. Ng, "3-D Depth Reconstruction from a Single Still Image," International Journal of Computer Vision (IJCV), Aug. 2007. [21]: Luo Hua-Qiang (ed.), Neural Networks: Applications with MATLAB, 3rd edition.

Therefore, the main object of the present invention is to provide a method and system for estimating two-dimensional image depth values more accurately.

To achieve the above object, the technical means employed by the method for estimating two-dimensional image depth values of the present invention comprises: (a) a first defining step, which takes a sample image and defines its input vectors fT(x,y) and the target vectors of the corresponding depth values d(x,y); (b) a network creation step, which builds a neural network and sets its parameters; (c) a network training step, which feeds the input vectors fT(x,y) and the target values of their depth values into the network to train it, obtaining a set of NN weight values; (d) a second defining step, which defines the input vectors ft(x,y) of the sample image; and (e) an output step, which feeds the input vectors ft(x,y) into the trained neural network loaded with the NN weight values to obtain the estimated depth values de(x,y) of the sample image.

The above estimation method further comprises a step (f), which is a comparison step: the training effect of the neural network is evaluated according to the depth values de(x,y). If the requirements are met, training of the neural network stops and the network can be put to use; if not, the input vector fT(x,y) is set as a multiple (multi-resolution) input vector, and/or the number of neurons in the neural network is increased, and the network is retrained until the requirements are met.

The above estimation method uses the mean absolute error (MAE) to evaluate the training effect of the neural network.

In step (f) above, the multiple input vector fT(x,y) is set as a first-level input vector, a second-level input vector, a third-level input vector, and a fourth-level input vector.

In step (f) above, the input vectors fT(x,y) are serialized by head-to-head concatenation, zig-zag scanning, or head-to-tail concatenation.

The number of neurons in step (f) above is set to at least 700.

The neural network in step (b) above is set as a combination of a back-propagation network and a created feedforward network.

A hidden layer inside the neural network uses the log-sigmoid transfer function (logsig) or the hyperbolic tangent sigmoid transfer function (tansig), and an output layer inside the neural network uses the linear transfer function (purelin).

The NN weight values comprise weights (W) and bias values (b).

A system for estimating two-dimensional image depth values is a system constructed using the above estimation method.

The above technical means provide the following improvements: 1. The present invention is the first to propose a new method for estimating two-dimensional image depth values; applying a neural network to depth-map estimation of 2D images can predict the depth values of a 2D image's depth map effectively and accurately. 2. With the proposed multi-resolution prediction method, the MAE reaches 8.47 under double (two-level) prediction, 7.23 under triple prediction, and 5.47 under quadruple prediction. To further improve the prediction accuracy, the number of network neurons is increased during training of the neural network; when the number of neurons is increased to 1000, the MAE reaches 1.0. Hence, increasing the number of resolution levels and the number of neurons effectively improves the prediction accuracy of the depth values of a 2D image.

Referring to Fig. 1 and Fig. 2, the present invention relates to a method and system for estimating two-dimensional image depth values, comprising: (a) a first defining step, which takes a sample image and defines its input vectors fT(x,y) and the target vectors of the corresponding depth values d(x,y); (b) a network creation step, which builds a neural network and sets its parameters; (c) a network training step, which feeds the input vectors fT(x,y) and the target values of their depth values into the network to train it, obtaining a set of NN weight values; (d) a second defining step, which defines the input vectors ft(x,y) of the sample image; (e) an output step, which feeds the input vectors ft(x,y) into the trained neural network loaded with the NN weight values to obtain the estimated depth values de(x,y) of the sample image; and (f) a comparison step, in which the training effect of the neural network is evaluated according to the depth values de(x,y); if the requirements are met, training stops and the network is put to use; if not, the input vector fT(x,y) is set as a multiple input vector, and/or the number of neurons in the network is increased, and the network is retrained until the requirements are met. The steps of the method of the present invention are further described as follows:

The present invention uses a neural network (hereinafter NN) and gray-scale texture images to perform depth-map estimation, dividing the estimation into two processes: (a) a training process and (b) a prediction process. Fig. 2 shows the system architecture of the NN depth-map training and prediction of the present invention. Table 1 below defines the symbols used for NN depth-map training and prediction. [Table 1]

NN training process of the present invention: during training, the gray-scale texture images fT(x,y) and the gray-scale depth map d(x,y) are first input; pixel blocks cut from the single- or multi-resolution gray-scale texture images serve as input vectors, and their known depth values serve as the corresponding outputs, to train the neural network. The training result is a set of neuron weight values (W).

Prediction process of the present invention: in the prediction process, two samples are used as tests. Test sample 1: gray-scale texture image ft1(x,y). Test sample 2: gray-scale texture image ft2(x,y). Input vectors are extracted from the input two-dimensional gray-scale texture image, the trained neuron weight values are then loaded, and finally the estimated depth values de(x,y) are output, completing the whole depth-map training and prediction flow.

First defining step (a) of the present invention: before defining the input vectors and target values, the images of the test samples used in the present invention and the relationship between their multiple resolutions are first introduced. Table 2 below gives the correspondence between the gray-scale texture images and the depth map. [Table 2]

Each input vector of the neural network of the present invention is a fixed 5*5 block, which can be taken from a different texture layer. The correspondence in Table 2 between the gray-scale texture images and the depth map is given by: N(i) = (I(i)-n+1)*(J(i)-n+1) (block: n*n, i: 1~8); P(i)*Q(i) = 2^(i-1) * 2^(i-1).

Preliminary experiments showed that the number of input vectors was too large for the computer equipment to handle. To reduce the number of input vectors, the correspondence between the gray-scale texture image sizes and the depth map, and its relation, are redefined as shown in Table 3. [Table 3]

The relation for the above table is: N(i) = (I(i)-n+1)*(J(i)-n+1) (block: n*n, i: 4~8). Table 4 below explains the abbreviations used above. [Table 4]

Single-resolution definition of input vectors and target values: the input gray-scale texture image fT(x,y) and gray-scale depth map d(x,y) used in the present invention are both 68*90 pixels; the extracted vectors are 5*5 pixels and are extracted in an overlapping manner, scanning from top to bottom and left to right with a stride of one pixel.

First the gray-scale texture image f4(x,y) and its corresponding gray-scale depth map d(x,y) are input, and the input vectors and target values are defined using the extraction size and manner proposed herein, as shown in Fig. 3a.

Here F4(x,y) is the 5*5 vector centered at f4(x,y); the extraction window is (x-2:x+2, y-2:y+2), as shown in Fig. 3b. The starting range of the f4(x,y) image is x∈[1:M], y∈[1:N], and the valid coordinate range of the F4(x,y) input-vector center is x∈[3:M-2], y∈[3:N-2]. The starting range of the d(x,y) target values is x∈[1:M], y∈[1:N], and the valid corresponding coordinate range of the d(x,y) target values is x∈[3:M-2], y∈[3:N-2].
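
The overlapping extraction described above can be summarised by the following MATLAB sketch. It is only an illustrative outline under the stated 68*90 setting; the file names and the variable names f4, d, P and T are assumptions, and the column-wise unrolling shown here is one possible ordering (the serialization actually used is described with Fig. 6).

f4 = double(imread('texture_68x90.png'));   % gray-scale texture image, M*N = 68*90 (assumed file name)
d  = double(imread('depth_68x90.png'));     % corresponding gray-scale depth map (assumed file name)
[M, N] = size(f4);
K = (M-4)*(N-4);                            % number of valid 5*5 centres, e.g. 64*86 = 5504 for 68*90
P = zeros(25, K);                           % input vectors, one column per block
T = zeros(1, K);                            % target depth values
k = 0;
for y = 3:N-2
    for x = 3:M-2
        k = k + 1;
        blk = f4(x-2:x+2, y-2:y+2);         % 5*5 window centred at (x,y)
        P(:, k) = blk(:);                   % unrolled to a 25*1 column
        T(1, k) = d(x, y);                  % known depth value as the target
    end
end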

Network creation step (b) of the present invention: the present invention uses the Neural Network Toolbox 7.0.3 of MATLAB 7.14 to build the back-propagation network architecture. Each input is weighted with an appropriate weight; the sum of the weighted inputs and the bias forms the input of the transfer function f, i.e. the neuron output is a = f(Wp + b). A neuron can use any differentiable transfer function f to produce its output, as shown in Fig. 4. The back-propagation network has weights and biases; its hidden layer uses a sigmoid transfer function (logsig or tansig), as shown in Figs. 5a and 5b, and its output layer uses a linear transfer function (purelin), as shown in Fig. 5c, which allows the network to approximate any function with a finite number of discontinuities. Table 5 below explains the abbreviations of the neuron model. [Table 5]

Pixel blocks cut from single- or multi-resolution gray-scale texture images are used as input vectors, and their known depth values as the corresponding outputs, to train the neural network of the present invention. Because the MATLAB Neural Network Toolbox only accepts one-dimensional input vectors, the 5*5 input vector is converted into a 25*1 vector (using head-to-head concatenation), as shown in Fig. 6. The back-propagation network architecture of the present invention (single-resolution depth prediction) is shown in Fig. 7.
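
A minimal sketch of this 5*5-to-25*1 conversion, under the assumption that head-to-head concatenation means the five rows are read one after another in the same left-to-right direction:

blk = f4(x-2:x+2, y-2:y+2);     % 5*5 block around the centre pixel
v   = reshape(blk.', 25, 1);    % rows read left-to-right and stacked into a 25*1 vector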

Parameter setting of the neural network of the present invention: the known actual estimated depths are shown in Table 6. In the preliminary experiments, both training and prediction used the 68*90 image size, and local prediction results were taken for observation. When the hidden layer uses the logsig transfer function, the prediction results are poor, as shown in Table 7; when the tansig transfer function is used, the predictions are close to the actual estimated depths, as shown in Table 8. Therefore the present invention preferably selects tansig for the hidden-layer transfer function and purelin for the output layer. [Table 6] [Table 7] [Table 8]

The present invention further creates a feedforward network with the MATLAB command newff, whose syntax is: net = newff(Iv, [N1 N2 ... Ni], {TF1 TF2 ... TFi}, BTF, BLF, PF). Table 9 explains the symbols used in creating the feedforward network. [Table 9]

Such a feedforward network often has one or more hidden layers. The single-resolution depth-prediction network architecture of the present invention follows reference [21], and the total number of weights and biases used in this network is 18901, as given by Eq. 1: 25*700 + 700 + 700*1 + 1 = 18901 (Eq. 1), that is, the input-to-hidden weights, the hidden-layer biases, the hidden-to-output weights and the output bias.

First a two-layer feedforward network is created: the input layer takes 25 representative feature elements, the hidden layer (first layer) has 700 neurons, and the output layer (second layer) has 1 neuron. The transfer function of the first layer is the hyperbolic tangent sigmoid tansig, and that of the second layer is the linear transfer function purelin, so the network output can take any value. The training function is the Levenberg-Marquardt algorithm trainlm. The maximum number of training iterations is 200 and the performance goal is 1e-5. Each input is weighted with an appropriate weight (W); the sum of the weighted inputs and the bias (b) forms the input of the transfer function f. A neuron can use any differentiable transfer function f to produce its output, as shown in Fig. 8a.
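
The configuration described above corresponds roughly to the following MATLAB sketch, written with the older newff syntax quoted earlier (Neural Network Toolbox 7.x). P and T are the 25*K input matrix and 1*K target vector from the block-extraction step; the option values follow the text, but this is an outline under those assumptions rather than the actual program of the present invention.

net = newff(minmax(P), [700 1], {'tansig', 'purelin'}, 'trainlm');  % 700 hidden neurons, 1 output neuron
net.trainParam.epochs = 200;    % maximum number of training iterations
net.trainParam.goal   = 1e-5;   % performance goal
net = train(net, P, T);         % learn the weights (W) and biases (b)
de_train = sim(net, P);         % depth values reproduced on the training inputs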

Network training step (c) of the present invention: once the network has been created, training can begin. The present invention uses the mean absolute error (MAE) to evaluate the learning effect of the network; training continues until the depth values produced by the network differ only slightly from the known actual depths, at which point training stops and the network is stored for the subsequent depth prediction. One training image of size 68*90 is used.

Training process: the training image f4(x,y) is 68*90; 25*1-pixel vectors are extracted as input vectors together with their corresponding depth values d(x,y) to train the network model; this is called the "training process". Training stops once the network can approximate the required depth values; Fig. 8b shows the BPN training process.

Second defining step (d) of the present invention: new input vectors are defined, and the training gray-scale texture image fT(x,y) and the test gray-scale texture image ft(x,y) are input; the extraction manner and size are the same as in step (a). The newly defined input vectors are used by the prediction network, with the following ranges: fT(x,y) input-vector range: x∈[1:M], y∈[1:N], valid input-vector center range: x∈[3:M-2], y∈[3:N-2]; ft(x,y) input-vector range: x∈[1:2*M], y∈[1:2*N], valid input-vector center range: x∈[3:2*M-2], y∈[3:2*N-2]. Table 10 below lists the newly defined input vectors. [Table 10]

The prediction process of the prediction network predicts the answers via the trained network model; this is called the "prediction process". Given input vectors the network has never seen before, it tends to give reasonable answers and outputs the required depth values; Fig. 9a shows the BPN prediction process.

The present invention uses two test samples for depth prediction. Test sample 1: fT = f4, ft = f4; test sample 2: fT = f4, ft = f3. For d(x,y), the starting range is x∈[1:M], y∈[1:N] and the valid range is x∈[3:M-2], y∈[3:N-2]. For de(x,y), the starting range is x∈[3:M-2], y∈[3:N-2] and the valid range is x∈[3:M-2], y∈[3:N-2], as shown in Fig. 9b.
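
For test sample 1 (ft = f4), the prediction pass can be sketched as follows; the trained network net and the image f4 are those assumed in the earlier sketches, and the border of de with no valid centre is simply left at zero here.

de = zeros(M, N);                            % estimated depth map
for y = 3:N-2
    for x = 3:M-2
        p = f4(x-2:x+2, y-2:y+2);
        de(x, y) = sim(net, p(:));           % one 25*1 input vector per position
    end
end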

The present invention takes two images, Image01 and Image02, as tests; Figs. 10a and 10b show the predicted results and the parameter settings. To compare the known actual depth d(x,y) with the predicted depth de(x,y), the present invention uses the mean absolute error (MAE) to evaluate the network learning effect, computed as Eq. 2: MAE = (1/n) * Σ |d - de| (Eq. 2), where n is the number of output depth values, d the known actual depth value, and de the predicted depth value.
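
Eq. 2 corresponds to the following sketch, evaluated over the valid region in which both the known depth d and the predicted depth de are defined (the index ranges are those given for Fig. 9b):

err = abs(d(3:M-2, 3:N-2) - de(3:M-2, 3:N-2));   % |d - de| over the valid range
MAE = mean(err(:));                               % mean absolute error of Eq. 2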

The Neural Network training parameters are (#neuron = 700, hidden layer = 1, epochs = 200, goal = 1e-5).

From the experimental results of the preliminary depth-prediction principle and architecture proposed herein, under single-resolution prediction the results from test sample 1 and test sample 2 show that the single-resolution method can predict the f4(x,y) input vectors, but cannot obtain the required results for f3(x,y). First, the failure is presumed to be because not enough gray-scale texture image vectors were input during training, causing the prediction to fail. Second, the parameter configuration of the back-propagation neural network, such as the number of neurons, the number of hidden layers and various detailed parameters, is hard to determine; there is currently no fixed rule for choosing the number of neurons and hidden layers, so one can only keep trying to find the best answer.

To remedy the shortcomings of single-resolution prediction, the present invention then adopts the concept of the hierarchical structure proposed by Stanford University, as in reference [13]. Using this concept, the present invention improves the single-resolution prediction and develops multi-resolution depth prediction. Below, the present invention first presents the multi-resolution depth-prediction results, then performs subjective and objective evaluation of the predicted depth maps, and finally synthesizes left- and right-eye images with depth-image-based rendering (DIBR). A stereoscopic image can then be presented on a 3D display.

The purpose of adding resolution levels is that, during training, more gray-scale texture image vectors are added, which improves the accuracy of the depth prediction. The present invention now introduces multi-resolution prediction.

First, the input vectors of each level are defined. The only difference lies in the input vectors fT(x,y) defined during training and the corresponding target values d(x,y); the remaining steps are the same as single-resolution depth prediction, as shown in Figs. 11 and 12a. First-level input vector F4(x,y): the 5*5 vector centered at f4(x,y); starting range of f4(x,y): x∈[1:M], y∈[1:N]. Second-level input vector F5(x,y): the 5*5 vector centered at f5(⌈x/2⌉, ⌈y/2⌉); starting range of f5(x,y): x∈[1:M/2], y∈[1:N/2]. Third-level input vector F6(x,y): the 5*5 vector centered at f6(⌈x/4⌉, ⌈y/4⌉); starting range of f6(x,y): x∈[1:M/4], y∈[1:N/4]. Fourth-level input vector F7(x,y): the 5*5 vector centered at f7(⌈x/8⌉, ⌈y/8⌉); starting range of f7(x,y): x∈[1:M/8], y∈[1:N/8].
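
Assembling a multi-resolution input vector for position (x, y) can be sketched as below; the ceiling mapping follows the centre definitions above, while the concatenation of the levels into a single long vector is an assumption about how the levels are combined (the actual network input is shown in Fig. 13).

x5 = ceil(x/2);  y5 = ceil(y/2);                 % centre in the second-level image f5
x6 = ceil(x/4);  y6 = ceil(y/4);                 % centre in the third-level image f6
v4 = f4(x-2:x+2,   y-2:y+2);
v5 = f5(x5-2:x5+2, y5-2:y5+2);
v6 = f6(x6-2:x6+2, y6-2:y6+2);
p  = [v4(:); v5(:); v6(:)];                      % triple-resolution input vector (75*1)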

Double-resolution prediction: the only difference lies in the input vectors and target values defined during training; the remaining steps are the same as single-resolution prediction, and Fig. 12b shows the multi-resolution training process. The back-propagation network architecture for multi-resolution prediction is shown in Fig. 13.

Double-resolution definition of input vectors and target values: the first-level input vector F4(x,y) and the second-level input vector F5(x,y) are extracted around the center points f4(x,y) and f5(x,y) together with the center point of the corresponding depth value d(x,y), as shown in Fig. 14; the algorithm and process are as follows:

First-level input vector F4(x,y): starting range of f4(x,y): x∈[1:M], y∈[1:N]; valid coordinate range of the F4(x,y) input-vector center: x∈[5:M-4], y∈[5:N-4]. Second-level input vector F5(x,y): f4(x,y) is first average-subsampled into f5(x,y); that is, each pixel of f5 is the average of the corresponding 2*2 block of f4(x,y) values, for x∈[1:M], y∈[1:N]. The starting range of f5(x,y) is x∈[1:M/2], y∈[1:N/2], and the valid coordinate range of the F5(x,y) input-vector center is x∈[3:M/2-2], y∈[3:N/2-2].
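
A sketch of the average subsampling used to build f5 from f4, under the assumption that "average subsampling" is a plain 2*2 block average and that the image dimensions are even (as for 68*90):

f5 = ( f4(1:2:end, 1:2:end) + f4(2:2:end, 1:2:end) ...
     + f4(1:2:end, 2:2:end) + f4(2:2:end, 2:2:end) ) / 4;   % (M/2)*(N/2) coarse layer
% f6 and f7 are obtained by applying the same operation to f5 and then to f6.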

d(x,y) d(x,y) 起始範圍:x[1:M],y[1:N],而該d(x,y) 目標值中心點有效坐標範圍:x[5:M-4],y[5:N-4]。Another d(x,y) , the starting range of the d(x,y) : x [1:M],y [1:N], and the d(x,y) target value center point effective coordinate range: x [5:M-4],y [5: N-4].

Triple-resolution prediction. Triple-resolution definition of input vectors and target values: the first-level input vector F4(x,y), the second-level input vector F5(x,y), and the third-level input vector F6(x,y) are extracted around the center points f4(x,y), f5(x,y) and f6(x,y) together with the center point of the corresponding depth value d(x,y), as shown in Fig. 15; the algorithm and process are as follows:

First-level input vector F4(x,y): 1. starting range of f4(x,y): x∈[1:M], y∈[1:N]; 2. valid coordinate range of the F4(x,y) input-vector center: x∈[10:M-9], y∈[10:N-9]. Second-level input vector F5(x,y): 1. f4(x,y) is first average-subsampled into f5(x,y), i.e. each pixel of f5 is the average of the corresponding 2*2 block of f4(x,y) values, for x∈[1:M], y∈[1:N]; 2. starting range of f5(x,y): x∈[1:M/2], y∈[1:N/2]; 3. valid coordinate range of the F5(x,y) input-vector center: x∈[5:M/2-4], y∈[5:N/2-4]. Third-level input vector F6(x,y): 1. f5(x,y) is first average-subsampled into f6(x,y) in the same manner, for x∈[1:M/2], y∈[1:N/2]; 2. starting range of f6(x,y): x∈[1:M/4], y∈[1:N/4]; 3. valid coordinate range of the F6(x,y) input-vector center: x∈[3:M/4-2], y∈[3:N/4-2]. For d(x,y): 1. starting range: x∈[1:M], y∈[1:N]; 2. valid coordinate range of the d(x,y) target-value center: x∈[10:M-9], y∈[10:N-9].

Quadruple-resolution prediction: quadruple resolution defines the input vectors and target values with a first-level input vector F4(x,y), a second-level input vector F5(x,y), a third-level input vector F6(x,y), and a fourth-level input vector F7(x,y), extracted around the center points f4(x,y), f5(x,y), f6(x,y) and f7(x,y) together with the center point of the corresponding depth value d(x,y), as shown in Fig. 11; the algorithm and process are as follows:

First-level input vector F4(x,y): 1. starting range of f4(x,y): x∈[1:M], y∈[1:N]; 2. valid coordinate range of the F4(x,y) input-vector center: x∈[20:M-19], y∈[20:N-19]. Second-level input vector F5(x,y): 1. f4(x,y) is first average-subsampled into f5(x,y), i.e. each pixel of f5 is the average of the corresponding 2*2 block of f4(x,y) values, for x∈[1:M], y∈[1:N]; 2. starting range of f5(x,y): x∈[1:M/2], y∈[1:N/2]; 3. valid coordinate range of the F5(x,y) input-vector center: x∈[10:M/2-9], y∈[10:N/2-9]. Third-level input vector F6(x,y): 1. f5(x,y) is first average-subsampled into f6(x,y) in the same manner, for x∈[1:M/2], y∈[1:N/2]; 2. starting range of f6(x,y): x∈[1:M/4], y∈[1:N/4]; 3. valid coordinate range of the F6(x,y) input-vector center: x∈[5:M/4-4], y∈[5:N/4-4]. Fourth-level input vector F7(x,y): 1. f6(x,y) is first average-subsampled into f7(x,y) in the same manner, for x∈[1:M/4], y∈[1:N/4]; 2. starting range of f7(x,y): x∈[1:M/8], y∈[1:N/8]; 3. valid coordinate range of the F7(x,y) input-vector center: x∈[3:M/4-2], y∈[3:N/4-2]. For d(x,y): 1. starting range: x∈[1:M], y∈[1:N]; 2. valid coordinate range of the d(x,y) target-value center: x∈[20:M-19], y∈[20:N-19]. 3. Global information is additionally included: the MATLAB command imresize is used to shrink the M*N image down to 5*5.
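
The added global information mentioned in item 3 can be sketched with the Image Processing Toolbox command imresize; appending the shrunken image to the local input vector is illustrative only, as the text does not state how the global elements are combined.

g = imresize(f4, [5 5]);        % whole M*N texture image shrunk to a 5*5 global thumbnail
p = [p; g(:)];                  % appended as 25 extra global input elements (assumed combination)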

The present invention takes the following two test samples for the depth-value estimation experiments. Test sample 1: 1. double prediction (fT = f4+f5, ft = f4); 2. triple prediction (fT = f4+f5+f6, ft = f4); 3. quadruple prediction (fT = f4+f5+f6+f7, ft = f4). Test sample 2: 1. double prediction (fT = f4+f5, ft = f3); 2. triple prediction (fT = f4+f5+f6, ft = f3); 3. quadruple prediction (fT = f4+f5+f6+f7, ft = f3).

The above test samples are then subjected to two experiments: experiment 1 is multi-resolution prediction, and experiment 2 increases the number of neurons. The multi-resolution prediction experiment is described first.

Double, triple and quadruple training: the Neural Network training parameters are (#neuron = 700, hidden layer = 1, epochs = 200, goal = 1e-5). Fig. 16 shows the multi-resolution training times of sample 1; Fig. 17 shows the depth map corresponding to sample 1; Fig. 18 shows the multi-resolution prediction results of sample 1. Fig. 19 shows the depth map corresponding to sample 2; Fig. 20 shows the multi-resolution prediction results of sample 2.

Conclusion of experiment 1: with the network trained on the training sample fT = f4(x,y), the test sample ft(x,y) = f3(x,y) can be predicted effectively with multiple resolutions: double prediction reaches an MAE of 8.47, triple prediction 7.23, and quadruple prediction 5.47. However, the prediction accuracy is still not optimal, so in experiment 2 below the present invention increases the number of neurons to improve the prediction accuracy.

Experiment 2, increasing the number of neurons: the Neural Network training parameters are (#neuron = 700, hidden layer = 1, epochs = 200, goal = 1e-5). Starting from the best result of experiment 1, the quadruple-resolution prediction, the number of neurons is increased: besides the fixed 700 neurons, the present invention additionally tests 750, 800 and 1000 neurons. The experimental results are shown in Figs. 20 to 22.

Statistics also yield the relationship between the number of neurons, the training time, and the MAE; see Figs. 23a and 23b.

From experiment 2 the present invention concludes: besides the original fixed 700 neurons, additional experiments with 750, 800 and 1000 neurons show that increasing the number of resolution levels and the number of neurons helps improve the accuracy of the depth prediction.

The present invention further performs experiment 3, which redefines the input vector by reordering the pixel positions in two ways: 1. the zig-zag scan method; 2. the head-to-tail method. In the zig-zag scan method, after the 5*5 vector has been extracted, the position of each pixel is reordered in zig-zag order, as in Fig. 24a, and the network is retrained. In the head-to-tail method, after the 5*5 vector has been extracted, the position of each pixel is reordered head-to-tail, as in Fig. 24b, and the network is retrained. For both methods the Neural Network training parameters are (#neuron = 700, hidden layer = 1, epochs = 200, goal = 1e-5). The results are shown in Fig. 25: reordering each pixel position with zig-zag or head-to-tail ordering and retraining gives prediction results worse than the initial method. The best method above is therefore applied below to depth-map estimation on multiple images.
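
One common zig-zag convention for reordering a 5*5 block is sketched below; the exact ordering used in experiment 3 is the one drawn in Fig. 24a, which is not reproduced here, so this is only an assumed variant.

[c, r] = meshgrid(1:5, 1:5);                 % column and row index of each pixel
s = r(:) + c(:);                             % anti-diagonal number
key = s*100 + (2*mod(s,2) - 1).*r(:);        % alternate the traversal direction on successive diagonals
[~, idx] = sort(key);
v_zz = blk(idx);                             % 25*1 vector in zig-zag order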

In experiment 3, depth-map prediction is further performed on multiple images: fourteen images and their depth maps (all of size 34*45) are selected, as shown in Fig. 26. The Inside and Outside test images are fixed during testing. The purpose is to observe how the Inside and Outside quality changes as the number of training images increases. 1. Lab01: train two images, test two Inside images and four (fixed) Outside images. 2. Lab02: train five images, test four Inside images and four (fixed) Outside images. 3. Lab03: train ten images, test four Inside images and four (fixed) Outside images.

In Lab01 the present invention trains two images and tests two Inside images and four (fixed) Outside images. The Neural Network training parameters are (#neuron = 200, hidden layer = 1, epochs = 200, goal = 1e-5). The results are shown in Fig. 27.

In Lab02 the present invention trains five images and tests four Inside images and four (fixed) Outside images. The Neural Network training parameters are (#neuron = 500, hidden layer = 1, epochs = 200, goal = 1e-5). The results are shown in Fig. 28.

In Lab03 the present invention trains ten images and tests four Inside images and four (fixed) Outside images. The Neural Network training parameters are (#neuron = 1000, hidden layer = 1, epochs = 200, goal = 1e-5). The results are shown in Fig. 29.

Conclusions from experiment 3: the Inside image (2) and Outside image (12) selected by the present invention are discussed. First, when the present invention trains on two images, the Inside prediction reaches an MAE of 2.47, but the Outside prediction is poor. Second, when training on five images, the Inside prediction reaches an MAE of 2.65 and the Outside prediction an MAE of 27.25. Finally, when training on ten images, the Inside prediction reaches an MAE of 2.70 and the Outside prediction an MAE of 10.92. Taken together, when the number of training images increases, the Inside prediction degrades slightly but the Outside prediction improves significantly. In addition, the present invention can use a median filter to remove small dark spots from the image and make it look smooth. After the Inside prediction result is post-processed with the median filter, as shown in Fig. 30a, the MAE improves to 2.03 at best.
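
The post-processing step can be sketched with the Image Processing Toolbox median filter; the 3*3 window size is an assumption, since the text does not state it.

de_smooth = medfilt2(de, [3 3]);   % remove small dark speckles from the predicted depth map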

From the above experiments, when the number of resolution levels and neurons is increased, the MAE value drops noticeably and the estimated depth map comes closer and closer to the actual depth map. The depth map is then synthesized via DIBR into an interlaced stereoscopic image of left- and right-eye views and output on a display; Fig. 30b shows the image selected by the present invention, the estimated depth map, and its interlaced stereoscopic image.

In summary, the present invention relates to a "system for estimating two-dimensional image depth values" and is the first to propose a new method for estimating two-dimensional image depth values. Applying a neural network to depth-map estimation of two-dimensional images can predict the depth values of the depth map effectively and accurately, and the method is implemented as an operating system (i.e. program software). Neither the method nor the system constructed with it has appeared in any publication or been publicly used, so it genuinely meets the requirements for an invention patent application, and the Patent Office is respectfully requested to examine it and grant a patent at an early date.

(a), (b), (c), (d), (e), (f): steps

Fig. 1: flow chart of the steps of the method of the present invention.
Fig. 2: system architecture diagram of the depth-map training and prediction of the present invention.
Fig. 3a: schematic diagram of the input vectors f4(x,y) and their corresponding depth values d(x,y) when training the network of the present invention.
Fig. 3b: schematic diagram of the vector f4(x,y) extracted in Fig. 3a of the present invention.
Fig. 4: schematic diagram of the neuron model of the present invention.
Fig. 5a: schematic diagram of the log-sigmoid transfer function used by the hidden layer of the neural network of the present invention.
Fig. 5b: schematic diagram of the hyperbolic tangent sigmoid transfer function used by the hidden layer of the neural network of the present invention.
Fig. 5c: schematic diagram of the linear transfer function used by the output layer of the neural network of the present invention.
Fig. 6: schematic diagram of converting the two-dimensional vector of Fig. 3b of the present invention into one dimension.
Fig. 7: architecture diagram of the back-propagation network created for single-resolution depth prediction of the present invention.
Fig. 8a: architecture diagram of the feedforward network created for single-resolution depth prediction of the present invention.
Fig. 8b: schematic diagram of training the neural network of the present invention.
Fig. 9a: schematic diagram of predicting depth values with the neural network of the present invention.
Fig. 9b: schematic diagram of the valid ranges of the d(x,y) and de(x,y) depth values of the present invention.
Fig. 10a: schematic diagram of the single-resolution prediction (fT = f4, ft = f4) results of test sample 1 of the present invention.
Fig. 10b: schematic diagram of the single-resolution prediction (fT = f4, ft = f3) results of test sample 2 of the present invention.
Fig. 11: schematic diagram of the multiple input vectors of the present invention and their corresponding target values.
Fig. 12a: schematic diagram of the multiple input vectors of the present invention.
Fig. 12b: schematic diagram of the multi-resolution training process of the present invention.
Fig. 13: architecture diagram of the back-propagation network for multi-resolution prediction of the present invention.
Fig. 14: schematic diagram of f4(x,y) & f5(x,y) of the present invention and their corresponding d(x,y).
Fig. 15: schematic diagram of f4(x,y) & f5(x,y) & f6(x,y) of the present invention and their corresponding d(x,y).
Fig. 16: results of the multi-resolution training times of the present invention.
Fig. 17: depth map corresponding to the texture image of test sample 1 of the present invention.
Fig. 18: multi-resolution prediction results of test sample 1 of the present invention.
Fig. 19: depth map corresponding to the texture image of test sample 2 of the present invention.
Fig. 20: multi-resolution prediction results of test sample 2 of the present invention.
Fig. 21: results of the quadruple-prediction training times with increased numbers of neurons of the present invention.
Fig. 22: results of the quadruple-prediction depths with increased numbers of neurons of the present invention.
Fig. 23a: relationship between the number of neurons and the training time of the present invention.
Fig. 23b: relationship between the number of neurons and the MAE value of the present invention.
Fig. 24a: redefinition of the input vector fT(x,y) with the zig-zag scan method of the present invention.
Fig. 24b: redefinition of the input vector fT(x,y) with the head-to-tail method of the present invention.
Fig. 25: schematic diagram of the depth-map results predicted by the initial (head-to-head), zig-zag and head-to-tail methods of the present invention.
Fig. 26: the 14 gray-scale test images selected for the present invention.
Fig. 27: results of Lab01 of the present invention: training two images, testing two Inside images and four (fixed) Outside images.
Fig. 28: results of Lab02 of the present invention: training five images, testing four Inside images and four (fixed) Outside images.
Fig. 29: results of Lab03 of the present invention: training ten images, testing four Inside images and four (fixed) Outside images.
Fig. 30a: results of using a median filter in the present invention to remove small dark spots from the image and make it look smooth.
Fig. 30b: interlaced stereoscopic image produced using the neural network of the present invention.

(a), (b), (c), (d), (e), (f): steps

Claims (10)

1. A method for estimating two-dimensional image depth values, comprising: (a) a first defining step, which takes a sample image and defines its input vectors fT(x,y) and the target vectors of the corresponding depth values d(x,y); (b) a network creation step, which builds a neural network and sets its parameters; (c) a network training step, which feeds the input vectors fT(x,y) and the target values of their depth values into the network to train it, obtaining a set of NN weight values; (d) a second defining step, which defines the input vectors ft(x,y) of the sample image; and (e) an output step, which feeds the input vectors ft(x,y) into the trained neural network loaded with the NN weight values to obtain the estimated depth values de(x,y) of the sample image.
2. The method for estimating two-dimensional image depth values according to claim 1, further comprising a step (f), which is a comparison step: the training effect of the neural network is evaluated according to the depth values de(x,y); if the requirements are met, training of the neural network stops and the network is put to use; if not, the input vector fT(x,y) is set as a multiple input vector, and/or the number of neurons in the neural network is increased, and the network is retrained until the requirements are met.
3. The method for estimating two-dimensional image depth values according to claim 2, wherein the mean absolute error (MAE) is used to evaluate the training effect of the neural network.
4. The method for estimating two-dimensional image depth values according to claim 2, wherein the multiple input vector fT(x,y) in step (f) is set as a first-level input vector, a second-level input vector, a third-level input vector and a fourth-level input vector.
5. The method for estimating two-dimensional image depth values according to claim 4, wherein the input vectors fT(x,y) in step (f) are serialized by head-to-head concatenation, zig-zag scanning, or head-to-tail concatenation.
6. The method for estimating two-dimensional image depth values according to claim 2, wherein the number of neurons in step (f) is set to at least 700.
7. The method for estimating two-dimensional image depth values according to claim 1 or 2, wherein the neural network in step (b) is set as a combination of a back-propagation network and a created feedforward network.
8. The method for estimating two-dimensional image depth values according to claim 7, wherein a hidden layer of the neural network uses the log-sigmoid transfer function (logsig) or the hyperbolic tangent sigmoid transfer function (tansig), and an output layer of the neural network uses the linear transfer function (purelin).
9. The method for estimating two-dimensional image depth values according to claim 1 or 2, wherein the NN weight values comprise weights (W) and bias values (b).
10. A system for estimating two-dimensional image depth values, constructed using the estimation method according to any one of claims 1 to 9.
TW102142562A 2013-11-22 2013-11-22 Two - dimensional image depth value estimation method and its system TWI524307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW102142562A TWI524307B (en) 2013-11-22 2013-11-22 Two - dimensional image depth value estimation method and its system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW102142562A TWI524307B (en) 2013-11-22 2013-11-22 Two - dimensional image depth value estimation method and its system

Publications (2)

Publication Number Publication Date
TW201520978A true TW201520978A (en) 2015-06-01
TWI524307B TWI524307B (en) 2016-03-01

Family

ID=53935087

Family Applications (1)

Application Number Title Priority Date Filing Date
TW102142562A TWI524307B (en) 2013-11-22 2013-11-22 Two - dimensional image depth value estimation method and its system

Country Status (1)

Country Link
TW (1) TWI524307B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI562099B (en) * 2015-12-23 2016-12-11 Univ Nat Yunlin Sci & Tech Markers Based 3D Position Estimation for Rod Shaped Object Using 2D Image and Its Application In Endoscopic MIS Instrument Tracking Positioning and Tracking
TWI712900B (en) * 2016-03-26 2020-12-11 香港商阿里巴巴集團服務有限公司 Distributed cluster training method and device
CN111630568A (en) * 2018-02-27 2020-09-04 三星电子株式会社 Electronic device and control method thereof
US11657520B2 (en) 2018-02-27 2023-05-23 Samsung Electronics Co., Ltd. Electronic device and method for controlling same
CN111630568B (en) * 2018-02-27 2023-11-10 三星电子株式会社 Electronic device and control method thereof

Also Published As

Publication number Publication date
TWI524307B (en) 2016-03-01

Similar Documents

Publication Publication Date Title
Fan et al. Rethinking road surface 3-d reconstruction and pothole detection: From perspective transformation to disparity map segmentation
CN106203327B (en) Lung tumor identification system and method based on convolutional neural networks
CN102750695B (en) Machine learning-based stereoscopic image quality objective assessment method
CN104318569B (en) Space salient region extraction method based on depth variation model
CN109035250B (en) Method and device for establishing age prediction model and age prediction method and device
CN107292234B (en) Indoor scene layout estimation method based on information edge and multi-modal features
US11182644B2 (en) Method and apparatus for pose planar constraining on the basis of planar feature extraction
CN107636659A (en) The method and system of the terrestrial reference in medical image is detected using deep neural network
CN106023230B (en) A kind of dense matching method of suitable deformation pattern
CN113221647B (en) 6D pose estimation method fusing point cloud local features
TWI524307B (en) Two - dimensional image depth value estimation method and its system
CN105898279B (en) A kind of objective evaluation method for quality of stereo images
CN116310131A (en) Three-dimensional reconstruction method considering multi-view fusion strategy
CN114494611A (en) Intelligent three-dimensional reconstruction method, device, equipment and medium based on nerve basis function
CN107403448B (en) Cost function generation method and cost function generation device
CN115953330B (en) Texture optimization method, device, equipment and storage medium for virtual scene image
CN111669563B (en) Stereo image visual comfort enhancement method based on reinforcement learning
CN115619974A (en) Large scene three-dimensional reconstruction method, reconstruction device, equipment and storage medium based on improved PatchMatch network
Kaur et al. A Novel MRI and CT Image Fusion Based on Discrete Wavelet Transform and Principal Component Averaging for Enhanced Clinical Diagnosis.
CN114972335A (en) Image classification method and device for industrial detection and computer equipment
CN114882545A (en) Multi-angle face recognition method based on three-dimensional intelligent reconstruction
CN114155406A (en) Pose estimation method based on region-level feature fusion
Vanetti et al. Dense two-frame stereo correspondence by self-organizing neural network
CN110910450A (en) Method for carrying out 3D target detection based on mixed feature perception neural network
WO2020115866A1 (en) Depth processing system, depth processing program, and depth processing method

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees