TWI496110B - Method of Transforming 3D Image Video with Low Complexity Two-dimensional Video - Google Patents


Info

Publication number
TWI496110B
TWI496110B (application number TW100124109A)
Authority
TW
Taiwan
Prior art keywords
picture
pixel
video
depth
image
Prior art date
Application number
TW100124109A
Other languages
Chinese (zh)
Other versions
TW201303793A (en)
Original Assignee
Nat Univ Chung Cheng
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nat Univ Chung Cheng filed Critical Nat Univ Chung Cheng
Priority to TW100124109A priority Critical patent/TWI496110B/en
Publication of TW201303793A publication Critical patent/TW201303793A/en
Application granted granted Critical
Publication of TWI496110B publication Critical patent/TWI496110B/en

Links

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Description

Method for Converting Two-Dimensional Video into Three-Dimensional Video with Low Complexity (低複雜度二維影像視訊轉換三維影像視訊之方法)

The present invention relates to image-processing technology, and in particular to a low-complexity method for converting two-dimensional (2D) video into three-dimensional (3D) video.

The key difference between 2D video and 3D video is the presence of a depth map. A depth map records the distance between every object in a 2D frame and the viewer or camera, so the depth map is essential to the display quality of 3D stereoscopic playback.

Conventional approaches to generating a video depth map fall into two categories. The first uses a multi-lens camera to capture the same scene from two or more viewpoints simultaneously and computes the depth map from the disparity between the views. The second derives the depth map from video shot with an ordinary camera; since such a camera captures only a single viewpoint per frame, estimating depth from a single single-view 2D image is computationally very expensive. For example, the methods surveyed by Battiato Sebastiano in the 2004 paper "Depth map generation by image classification" — image classification, vanishing point detection, and mean-shift segmentation — all carry high computational cost. Likewise, the mean-shift procedure described by Dorin Comaniciu in "Mean shift analysis and applications" (1999) requires integral, exponential, and logarithmic operations, again at high cost. Conventional techniques are therefore very difficult to realize in real-time applications.

The main object of the present invention is to provide a low-complexity method for converting 2D video into 3D video, which uses low-complexity image analysis and processing to generate the depth information a 2D video lacks, and thereby converts the 2D video into 3D video.

A further object of the present invention is to provide a low-complexity method for converting 2D video into 3D video whose greatly reduced computational complexity makes it suitable for hardware or embedded-system software implementations of real-time stereoscopic video generation, while maintaining good 3D video quality.

To achieve the foregoing objects, the present invention provides a low-complexity method for converting 2D video into 3D video that converts each frame of a video in turn. The method comprises the following steps:

a) Identify edge feature points: determine whether each pixel of a frame is an edge feature point.

b) Find vanishing lines: for every pixel that is an edge feature point, take that pixel as the center, find the neighboring pixels that are also edge feature points, and store the information of each straight line connecting the center pixel to those neighbors. Sort the stored lines by the number of edge feature points they pass through, and define at least the two lines passing through the most edge feature points as vanishing lines.

c) Classify the frame: determine whether the density of edge feature points in the frame exceeds a density threshold and whether the number of vanishing lines exceeds a count threshold. If both thresholds are exceeded, the frame is classified as a close-up frame. Otherwise, use color to compute the proportion of sky and distant mountains in the frame; if it exceeds a landscape threshold, the frame is classified as a landscape frame. If neither condition holds, the frame is classified as a vanishing-region frame.

d) Locate the vanishing region and generate a preliminary depth map: for a vanishing-region frame, apply vanishing-region detection, taking the intersections of all vanishing lines as vanishing points and defining the area of the frame where vanishing points are densest as the vanishing region. If the vanishing region falls outside the frame, set the frame boundary that best matches the trend of the vanishing lines as the vanishing region. For a landscape frame, define the vanishing region directly as the top of the frame. Then generate a preliminary Gradient Depth Map (GDM) from each pixel's distance to the vanishing region, and apply a predetermined operation to the GDM using the frame's edge feature points to sharpen object edges, producing the final depth map.

e) Detect contrast and assign depth: for a close-up frame, divide the frame into blocks of pixels, measure the contrast of each block to decide whether it belongs to a foreground object or a background object, and assign foreground objects the nearest depth and background objects the farthest depth to produce the final depth map.

In this way the computational complexity is greatly simplified, 2D video can be converted into 3D video, the method can be realized in hardware or embedded-system software for real-time stereoscopic video generation, and good 3D video quality is maintained.

To describe the technical features of the present invention in detail, preferred embodiments are given below with reference to the drawings. Because the invention concerns image processing, differences in depth of field and displayed images are difficult to represent with mechanical drafting, so they are shown as images.

As shown in the first figure, a preferred embodiment of the present invention provides a low-complexity method for converting 2D video into 3D video that converts each frame of a video in turn. The method comprises the following main steps:

a) Identify edge feature points: determine whether each pixel of a frame is an edge feature point. In this embodiment, a pixel is tested by taking the average of the RGB (red, green, blue) values, or the Y value of the YUV (luminance and chrominance) format, of the pixel and its eight surrounding pixels, applying vertical and horizontal mask arrays to obtain a horizontal value and a vertical value, adding the absolute value of the horizontal value to the absolute value of the vertical value, and comparing the sum with an edge threshold; if the sum exceeds the threshold, the pixel is an edge feature point. The conventional approach computes the square root of the sum of the squares of the horizontal and vertical values, which is computationally expensive; the present calculation reduces that complexity by about 65% while producing nearly the same result.
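The step above can be sketched as follows. This is a minimal illustration, not the patent's exact implementation: the 3x3 Sobel masks and the threshold value are assumptions, since the patent only says "vertical and horizontal mask arrays" and "an edge threshold".

```python
import numpy as np

def edge_points(gray, threshold=60):
    """Mark edge feature points using |Gx| + |Gy| instead of
    sqrt(Gx^2 + Gy^2), as described in step a).

    `gray` is a 2-D array of luminance (Y) values; `threshold` is the
    edge threshold (its value here is an assumption).
    """
    # 3x3 Sobel-style masks for the horizontal and vertical gradients
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    ky = kx.T
    h, w = gray.shape
    edges = np.zeros((h, w), dtype=bool)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = gray[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(win * kx)
            gy = np.sum(win * ky)
            # |gx| + |gy| approximates the gradient magnitude without
            # the costly square root
            edges[y, x] = abs(gx) + abs(gy) > threshold
    return edges
```

A vertical intensity step in the input produces edge points along the step boundary and none in the flat regions, matching the behavior of the square-root formulation.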

b) Find vanishing lines: for every pixel that is an edge feature point, take that pixel as the center, find the neighboring pixels that are also edge feature points, store the information of each straight line connecting the center pixel to those neighbors, sort the lines by the number of edge feature points they pass through, and define the two lines passing through the most edge feature points as vanishing lines. In this embodiment, as shown in the second figure, the neighboring pixels are the eight pixels surrounding the center pixel plus the sixteen pixels of the next outer ring, and a 5x5 Block Hough Transform is used to find the vanishing lines. In operation, taking the second figure as an example, the black point S is the pixel currently being processed and is an edge feature point. The method first checks whether the white area (position A) contains an edge feature point, then checks the gray areas (positions X and Y). In the figure, positions A, X, and Y all contain edge feature points, so the Hough transform is applied at the black point S to compute and record the line information for 147 to 168 degrees. Compared with the unsimplified Hough transform, this approach reduces computational complexity by about 56%.
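The block-restricted voting idea above can be sketched as follows. This is a sketch under stated assumptions, not the patent's exact procedure: the angle quantization to whole degrees and the (angle, offset) line parameterization are choices of this illustration.

```python
import math
from collections import Counter

def block_hough_lines(edges, top_n=2):
    """Sketch of 5x5 block Hough voting: for every edge feature point,
    examine the 24 surrounding pixels of its 5x5 block and, for each
    neighbor that is also an edge point, vote for the (angle, offset)
    of the line joining the pair.  The lines with the most votes are
    returned as candidate vanishing lines.
    """
    h, w = len(edges), len(edges[0])
    votes = Counter()
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            for dy in range(-2, 3):
                for dx in range(-2, 3):
                    if dx == 0 and dy == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and edges[ny][nx]:
                        # direction of the line through (x, y) and the neighbor
                        theta = round(math.degrees(math.atan2(dy, dx)) % 180)
                        # normal-form offset rho = x*cos + y*sin of the normal
                        rad = math.radians(theta + 90)
                        rho = round(x * math.cos(rad) + y * math.sin(rad))
                        votes[(theta, rho)] += 1
    return [line for line, _ in votes.most_common(top_n)]
```

Because voting is confined to a 5x5 block per pixel, the accumulator work grows with the number of edge points rather than with a full per-pixel sweep over all angles, which is the source of the complexity saving the text describes.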

c) Classify the frame: determine whether the density of edge feature points in the frame exceeds a density threshold and whether the number of vanishing lines exceeds a count threshold. If both thresholds are exceeded, the frame is classified as a close-up frame, as shown in the third figure (C). Otherwise, use color to compute the proportion of sky and distant mountains in the frame; if it exceeds a landscape threshold, the frame is classified as a landscape frame, as shown in the third figure (B). If neither condition holds, the frame is classified as a vanishing-region frame, as shown in the third figure (A).
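The decision order of step c) can be written as a small function. The threshold values below are placeholders, not values from the patent, which leaves them unspecified.

```python
def classify_frame(edge_density, n_vanishing_lines, sky_mountain_ratio,
                   density_th=0.2, count_th=2, landscape_th=0.3):
    """Sketch of the step-c) frame classifier.

    All three threshold defaults are assumptions chosen for
    illustration only.
    """
    # close-up: dense edge points AND many vanishing lines
    if edge_density > density_th and n_vanishing_lines > count_th:
        return "close-up"
    # landscape: large sky / distant-mountain color proportion
    if sky_mountain_ratio > landscape_th:
        return "landscape"
    # everything else falls through to the vanishing-region case
    return "vanishing-region"
```

The classification steers the rest of the pipeline: close-up frames go to the contrast step e), while landscape and vanishing-region frames go to the depth-map step d).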

d) Locate the vanishing region and generate a preliminary depth map: if step c) classified the frame as a vanishing-region frame, apply vanishing-region detection, taking the intersections of all vanishing lines as vanishing points and scanning in 8x8-pixel blocks to find the area of the frame where vanishing points are densest; that area is defined as the vanishing region. If the vanishing region falls outside the frame, set the frame boundary that best matches the trend of the vanishing lines as the vanishing region. If the frame was classified as a landscape frame, define the vanishing region directly as the top of the frame. Then generate a preliminary Gradient Depth Map (GDM) from each pixel's distance to the vanishing region. The fourth figure (A) shows the preliminary depth map corresponding to the third figure (A), and the fourth figure (B) the one corresponding to the third figure (B). In a landscape frame the farthest content, such as the sky or distant mountains, usually lies at the top, so the vanishing region of a landscape frame appears at the top of the preliminary depth map; compare the third figure (B) with the fourth figure (B). Next, a predetermined operation is applied to the preliminary depth map using the frame's edge feature points to sharpen object edges and produce the final depth map. In this embodiment, the predetermined operation is Joint Bilateral Filtering (JBF) without the edge stop function. Conventional JBF uses the edge stop function to detect edges; since the present invention does not need that function, computational complexity is reduced by 26%. The final depth maps of a vanishing-region frame and a landscape frame are shown in the fifth figures (A) and (B).
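The gradient-depth part of step d) can be sketched as follows. The Euclidean distance metric and the linear scaling to 0-255 are assumptions of this illustration; the patent specifies only that initial depth follows each pixel's distance from the vanishing region.

```python
import numpy as np

def gradient_depth_map(h, w, vanish_y, vanish_x):
    """Sketch of a Gradient Depth Map (GDM): each pixel's initial depth
    value grows with its distance from the vanishing region, so pixels
    far from the vanishing point are treated as closest to the viewer.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    # distance of every pixel from the vanishing point/region
    dist = np.hypot(ys - vanish_y, xs - vanish_x)
    # normalize to an 8-bit depth value (scaling choice is an assumption)
    return (dist / dist.max() * 255).astype(np.uint8)
```

For a landscape frame, passing a point on the top row (e.g. `vanish_y = 0`) reproduces the top-of-frame vanishing region described above; the JBF edge-sharpening pass would then be applied on top of this map.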

This embodiment further comprises a step d1) of segmentation: the frame is segmented by a segmentation method that groups pixels of similar color and uses the resulting segmentation to detect objects; the preliminary depth map is then adjusted according to the segmentation result so that every pixel of the same object in the frame receives a consistent depth value. The segmentation method is 4-bit Removing Segmentation, shown in the sixth figure: the four low bits of each 8-bit R, G, and B value are removed, colors are compared using the remaining four high bits, and pixels of similar color are grouped accordingly. The upper part of the sixth figure shows the 8-bit RGB values of the pixels; the lower part shows the pixels grouped by their 4-bit values after the low bits are removed. The commonly used segmentation algorithm, mean-shift segmentation, requires a very large amount of computation; the 4-bit removing method achieves comparable object segmentation while reducing computational complexity by about 99.8%. Step d1) may also be performed before step a), with the adjustment applied after the preliminary depth map is produced in step d).
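The bit-removal grouping of step d1) can be sketched as follows. Packing the three 4-bit values into one label is a representation choice of this illustration; the connected-component step that turns equal labels into distinct objects is omitted for brevity.

```python
def four_bit_remove_label(pixels):
    """Sketch of 4-bit Removing Segmentation: drop the four low bits of
    each 8-bit R, G, B value and group pixels whose remaining high
    bits match.

    `pixels` is a list of rows of (r, g, b) tuples; the return value is
    a coarse color label per pixel, where equal labels mean "same
    color group".
    """
    labels = []
    for row in pixels:
        # keep only the high nibble of each channel and pack the three
        # nibbles into a single 12-bit group label
        labels.append([((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)
                       for (r, g, b) in row])
    return labels
```

Because the comparison reduces to an integer shift and equality test per pixel, the cost is a tiny fraction of mean-shift segmentation's kernel-density iterations, which is the saving the text describes.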

e) Detect contrast and assign depth: if step c) classified the frame as a close-up frame, divide the frame into blocks, each containing a plurality of pixels, and measure the contrast of each block to decide whether it belongs to a foreground object or a background object; foreground objects are assigned the nearest depth and background objects the farthest depth, producing the final depth map. The final depth map of a close-up frame is shown in the fifth figure (C). Contrast is computed by equation (1) below.

Contrast = (Imax - Imin) / (Imax + Imin)    Equation (1)

where Imax is the maximum luminance and Imin the minimum luminance of the pixels in the block.
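Equation (1) is straightforward to implement per block; the zero guard for an all-black block is an addition of this sketch, not stated in the patent.

```python
def block_contrast(block):
    """Contrast of one block per equation (1):
    (Imax - Imin) / (Imax + Imin), over the block's luminance values.
    """
    i_max, i_min = max(block), min(block)
    if i_max + i_min == 0:  # guard an all-black block (assumption)
        return 0.0
    return (i_max - i_min) / (i_max + i_min)
```

In-focus foreground blocks have sharp detail and therefore high contrast, while defocused background blocks score low, which is what lets step e) separate foreground from background without any depth cue beyond focus.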

As the steps above show, the present invention simplifies many of the computations involved and thus greatly reduces computational complexity. The invention can therefore use low-complexity image analysis and processing to generate the depth information a 2D video lacks and convert the 2D video into 3D video.

In addition, the greatly reduced computational complexity makes the invention suitable for hardware or embedded-system software implementations of real-time stereoscopic video generation, while maintaining good 3D video quality.

The first figure is a flow chart of a preferred embodiment of the present invention.

The second figure is a schematic view of a preferred embodiment, showing how a vanishing line is found.

The third figure (A) is an example image of a preferred embodiment, showing a vanishing-region frame.

The third figure (B) is an example image of a preferred embodiment, showing a landscape frame.

The third figure (C) is an example image of a preferred embodiment, showing a close-up frame.

The fourth figure (A) is a schematic view of a preferred embodiment, showing the preliminary depth map of a vanishing-region frame.

The fourth figure (B) is a schematic view of a preferred embodiment, showing the preliminary depth map of a landscape frame.

The fifth figure (A) is a schematic view of a preferred embodiment, showing the final depth map of a vanishing-region frame.

The fifth figure (B) is a schematic view of a preferred embodiment, showing the final depth map of a landscape frame.

The fifth figure (C) is a schematic view of a preferred embodiment, showing the final depth map of a close-up frame.

The sixth figure is a schematic view of a preferred embodiment, showing pixels being grouped.

Claims (7)

1. A low-complexity method for converting two-dimensional video into three-dimensional video, converting each frame of a video in turn, the method comprising the steps of: a) identifying edge feature points: determining whether each pixel of a frame is an edge feature point; b) finding vanishing lines: for each pixel that is an edge feature point, taking that pixel as the center, finding the neighboring pixels that are also edge feature points, storing the information of each straight line connecting the center pixel to those neighbors, sorting the lines by the number of edge feature points they pass through, and defining at least the two lines passing through the most edge feature points as vanishing lines; c) classifying the frame: determining whether the density of edge feature points in the frame exceeds a density threshold and whether the number of vanishing lines exceeds a count threshold; if both thresholds are exceeded, classifying the frame as a close-up frame; otherwise using color to compute the proportion of sky and distant mountains in the frame and, if it exceeds a landscape threshold, classifying the frame as a landscape frame; and if neither condition holds, classifying the frame as a vanishing-region frame; d) locating the vanishing region and generating a preliminary depth map: if step c) classified the frame as a vanishing-region frame, applying vanishing-region detection, taking the intersections of all vanishing lines as vanishing points and defining the area of the frame where vanishing points are densest as the vanishing region; if the vanishing region falls outside the frame, setting the frame boundary that best matches the trend of the vanishing lines as the vanishing region; if the frame was classified as a landscape frame, defining the vanishing region directly as the top of the frame; then generating a preliminary Gradient Depth Map (GDM) from each pixel's distance to the vanishing region, and applying a predetermined operation to the preliminary depth map using the frame's edge feature points to sharpen object edges and produce the final depth map; and e) detecting contrast and assigning depth: if step c) classified the frame as a close-up frame, dividing the frame into blocks each having a plurality of pixels, measuring the contrast of each block to decide whether it belongs to a foreground object or a background object, and assigning foreground objects the nearest depth and background objects the farthest depth to produce the final depth map.

2. The method of claim 1, wherein in step a) an edge feature point is identified by taking the average of the RGB (red, green, blue) values, or the Y value of the YUV format, of a pixel and its eight surrounding pixels, applying vertical and horizontal mask arrays to obtain a horizontal value and a vertical value, and adding the absolute values of the two; if the sum exceeds the edge threshold, the pixel is an edge feature point.

3. The method of claim 1, wherein in step b) the neighboring pixels are the eight pixels surrounding the center pixel plus the sixteen pixels of the next outer ring, and a 5x5 Block Hough Transform is used to find the vanishing lines.

4. The method of claim 1, wherein in step d) the vanishing-region detection scans in 8x8-pixel blocks to find the area of the frame where vanishing points are densest, which is defined as the vanishing region.

5. The method of claim 1, wherein in step d) the predetermined operation is Joint Bilateral Filtering (JBF) performed without the edge stop function.

6. The method of claim 1, further comprising a step d1) of segmentation, in which the frame is segmented by grouping pixels of similar color and the segmentation is used to detect objects, the preliminary depth map then being adjusted according to the segmentation result so that every pixel of the same object receives a consistent depth value.

7. The method of claim 6, wherein in step d1) the segmentation method is 4-bit Removing Segmentation, which removes the four low bits of each 8-bit R, G, and B value, compares colors using the remaining four bits, and groups pixels of similar color accordingly.
TW100124109A 2011-07-07 2011-07-07 Method of Transforming 3D Image Video with Low Complexity Two - dimensional Video TWI496110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW100124109A TWI496110B (en) 2011-07-07 2011-07-07 Method of Transforming 3D Image Video with Low Complexity Two - dimensional Video


Publications (2)

Publication Number Publication Date
TW201303793A TW201303793A (en) 2013-01-16
TWI496110B true TWI496110B (en) 2015-08-11

Family

ID=48138129

Family Applications (1)

Application Number Title Priority Date Filing Date
TW100124109A TWI496110B (en) 2011-07-07 2011-07-07 Method of Transforming 3D Image Video with Low Complexity Two - dimensional Video

Country Status (1)

Country Link
TW (1) TWI496110B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW384454B (en) * 1998-09-25 2000-03-11 Ulead Systems Inc Processing method for versatile 3D graphic articles
US7003136B1 (en) * 2002-04-26 2006-02-21 Hewlett-Packard Development Company, L.P. Plan-view projections of depth image data for object tracking


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Gangwal, O.P.; NXP Res., Eindhoven; Berretty, R.-P., "Depth map post-processing for 3D-TV," International Conference on Consumer Electronics, pp. 10-14, 2009. *


Similar Documents

Publication Publication Date Title
US8897548B2 (en) Low-complexity method of converting image/video into 3D from 2D
JP5010729B2 (en) Method and system for generating a depth map for a video conversion system
US8488868B2 (en) Generation of a depth map from a monoscopic color image for rendering stereoscopic still and video images
WO2018082185A1 (en) Image processing method and device
CN101635859B (en) Method and device for converting plane video to three-dimensional video
CN103400150B (en) A kind of method and device that road edge identification is carried out based on mobile platform
US20140198977A1 (en) Enhancement of Stereo Depth Maps
CN108022223B (en) Tone mapping method based on logarithm mapping function blocking processing fusion
CN102802005A (en) Method for 3d video content generation
CN103440674B (en) A kind of rapid generation of digital picture wax crayon specially good effect
KR20110138733A (en) Method and apparatus for converting 2d image into 3d image
CN106570838A (en) Image brightness optimization method and device
CN106815827A (en) Image interfusion method and image fusion device based on Bayer format
TWI457853B (en) Image processing method for providing depth information and image processing system using the same
CN107169973A (en) The background removal and synthetic method and device of a kind of image
CN108307245B (en) Subtitle font color obtaining method based on background perception technology and display
CN105989583B (en) A kind of image defogging method
KR101548236B1 (en) Color compansation method for 3D Image
CN104038752A (en) Multi-view video histogram color correcting method based on three-dimensional Gaussian mixed model
JP5210416B2 (en) Stereoscopic image generating apparatus, stereoscopic image generating method, program, and recording medium
TWI496110B (en) Method of Transforming 3D Image Video with Low Complexity Two - dimensional Video
KR101513931B1 (en) Auto-correction method of composition and image apparatus with the same technique
CN112104856A (en) Method for converting low-complexity two-dimensional image video signal into three-dimensional image video signal
CN112102347A (en) Step detection and single-stage step height estimation method based on binocular vision
Chien et al. Low complexity 3D depth map generation for stereo applications

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees