KR100931269B1

KR100931269B1 - Real-Time Edge Detection in the H.264/AVC Compressed Domain

Info

Publication number: KR100931269B1
Application number: KR1020070060198A
Authority: KR
Inventors: 원치선; 서종수; 김성민; 조영훈
Original assignee: Industry-Academic Cooperation Foundation, Yonsei University (연세대학교 산학협력단)
Priority date: 2007-06-20
Filing date: 2007-06-20
Publication date: 2009-12-11
Also published as: KR20080111800A

Abstract

The present invention relates to a real-time edge detection method in the H.264/AVC compressed domain. More specifically, it relates to a method that detects edges, which are spatial-domain information, in real time directly from H.264/AVC-compressed video without decoding it, so that the method can be applied to compressed-domain video analysis such as video retrieval and scene-change detection.

To this end, the present invention provides a real-time edge detection method in the H.264/AVC compressed domain, comprising:

receiving an image compressed with H.264/AVC;

detecting prediction information of the image using the intra prediction mode information in the bitstream of the image; and

analyzing the image using the detected prediction information.

H.264/AVC, video compression, MPEG, encoding, decoding, intra, moving picture, edge, intra prediction

Description

Method for Real-Time Edge Detection in the H.264/AVC Compressed Domain {Method For Real-time Edge Extraction In H.264/AVC Compression Domain}

FIG. 1 is a block diagram of an H.264/AVC encoder;

FIG. 2 shows the intra prediction modes available for 4×4 blocks in H.264/AVC;

FIG. 3 illustrates intra prediction of a 4×4 block in H.264/AVC;

FIG. 4 shows the intra prediction modes available for 16×16 blocks in H.264/AVC;

FIG. 5 shows the spatial characteristics of intra prediction in H.264/AVC;

FIG. 6 shows an example of intra prediction modes extracted according to the present invention;

FIG. 7 shows an example of filtering the extracted edges using the DCT residual according to the present invention;

FIG. 8 shows an example of extracting representative components from the extracted edges according to the present invention; and

FIG. 9 shows an example of dividing an entire image into 16 sub-images.

Recently, with the rapid development of mobile and satellite communications, wireless communication services have become increasingly important in the information society. Beyond the conventional transmission of voice and text, multimedia wireless services that support wireless Internet access and video communication are becoming widespread. In particular, IMT-2000 services and fourth-generation mobile communication using satellite DMB (Digital Multimedia Broadcasting) systems are creating environments in which high-quality video can be transmitted in real time.

These technologies became commercially viable above all because of advances in video compression. An analog video signal is digitally encoded through steps such as quantization and variable-length coding, transmitted as a digital signal, and decoded again at the receiving terminal, which enables high transmission rates and rich content. In other words, a defining feature of digital broadcasting is that digitizing and compressing video makes efficient service possible over a limited transmission channel, and video compression is therefore recognized as a key technology that determines the nature and quality of the service.

Various compression technologies have been developed for storing and transmitting large amounts of information. In particular, since the late 1980s, calls for coding methods and technical standards for digital video have accelerated technical progress. The International Telecommunication Union (ITU) established H.261 and H.263 as standards for video services over wired and wireless networks, and the International Organization for Standardization (ISO) developed the video standards MPEG-1, MPEG-2, and MPEG-4, among other active international standardization efforts. After the H.263+ and MPEG-4 standards were developed, wireless communication spread rapidly, creating a need for a video compression standard that offers better compression efficiency than earlier methods and accommodates diverse communication environments.

Subsequently, the Joint Video Team (JVT), formed jointly by the ITU and ISO/IEC (the International Organization for Standardization and the International Electrotechnical Commission), approved a standard called H.264 (MPEG-4 Part 10; hereinafter H.264) that offers higher compression efficiency than earlier methods. H.264 is currently the standard video compression technology for digital broadcasting, and it is a major advance over earlier standards such as H.263+ and MPEG-2/4, both in coding efficiency and in its flexibility to adapt to diverse network environments. Like earlier standards, H.264 adopts a hybrid motion-compensated prediction (MCP) model, but it achieves about 50% higher compression efficiency than H.263+ or MPEG-4 Part 2 while consistently delivering high-quality video. H.264 also recovers well from packet loss in packet networks and from bit errors in wireless networks, and its network abstraction layer (NAL) makes it easy to transport over different networks.

FIG. 1 is a block diagram of an H.264 encoder. The H.264 standard codes a video signal in one of two ways: intra coding, which exploits spatial similarity within a frame, and inter coding, which exploits the similarity between frames over time.

Referring to FIG. 1, the intra prediction unit predicts a block from the pixel data of adjacent blocks that have already been decoded within the same picture, while inter prediction predicts blocks of the current picture from reference pictures that have already been decoded, deblocking-filtered, and stored in a buffer. The prediction residual obtained in this way is transformed and quantized. The entropy coding unit then encodes the quantized video data according to a predetermined method to form a bitstream, which is passed to the NAL.

Regarding the coding unit, H.263+ and MPEG-2/4 code 8×8 blocks within macroblocks of 16×16 pixels, whereas H.264 intra coding operates on 16×16 macroblocks and 4×4 sub-blocks. Intra coding raises efficiency by forming a prediction from the directional pattern of the pixels adjacent to each 16×16 macroblock or 4×4 block and coding only the difference from that prediction; the sum of absolute differences (SAD) between a prediction and the original block measures how well each mode fits. Nine modes are available for 4×4 blocks and four for 16×16 macroblocks. Offering many spatial prediction modes in this way minimizes the residual signal that actually has to be coded, which improves compression efficiency.

FIG. 2 shows the nine intra prediction modes for the luminance component of a 4×4 block, FIG. 3 illustrates intra prediction of a 4×4 block, and FIG. 4 illustrates intra prediction of a 16×16 macroblock.

Referring to FIG. 2, a 4×4 block has nine intra prediction modes, numbered 0 through 8. Mode 2 is the non-directional DC mode and is therefore not drawn in FIG. 2. In more detail, referring to FIG. 3, intra coding of a 4×4 block builds a prediction block from the pixels surrounding the target block for each mode, computes the SAD between each prediction block and the original block, and selects the mode with the smallest SAD among the nine as the optimal prediction mode.
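The SAD-based decision described above can be sketched as follows. This is a minimal illustration only: it assumes both the top and left neighbours are available, implements just modes 0 (vertical), 1 (horizontal) and 2 (DC) out of the nine, and uses plain SAD, whereas practical encoders often use a rate-distortion cost instead. The function names are hypothetical.

```python
import numpy as np


def predict_4x4(mode, top, left):
    """Build a 4x4 intra prediction for modes 0-2 (vertical, horizontal, DC).

    top  -- the 4 reconstructed pixels directly above the block
    left -- the 4 reconstructed pixels directly to the left
    Assumes both neighbours are available (no picture-edge handling).
    """
    top = np.asarray(top, dtype=np.int32)
    left = np.asarray(left, dtype=np.int32)
    if mode == 0:  # vertical: each top pixel is copied down its column
        return np.tile(top, (4, 1))
    if mode == 1:  # horizontal: each left pixel is copied across its row
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == 2:  # DC: rounded mean of the 8 neighbouring pixels
        dc = (int(top.sum()) + int(left.sum()) + 4) >> 3
        return np.full((4, 4), dc, dtype=np.int32)
    raise ValueError("this sketch only implements modes 0, 1 and 2")


def choose_mode(block, top, left, modes=(0, 1, 2)):
    """Return (best_mode, sad_per_mode), picking the mode with the smallest SAD."""
    block = np.asarray(block, dtype=np.int32)
    sads = {m: int(np.abs(block - predict_4x4(m, top, left)).sum()) for m in modes}
    best = min(sads, key=sads.get)  # ties resolve to the first mode listed
    return best, sads


# A block made of vertical stripes is predicted exactly by mode 0 (vertical).
stripes = [[10, 50, 10, 50]] * 4
mode, sads = choose_mode(stripes, top=[10, 50, 10, 50], left=[30, 30, 30, 30])
print(mode, sads)
```

Here the vertical prediction reproduces the block exactly, so its SAD is zero and mode 0 is selected.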

Mode 0 is the vertical prediction mode, which projects the four pixels above the block downward to predict every pixel in the block. Likewise, mode 1 is horizontal prediction; mode 2 is DC prediction, which has no direction and predicts with the mean value; mode 3 is diagonal down-left (diag_down_left); mode 4 is diagonal down-right (diag_down_right); mode 5 is vertical-right (vertical_right); mode 6 is horizontal-down (horizontal_down); mode 7 is vertical-left (vertical_left); and mode 8 is horizontal-up (horizontal_up).
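For reference, the nine modes and their standard numbering can be captured in a small enumeration. The class and property names are hypothetical; only the numbering 0 to 8 and the mode names come from the text above.

```python
from enum import IntEnum


class Intra4x4Mode(IntEnum):
    """H.264/AVC intra 4x4 luma prediction modes, numbered 0-8."""
    VERTICAL = 0
    HORIZONTAL = 1
    DC = 2
    DIAG_DOWN_LEFT = 3
    DIAG_DOWN_RIGHT = 4
    VERTICAL_RIGHT = 5
    HORIZONTAL_DOWN = 6
    VERTICAL_LEFT = 7
    HORIZONTAL_UP = 8

    @property
    def label(self) -> str:
        """Lower-case name as written in the text, e.g. 'diag_down_left'."""
        return self.name.lower()

    @property
    def is_directional(self) -> bool:
        """Every mode except DC predicts along a direction."""
        return self is not Intra4x4Mode.DC


# Turning a parsed mode number into its name:
print(Intra4x4Mode(3).label)  # diag_down_left
```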

Referring to FIG. 4, intra prediction of a 16×16 macroblock has four modes. Mode 0 projects the 16 pixels above the macroblock vertically; mode 1 is horizontal prediction; mode 2 is DC prediction; and mode 3 is plane prediction, which combines the 16 pixels to the left and the 16 pixels above in a predetermined way.
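Plane prediction (mode 3) is the only 16×16 mode that is not a simple copy or average. A sketch for 8-bit luma follows, written from the H.264/AVC plane-prediction formula; it assumes all neighbouring samples, including the top-left corner, are available, and the function names are hypothetical.

```python
def clip1(value):
    """Clip a sample to the 8-bit range [0, 255]."""
    return max(0, min(255, value))


def intra16x16_plane(top, left, corner):
    """Intra 16x16 plane prediction (mode 3) for 8-bit luma samples.

    top    -- 16 neighbours above the macroblock, p[x, -1] for x = 0..15
    left   -- 16 neighbours left of the macroblock, p[-1, y] for y = 0..15
    corner -- the top-left neighbour p[-1, -1]
    Returns pred[y][x] as a list of 16 rows.
    """
    def above(x):  # p[x, -1]; x == -1 selects the corner sample
        return corner if x == -1 else top[x]

    def beside(y):  # p[-1, y]; y == -1 selects the corner sample
        return corner if y == -1 else left[y]

    # Weighted horizontal (H) and vertical (V) gradients of the neighbours
    grad_h = sum((i + 1) * (above(8 + i) - above(6 - i)) for i in range(8))
    grad_v = sum((i + 1) * (beside(8 + i) - beside(6 - i)) for i in range(8))
    a = 16 * (left[15] + top[15])
    b = (5 * grad_h + 32) >> 6
    c = (5 * grad_v + 32) >> 6
    # Python's >> on negative ints rounds toward minus infinity, like an arithmetic shift
    return [[clip1((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
            for y in range(16)]


ramp = [10 * x for x in range(16)]  # brightness rising from left to right
pred = intra16x16_plane(top=ramp, left=[0] * 16, corner=0)
```

With a horizontal brightness ramp above the block and a flat left column, the predicted surface rises from left to right and every row is identical, which is the kind of smooth gradient the plane mode is meant to capture.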

As described above, the spatial prediction used in H.264 intra coding was absent from earlier video compression standards. It removes spatial redundancy by exploiting the fact that, within a picture, the current block is spatially similar to its neighboring blocks: a prediction block is generated from the neighboring pixels, and only the difference between it and the current block is coded, which substantially reduces the number of bits.

The present invention has been made in view of the above. Its object is to provide a real-time edge detection method in the H.264/AVC compressed domain that uses the relationship between intra prediction modes and spatial information to extract the coded intra prediction modes from an H.264/AVC compressed bitstream and derive the spatial edge distribution of the corresponding image, so that compressed video can be analyzed without being decoded.

To achieve this object, the real-time edge detection method in the H.264/AVC compressed domain according to the present invention comprises:

(a) receiving an image compressed with H.264/AVC;

(b) detecting prediction information of the image using the intra prediction mode information in the bitstream of the image; and

(c) analyzing the image using the detected prediction information.

In particular, step (b) further comprises converting part of the detected prediction information of the image into DC prediction using the number of bits used to encode the DCT residual.

In addition, the conversion of part of the detected prediction information of the image into DC prediction is determined by Equation 1, in which P_{i,j} denotes the prediction mode of the 4×4 block that is i-th in the horizontal direction and j-th in the vertical direction of the image, dc denotes DC prediction, Dth denotes a threshold, and Lr denotes the number of bits of the DCT residual.

Further, between steps (b) and (c), the method comprises dividing the entire image into k non-overlapping sub-images and, using the detected prediction information contained in each sub-image, extracting five edge components for each sub-image: vertical (VertEdge), horizontal (HoriEdge), 45-degree (45Edge), 135-degree (135Edge), and non-directional (NonEdge).

In addition, the extraction of the five edge components from the detected prediction information is given by:

$$
\begin{aligned}
\mathrm{edge}_k^{v} &= a\,n(\mathrm{vertical}_k) + b\,n(\mathrm{vertical\_right}_k) + c\,n(\mathrm{vertical\_left}_k)\\
\mathrm{edge}_k^{h} &= d\,n(\mathrm{horizontal}_k) + e\,n(\mathrm{horizontal\_down}_k) + f\,n(\mathrm{horizontal\_up}_k)\\
\mathrm{edge}_k^{45} &= g\,n(\mathrm{diag\_down\_left}_k) + h\,n(\mathrm{vertical\_left}_k) + i\,n(\mathrm{horizontal\_up}_k)\\
\mathrm{edge}_k^{135} &= j\,n(\mathrm{diag\_down\_right}_k) + k\,n(\mathrm{vertical\_right}_k) + l\,n(\mathrm{horizontal\_down}_k)\\
\mathrm{edge}_k^{\mathrm{nondir}} &= \mathrm{LinearCombination}\big(n(\mathrm{vertical}_k),\, n(\mathrm{horizontal}_k),\, n(\mathrm{dc}_k),\, n(\mathrm{diag\_down\_left}_k),\, n(\mathrm{diag\_down\_right}_k),\\
&\qquad n(\mathrm{vertical\_right}_k),\, n(\mathrm{horizontal\_down}_k),\, n(\mathrm{vertical\_left}_k),\, n(\mathrm{horizontal\_up}_k)\big)
\end{aligned}
$$

where edge_k^x is the value of the x edge component of the k-th sub-image, a through l are arbitrary constants serving as weights (the weight k is distinct from the sub-image index k), n(Y_k) is the number of edges of type Y (4×4 blocks with prediction mode Y) in the k-th sub-image, and LinearCombination() denotes a linear combination of the variables in parentheses.

The terminology used in this application is for describing particular embodiments only and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as “comprise” or “have” specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

FIG. 5 shows the relationship between the intra prediction mode applied to a 4×4 block and the spatial features of that block. As FIG. 5 shows, the directionality (edge direction) of a 4×4 block in an intra frame is similar to the direction of the prediction mode applied to it.

The following describes how this relationship between intra prediction modes and the spatial information of a frame is used to extract the coded intra prediction modes from an H.264/AVC compressed bitstream and derive the spatial edge distribution of the corresponding image.

FIG. 6 shows the relationship between the intra prediction modes extracted from an H.264/AVC bitstream and the spatial edge distribution.

The intra prediction mode extracted directly from the compressed bitstream is denoted P_{i,j}, where i is the index of the 4×4 block in the horizontal direction of the image and j is its index in the vertical direction. Accordingly, i and j range over the X/4 horizontal and Y/4 vertical 4×4 block positions of the image, where X is the image width and Y is the image height in pixels.

P_{i,j} represents the block prediction information. Since H.264/AVC defines nine prediction modes for 4×4 blocks, P_{i,j} takes one of the values vertical, horizontal, dc, diag_down_left, diag_down_right, vertical_right, horizontal_down, vertical_left, or horizontal_up.

Because 16×16 prediction is used for regions of the image consisting mostly of the DC component, the present invention classifies all sixteen 4×4 blocks within such a 16×16 block as dc.
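Forming the P_{i,j} map requires the per-macroblock intra syntax to be parsed from the bitstream first. The sketch below assumes that parsing has already been done and that its result is a hypothetical list of dictionaries in macroblock raster order, each with a "type" of "I4x4" or "I16x16" and, for I4x4, a "modes" list of 16 mode names in H.264 luma4x4BlkIdx order. It also assumes an all-intra picture, and it applies the rule above that every 4×4 block of a 16×16-predicted macroblock is labelled dc.

```python
def blk_idx_to_xy(idx):
    """Position (x, y), in 4x4-block units, of luma4x4BlkIdx inside a macroblock.

    H.264 orders the 16 luma 4x4 blocks as four 8x8 quadrants in raster order,
    with the four 4x4 blocks of each quadrant also in raster order.
    """
    quad, sub = divmod(idx, 4)
    return 2 * (quad % 2) + (sub % 2), 2 * (quad // 2) + (sub // 2)


def build_mode_map(macroblocks, width_mbs, height_mbs):
    """Return grid[j][i] = prediction mode P_{i,j} of the 4x4 block at column i, row j."""
    if len(macroblocks) != width_mbs * height_mbs:
        raise ValueError("expected one entry per macroblock")
    grid = [[None] * (4 * width_mbs) for _ in range(4 * height_mbs)]
    for addr, mb in enumerate(macroblocks):
        mb_x, mb_y = addr % width_mbs, addr // width_mbs
        for idx in range(16):
            bx, by = blk_idx_to_xy(idx)
            if mb["type"] == "I16x16":
                mode = "dc"  # all 16 sub-blocks of a 16x16-predicted MB count as dc
            elif mb["type"] == "I4x4":
                mode = mb["modes"][idx]
            else:
                raise ValueError(f"unsupported macroblock type: {mb['type']}")
            grid[4 * mb_y + by][4 * mb_x + bx] = mode
    return grid


mbs = [
    {"type": "I4x4", "modes": ["vertical"] * 16},
    {"type": "I16x16"},
]
grid = build_mode_map(mbs, width_mbs=2, height_mbs=1)
```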

In FIG. 6, the white lines mark the 4×4 block boundaries, and the number inside each block is the length, in the bitstream, of its quantized frequency coefficients.

That is, each 4×4 block in H.264/AVC is transformed with the DCT (Discrete Cosine Transform; H.264 uses an integer approximation of it), quantized, and entropy-coded with CAVLC, and FIG. 6 shows the resulting bitstream length of these quantized coefficients.

The numbers inside the red circles are generally small, indicating that the DCT coefficients of those blocks are coded with few bits in the bitstream.

As the red-circled areas in FIG. 6 show, the intra prediction modes extracted directly from an H.264/AVC bitstream do not always represent the edge components of the image accurately. This occurs mainly in regions dominated by low-frequency components, and it can easily be compensated using the coded bit length of the DCT residual in the bitstream, as expressed by Equation 1.

In Equation 1, P_{i,j} is the prediction information of the (i,j)-th 4×4 block, dc denotes the DC prediction mode, Dth is a threshold, and Lr is the number of bits of the DCT residual.

In other words, where the image contains a strong real edge, intra prediction cannot predict the block accurately, so many nonzero values remain after the DCT and many bits of DCT residual are coded. Conversely, a block coded with few residual bits was predicted well, so its directional mode is unlikely to reflect a strong edge.

Therefore, the edge components of the image extracted from the H.264/AVC bitstream are classified, according to the length of the DCT residual, into actual edge components and spurious ones.
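Equation 1 itself is not reproduced in this text. Reading it from the surrounding description (blocks coded with few residual bits are treated as DC, blocks with many residual bits keep their directional mode), a minimal sketch could look like the following. The strict comparison Lr < Dth, the threshold value, and the availability of a per-block residual bit count collected while parsing the bitstream are all assumptions of this sketch, and the function name is hypothetical.

```python
def correct_modes(mode_map, residual_bits, d_th):
    """Apply the assumed DCT-residual correction to a 2-D map of 4x4 prediction modes.

    mode_map      -- grid[j][i] of mode names (P_{i,j})
    residual_bits -- grid[j][i] of coded DCT-residual bit lengths (Lr)
    d_th          -- threshold Dth (a free parameter)
    Assumed rule: a block whose residual used fewer than d_th bits was predicted
    well, so its directional mode is not trusted as an edge and becomes 'dc'.
    """
    return [
        ["dc" if bits < d_th else mode for mode, bits in zip(mode_row, bit_row)]
        for mode_row, bit_row in zip(mode_map, residual_bits)
    ]


modes = [["vertical", "horizontal"], ["diag_down_left", "dc"]]
bits = [[3, 40], [12, 0]]
corrected = correct_modes(modes, bits, d_th=10)
```

Only the top-left block changes here: its vertical mode was coded with 3 residual bits, below the threshold of 10, so it is relabelled dc.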

FIG. 7 shows the result of classifying edge components through the above process. As FIG. 7 shows, the edges inside the red circles have been removed.

Next, with reference to FIG. 8, a method is described for dividing the entire image into k sub-images and classifying the prediction information extracted from the H.264/AVC bitstream within each sub-image into five representative edge types.

Nine kinds of edges are extracted in total, and some of them have overlapping directions. To remove this redundancy, the entire image is divided into k sub-images, and the nine edge types of the 4×4 blocks in each sub-image are grouped into five representative edge components: vertical (edge_k^v), horizontal (edge_k^h), 45-degree (edge_k^45), 135-degree (edge_k^135), and non-directional (edge_k^nondir). This grouping uses the number of edges of each type in the sub-image, as shown in Equation 2.

In Equation 2, edge_k^x is the x edge component of the k-th sub-image, a through l are arbitrary constants serving as weights, n(Y_k) is the number of edges of type Y in the k-th sub-image, and LinearCombination() denotes a linear combination of the variables in parentheses.
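Equation 2 can be evaluated directly from the mode counts of one sub-image. In this sketch the weights a to l and the coefficients of LinearCombination() are hypothetical placeholders (1.0 for a to l, and only the dc count for the non-directional term), since the text leaves them as arbitrary constants.

```python
from collections import Counter

# Placeholder weights; the description only says a..l are arbitrary constants.
DEFAULT_WEIGHTS = {name: 1.0 for name in "abcdefghijkl"}
# Placeholder coefficients for LinearCombination(): here only n(dc_k) contributes.
DEFAULT_NONDIR = {"dc": 1.0}


def edge_components(sub_image_modes, w=DEFAULT_WEIGHTS, nondir=DEFAULT_NONDIR):
    """Equation 2 for one sub-image, given the list of its 4x4 block modes."""
    n = Counter(sub_image_modes)  # n[Y] = number of blocks with mode Y (0 if absent)
    return {
        "vertical": (w["a"] * n["vertical"] + w["b"] * n["vertical_right"]
                     + w["c"] * n["vertical_left"]),
        "horizontal": (w["d"] * n["horizontal"] + w["e"] * n["horizontal_down"]
                       + w["f"] * n["horizontal_up"]),
        "45": (w["g"] * n["diag_down_left"] + w["h"] * n["vertical_left"]
               + w["i"] * n["horizontal_up"]),
        "135": (w["j"] * n["diag_down_right"] + w["k"] * n["vertical_right"]
                + w["l"] * n["horizontal_down"]),
        "nondir": sum(coef * n[mode] for mode, coef in nondir.items()),
    }


components = edge_components(
    ["vertical", "vertical", "vertical_left", "dc", "horizontal_up"]
)
```

Because vertical_left, vertical_right, horizontal_down and horizontal_up each appear in two of the equations, a single block can contribute to two components, so the five values need not sum to the number of blocks.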

Through this process, each sub-image obtains five representative edge components. The Edge Histogram Descriptor, one of the visual descriptors of the MPEG-7 international standard used for image retrieval, also uses five representative edge types, so it is compatible with the compressed-domain retrieval technique of the present invention.
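The MPEG-7 Edge Histogram Descriptor likewise uses 16 sub-images with five edge types each, giving 80 local bins. The sketch below only concatenates the five components of each sub-image into an 80-value vector and compares two such vectors with an L1 distance. The per-sub-image normalisation is a hypothetical choice, and the sketch omits the bin quantisation and the global and semi-global histograms that the MPEG-7 descriptor defines.

```python
EDGE_ORDER = ("vertical", "horizontal", "45", "135", "nondir")


def build_descriptor(components_per_sub_image):
    """Concatenate the five components of every sub-image, each normalised to sum to 1."""
    vector = []
    for comp in components_per_sub_image:
        total = sum(comp[key] for key in EDGE_ORDER)
        vector.extend((comp[key] / total) if total else 0.0 for key in EDGE_ORDER)
    return vector


def l1_distance(d1, d2):
    """Sum of absolute bin differences; smaller means more similar edge layouts."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    return sum(abs(x - y) for x, y in zip(d1, d2))


flat = {"vertical": 0, "horizontal": 0, "45": 0, "135": 0, "nondir": 4}
striped = {"vertical": 4, "horizontal": 0, "45": 0, "135": 0, "nondir": 0}
query = build_descriptor([flat] * 16)
target = build_descriptor([striped] * 8 + [flat] * 8)
```

In a retrieval setting, the stored descriptors would be ranked by their distance to the query descriptor.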

FIG. 8 shows an embodiment in which the entire image is divided into 16 sub-images (k = 16). Each sub-image contains nine 4×4 blocks and obtains five representative edge components through Equation 2. That is, one sub-image has a vertical component a, a horizontal component b, a 45-degree component c, a 135-degree component d, and a non-directional component e, where a, b, c, d, and e here denote arbitrary component values rather than the weights of Equation 2.

FIG. 9 shows an embodiment in which the entire image is divided into 16 sub-images, each containing twenty 4×4 blocks.

Because H.264/AVC encoding is fundamentally performed macroblock by macroblock, dividing the image into sub-images as described above can easily be carried out by those skilled in the art.
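As a sketch of that division, assuming the 4×4-block grid divides evenly into a 4 × 4 arrangement of sub-images (k = 16), the following hypothetical helper returns the block modes of each sub-image in raster order. With a grid of 12 × 12 blocks each sub-image holds 9 blocks, as in FIG. 8; with 20 × 16 blocks each holds 20, matching the count given for FIG. 9.

```python
def split_into_sub_images(block_grid, rows=4, cols=4):
    """Split grid[j][i] of 4x4-block values into rows*cols non-overlapping sub-images.

    Returns a list of k = rows*cols flat lists, with sub-images ordered
    left-to-right, top-to-bottom. Assumes the grid dimensions are divisible
    by rows and cols.
    """
    height, width = len(block_grid), len(block_grid[0])
    if height % rows or width % cols:
        raise ValueError("grid size must be divisible by the sub-image layout")
    sub_h, sub_w = height // rows, width // cols
    return [
        [block_grid[y][x]
         for y in range(r * sub_h, (r + 1) * sub_h)
         for x in range(c * sub_w, (c + 1) * sub_w)]
        for r in range(rows)
        for c in range(cols)
    ]


grid = [["dc"] * 20 for _ in range(16)]  # 16 rows x 20 columns of 4x4 blocks
subs = split_into_sub_images(grid)
```

Each returned list can be passed directly to a per-sub-image edge-component computation such as Equation 2.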

Although the present invention has been shown and described with reference to certain preferred embodiments, it is not limited to those embodiments and encompasses all embodiments that a person of ordinary skill in the art could implement without departing from the technical spirit of the invention as claimed in the appended claims.

With the real-time edge detection method in the H.264/AVC compressed domain according to the present invention, compressed video can be analyzed in real time without being decoded. This enables video analysis such as image retrieval, video retrieval, and scene-change detection and editing, so significant technical and economic benefits are expected.

Claims

(a) receiving an image compressed with H.264/AVC;

(b) detecting the optimal intra prediction mode information in the bitstream of the input image as the spatial edge distribution information of the input image;

(c) correcting the detected spatial edge distribution information using the number of bits used to encode the DCT residual; and

(d) analyzing the image using the detected spatial edge distribution information of the input image.

The method according to claim 1,

wherein step (c) comprises converting part of the detected spatial edge distribution information of the input image into DC prediction using the number of bits used to encode the DCT residual.

The method according to claim 2,

wherein converting part of the detected spatial edge distribution information of the input image into DC prediction is determined by an equation in which P_{i,j} is the prediction information of the (i,j)-th 4×4 block, dc denotes DC prediction, Dth is a threshold, and Lr is the number of bits of the DCT residual.

The method according to any one of claims 1 to 3,

After step (c) and before step (d),

the method further comprises dividing the entire image into k non-overlapping sub-images and, using the prediction information of the detected image contained in each sub-image, extracting five edge components for each sub-image: vertical (VertEdge), horizontal (HoriEdge), 45-degree (45Edge), 135-degree (135Edge), and non-directional (NonEdge).

The method according to claim 4,

wherein extracting the five edge components, vertical (VertEdge), horizontal (HoriEdge), 45-degree (45Edge), 135-degree (135Edge), and non-directional (NonEdge), for each sub-image is performed according to:

$$
\begin{aligned}
\mathrm{edge}_k^{v} &= a\,n(\mathrm{vertical}_k) + b\,n(\mathrm{vertical\_right}_k) + c\,n(\mathrm{vertical\_left}_k)\\
\mathrm{edge}_k^{h} &= d\,n(\mathrm{horizontal}_k) + e\,n(\mathrm{horizontal\_down}_k) + f\,n(\mathrm{horizontal\_up}_k)\\
\mathrm{edge}_k^{45} &= g\,n(\mathrm{diag\_down\_left}_k) + h\,n(\mathrm{vertical\_left}_k) + i\,n(\mathrm{horizontal\_up}_k)\\
\mathrm{edge}_k^{135} &= j\,n(\mathrm{diag\_down\_right}_k) + k\,n(\mathrm{vertical\_right}_k) + l\,n(\mathrm{horizontal\_down}_k)\\
\mathrm{edge}_k^{\mathrm{nondir}} &= \mathrm{LinearCombination}\big(n(\mathrm{vertical}_k),\, n(\mathrm{horizontal}_k),\, n(\mathrm{dc}_k),\, n(\mathrm{diag\_down\_left}_k),\, n(\mathrm{diag\_down\_right}_k),\\
&\qquad n(\mathrm{vertical\_right}_k),\, n(\mathrm{horizontal\_down}_k),\, n(\mathrm{vertical\_left}_k),\, n(\mathrm{horizontal\_up}_k)\big)
\end{aligned}
$$

where edge_k^x is the x edge component of the k-th sub-image, a through l are arbitrary constants serving as weights, n(Y_k) is the number of edges of type Y in the k-th sub-image, and LinearCombination() denotes a linear combination of the variables in parentheses.