KR20100042632A

KR20100042632A - Video indexing method, and video indexing device

Info

Publication number: KR20100042632A
Application number: KR1020107002047A
Authority: KR
Inventors: 실바인 파브르; 레기스 소차르드; 피에르 로렌트 나가라예; 올리비에 르 미에; 필립페 기요텔; 사무엘 베르뮬렌
Original assignee: Thomson Licensing
Priority date: 2007-06-29
Filing date: 2008-06-25
Publication date: 2010-04-26
Also published as: CN101690228A; CN101690228B; JP2010532121A; EP2174500A2; WO2009003885A3; WO2009003885A2; JP5346338B2; KR101488548B1

Abstract

The invention relates to a method and a device for indexing a coded video data stream. According to the invention, the video data stream comprises information relating to the location of regions of interest of each picture, and the method comprises the steps of: reception (T1) of the coded video stream, recording of the coded video stream on a recording support, decoding (T2) of the location information of the regions of interest, selection (T3) of a region of interest per picture, decoding (T3) of video data, selection (T4) of a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture, and recording (T6) of the selected regions of interest.

Description

Video indexing method and video indexing device

The present invention relates to a video indexing method and a video indexing device.

Some picture processing applications use the detection of regions of interest (ROI) to improve picture quality. For example, coding applications often detect regions of interest and allocate more resources to coding these regions.

Various methods enable the detection of regions of interest in a picture. In particular, methods based on establishing salience maps of a picture or video, taking visual parameters into account, are known; they make it possible to define the areas on which the human eye lingers when viewing a picture or video.

Today, the detection of regions of interest is used mainly before coding, so as to privilege the regions of interest during coding by allocating more bandwidth to them, for example by reducing the quantization steps for these regions.

The advent of mobile terminals such as mobile phones, PDAs, game consoles and portable DVD players, the development of display and screen technologies, and the emergence of new services have together made it necessary to display video on terminals with low display capacity. For example, the possibility of receiving television on a mobile phone creates display problems for dense pictures on small screens.

The present invention does not primarily concern the detection of regions of interest. It concerns taking them into account for different applications and transmitting them to devices or applications that can at least mitigate the problem of displaying pictures on a terminal, mobile or not, with low display capacity.

For this purpose, the present invention proposes a method for indexing a coded video data stream. According to the invention, the video data stream comprises information relating to the location of the regions of interest of each picture, and the method comprises the steps of:

- receiving the coded video stream,

- recording the coded video stream on a recording support,

- decoding the location information of the regions of interest,

- selecting a region of interest per picture,

- decoding the video data,

- selecting a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,

- recording the selected regions of interest.

According to a preferred embodiment, during the recording step:

- the selected regions of interest are recorded in a temporary memory as they are selected and decoded,

- once all of the selected regions of interest have been recorded in the temporary memory, they are transferred to a permanent memory support (503).
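The two-stage recording described here can be sketched as follows. This is a minimal illustration only: the `RoiRecorder` class, its attribute names and the use of in-memory lists in place of the temporary memory and of support 503 are assumptions, not part of the patent.

```python
from typing import List


class RoiRecorder:
    """Buffers selected regions, then moves them to permanent storage."""

    def __init__(self, expected: int) -> None:
        self.expected = expected          # predetermined number of regions
        self.temporary: List[bytes] = []  # stands in for the temporary memory
        self.permanent: List[bytes] = []  # stands in for support 503

    def record(self, region: bytes) -> bool:
        """Store one decoded region; return True once the transfer happened."""
        self.temporary.append(region)
        if len(self.temporary) < self.expected:
            return False
        # Every selected region is now buffered: transfer them in one go.
        self.permanent.extend(self.temporary)
        self.temporary.clear()
        return True


recorder = RoiRecorder(expected=2)
first = recorder.record(b"roi-0")
second = recorder.record(b"roi-1")
```

Buffering first means the permanent support is written once, after the full set of regions is known.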

Preferably, before being recorded, the regions of interest are formatted so that all of the selected regions of interest have a homogeneous size.

Preferably, the method comprises a step of encrypting the location of the regions of interest by means of an encryption key.

Preferably, the method comprises a step of obtaining a decryption key in return for a payment by the user.

Preferably, the video data stream is coded according to the H.264/AVC coding standard, and the location information is carried in an SEI (Supplemental Enhancement Information) type message.

According to a preferred embodiment, the SEI messages are encapsulated in RTP (Real-time Transport Protocol) packets, and the RTP packets are encrypted.

Preferably, the SEI type messages carrying the location information of the regions of interest are inserted into the coded data before or after each picture to which they refer.

According to a preferred embodiment, the location information comprises information selected from:

- the number of regions of interest of each picture,

- the coordinates of each region of interest relative to the dimensions of the picture,

- the surface area of each region of interest,

- a weight expressing the importance of the region of interest relative to the other regions of interest of the picture,

- information about the content of each region of interest,

and any combination of these.

Preferably, the step of selecting a region of interest per picture selects the region of interest according to the weight expressing its importance.
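A minimal sketch of the two selection stages, assuming each region of interest carries the relative weight described above. The `Roi` type, the function names and the rule used for the stream-level stage (keep the highest-weighted regions) are illustrative assumptions; the text only fixes the per-picture criterion.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Roi:
    """Hypothetical record for one region of interest (not from the patent)."""
    picture: int   # index of the picture the region belongs to
    weight: int    # relative importance within its picture
    label: str     # semantic information, e.g. "ball"


def select_per_picture(rois: List[Roi]) -> Dict[int, Roi]:
    """Keep, for each picture, the region with the highest weight."""
    best: Dict[int, Roi] = {}
    for roi in rois:
        current = best.get(roi.picture)
        if current is None or roi.weight > current.weight:
            best[roi.picture] = roi
    return best


def select_for_stream(per_picture: Dict[int, Roi], count: int) -> List[Roi]:
    """Keep a predetermined number of regions for the whole stream.

    Assumption: the regions with the highest weights are kept; ties are
    broken by picture order.
    """
    ranked = sorted(per_picture.values(), key=lambda r: (-r.weight, r.picture))
    return ranked[:count]


rois = [
    Roi(0, 3, "ball"), Roi(0, 7, "player"),
    Roi(1, 9, "ball"), Roi(2, 2, "crowd"),
]
index = select_for_stream(select_per_picture(rois), count=2)
```

Here picture 0 contributes "player", picture 1 contributes "ball" and picture 2 contributes "crowd"; with `count=2` the index keeps the two heaviest of those three.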

Preferably, the video coding standard uses FMO (flexible macroblock ordering), the regions of interest are coded in slice groups independently of the other picture data, and the location information of the regions of interest includes the numbers of the slice groups in which the regions of interest are coded.

Preferably, the SEI message includes, for each slice group, an identifier indicating whether it relates to a region of interest.

Preferably, the method comprises a further step of reading the SEI messages, and the step of decoding the video data decodes only the slice groups containing the regions of interest.

The invention also relates to a device for indexing a coded video data stream. According to the invention, the video data stream comprises information relating to the location of the regions of interest of each picture, and the device comprises:

- means for receiving the coded video stream,

- means for recording the coded video stream on a recording support (503),

- means (501) for decoding the location information of the regions of interest,

- means (501) for decoding the video data,

- means (502) for selecting a region of interest per picture,

- means (502) for selecting a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,

- means (503) for recording the selected regions of interest.

Detection of the regions of interest of a picture is generally performed before coding, and the resulting data is then used to facilitate encoding. The inventors realized that the location of the regions of interest can also matter during decoding and, in particular, during display on devices with limited display capacity. A receiving terminal can choose to display only the regions of interest, which makes these regions more visible than they would be in a display of the complete picture.

The invention will be better understood and illustrated by means of embodiments and implementations, which are in no way limiting, with reference to the appended figures, in which:
FIG. 1 shows a coding device according to a preferred embodiment of the invention.
FIG. 2 shows a coding method according to a preferred embodiment of the invention.
FIG. 3 shows a decoding device according to a preferred embodiment of the invention.
FIG. 4 shows a decoding method according to another embodiment of the invention.
FIG. 5 shows a personal recording type device according to another embodiment of the invention.
FIG. 6 shows an indexing method in a personal recording type device implementing an embodiment of the invention.

FIG. 1 shows a coding device conforming to the H.264/AVC coding standard and implementing a preferred embodiment of the invention. In this preferred embodiment, the video stream is coded.

The current frame Fn is presented at the coder input to be coded. This frame is coded in the form of slices, that is, it is divided into sub-units each containing a certain number of macroblocks, each macroblock corresponding to a group of 16×16 pixels. Each macroblock is coded in intra or inter mode. In either mode, a macroblock is coded on the basis of a reconstructed frame. Module 109 decides, according to the content of the picture, whether the current picture is coded in intra mode. In intra mode, P (shown in FIG. 1) is formed from samples of the current frame Fn that have previously been coded, decoded and reconstructed (uF'n in FIG. 1, where u means unfiltered). In inter mode, P is formed from a motion estimation based on one or more frames F'n-1.

The motion estimation module 101 estimates the motion between the current frame Fn and at least one preceding frame F'n-1. From this motion estimation, the motion compensation module 102 generates the frame P when the current picture Fn is to be coded in inter mode.

The subtractor 103 generates a signal Dn, which is the difference between the picture Fn to be coded and the picture P. This difference picture is then transformed by a DCT in module 104. The transformed picture is quantized by the quantization module 105, and the result is reordered by module 111. An entropy coding module 112 of the CABAC (Context-based Adaptive Binary Arithmetic Coding) type then codes each picture.

Modules 106 and 107, which perform inverse quantization and inverse transformation respectively, reconstruct the difference D'n from the transformed and quantized data.

When a picture is coded in intra mode, as decided by module 109, the intra prediction module 108 codes the picture. The picture uF'n is obtained at the output of adder 114 as the sum of the signal D'n and the signal P. Module 108 also receives the reconstructed, unfiltered picture uF'n at its input.

The filter module 110 produces the reconstructed, filtered picture F'n from the picture uF'n.

The entropy coding module 112 outputs the coded slices encapsulated in NAL type units. In addition to the slices, the NAL units contain information relating, for example, to the headers. The NAL type units are sent to module 113.

Module 116 determines the regions of interest. Several methods now exist for locating regions of interest in a picture; in particular, methods based on establishing salience maps are known.

For example, patent application WO2006/07263 in the name of Thomson Licensing, filed on January 10, 2006 and published on July 13, 2006, discloses an effective method for establishing salience maps.

Module 116 then establishes a salience map for each picture of the video. Parameters entered by the user can also be taken into account in establishing this map. For example, depending on the event the video relates to, it is possible to define certain important objects in the filmed scene; for sporting events in particular, it is possible to specify that the video shows a football match. Advantageously, this yields a salience map in which the salience zones are weighted according to the event. In a football match, it is preferable to focus on the ball rather than on the stands.

The region-of-interest module thus extracts one or more salience zones, also called regions of interest. These regions of interest are then located spatially in the picture.

They are identified by their coordinates along the height and width of the picture. Their size can also be extracted for each region of interest. It is also possible to associate an element of semantic information with them. Indeed, for a football match, information about a region of interest may be required if the user can choose which regions of interest to display from among several.

Module 115 receives the information relating to the regions of interest in order to code it into an SEI (Supplemental Enhancement Information) type message.

The SEI message is coded as indicated in the table below:

uuid_iso_iec_11578: a single 128-bit word that indicates our message type to the decoder.

user_data_payload_byte: 8 bits containing part of the SEI message.

Generally, in this case:

payloadSize = 17 (bytes): 16 for the UUID and 1 for proprietary data.

user_data_payload_byte:

where:

number_of_ROI: the number of regions of interest present in the picture (or in the following pictures).

roi_x_16: the X position of the region of interest in the picture, as a multiple of 16 pixels.

roi_y_16: the Y position of the region of interest in the picture, as a multiple of 16 pixels.

roi_w_16: the width W of the region of interest in the picture, as a multiple of 16 pixels.

roi_h_16: the height H of the region of interest in the picture, as a multiple of 16 pixels.

semantic_information: a title characterizing the region of interest.

Relative weights: the weight of each region of interest in the picture, so that the region of greatest interest can be identified in principle.

Macroblock_alignment: the size of the region of interest in macroblocks (width and height), together with the number of the starting macroblock in which the region of interest is found.
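As an illustration of how a receiver might hold these fields once parsed, the sketch below groups them in a small record and converts the 16-pixel units to pixels. The `RoiInfo` class, the `relative_weight` attribute name and the reading of each `*_16` field as a count of 16-pixel units are assumptions; the byte layout of the message is given by the tables and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Tuple

MACROBLOCK_SIZE = 16  # pixels per macroblock side in H.264


@dataclass(frozen=True)
class RoiInfo:
    """One region of interest, using the field names listed above."""
    roi_x_16: int
    roi_y_16: int
    roi_w_16: int
    roi_h_16: int
    semantic_information: str = ""
    relative_weight: int = 0

    def pixel_rect(self) -> Tuple[int, int, int, int]:
        """Return (x, y, width, height) in pixels.

        Assumption: each *_16 field counts units of 16 pixels.
        """
        return (
            self.roi_x_16 * MACROBLOCK_SIZE,
            self.roi_y_16 * MACROBLOCK_SIZE,
            self.roi_w_16 * MACROBLOCK_SIZE,
            self.roi_h_16 * MACROBLOCK_SIZE,
        )


ball = RoiInfo(roi_x_16=10, roi_y_16=4, roi_w_16=3, roi_h_16=2,
               semantic_information="ball", relative_weight=5)
rect = ball.pixel_rect()
```

Expressing positions and sizes in 16-pixel units keeps every region aligned with the macroblock grid, so a region always covers whole macroblocks.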

When the regions of interest are detected using a salience map, a rate of salience is obtained for each region, and a region is classified as salient if its salience exceeds a certain threshold predetermined by the method used to obtain the salience maps. In the SEI messages, the regions of interest whose salience exceeds this fixed threshold are therefore listed in ascending order of salience.
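The thresholding and ordering can be sketched as below. The `(label, salience)` tuples, the function name and the numeric values are illustrative assumptions; the actual threshold is fixed by whichever salience-map method is used.

```python
from typing import List, Tuple

Region = Tuple[str, float]  # (label, salience); a hypothetical representation


def salient_regions(candidates: List[Region], threshold: float) -> List[Region]:
    """Keep regions strictly above the threshold, in ascending salience order."""
    kept = [region for region in candidates if region[1] > threshold]
    return sorted(kept, key=lambda region: region[1])


ordered = salient_regions(
    [("crowd", 0.2), ("ball", 0.9), ("player", 0.6), ("banner", 0.4)],
    threshold=0.3,
)
```

With ascending order, the most salient region is the last entry of the list.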

Module 113 inserts the SEI message into the data stream and sends the resulting coded video stream to the transmission network.

The SEI message is transmitted before each picture to which it refers.

In other embodiments, it is also possible to transmit an SEI message only when the location of at least one region of interest changes between two or more pictures. During decoding, the decoder then uses the last SEI message received, whether that message immediately precedes the picture to be decoded or, if the current picture is not preceded by such a message, relates to a previously received picture.
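The "use the last SEI message received" rule amounts to carrying the most recent region list forward. A minimal sketch, assuming the stream has already been split into pictures that may or may not be preceded by a region-of-interest SEI message (the tuple types and function name are hypothetical):

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Rect = Tuple[int, int, int, int]            # (x, y, width, height)
Picture = Tuple[int, Optional[List[Rect]]]  # (picture index, SEI regions or None)


def regions_per_picture(stream: Iterable[Picture]) -> Iterator[Tuple[int, List[Rect]]]:
    """Yield the regions that apply to each picture.

    A picture without its own SEI message inherits the regions of the last
    message received; before any message arrives the list is empty.
    """
    last: List[Rect] = []
    for index, sei_regions in stream:
        if sei_regions is not None:
            last = sei_regions
        yield index, last


stream = [
    (0, [(0, 0, 32, 32)]),
    (1, None),
    (2, [(16, 0, 32, 32)]),
    (3, None),
]
resolved = dict(regions_per_picture(stream))
```

Pictures 1 and 3 carry no message of their own and reuse the regions announced for pictures 0 and 2 respectively.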

FIG. 2 shows a coding method according to the H.264/AVC coding standard, implementing a preferred embodiment of the invention.

During step E1, a salience map associated with the video to be broadcast is determined. To determine this salience map, which represents the regions of interest, information about the video content may also be received and taken into account while the map is established. In particular, during a sporting event, the position of the ball can be considered to correspond to the region of interest for the user, in which case the zones of the picture where the ball is located are privileged. When the video is a televised news broadcast, it can likewise be assumed that the presenter corresponds to the region of interest; in that case, the regions of interest are determined by privileging the zones containing the presenter, for example by detecting a face using known picture processing techniques.

At the end of step E1, one or more regions of interest relating to the video content are thus obtained.

During step E2, the coordinates of the regions of interest within the pictures are determined. The size of the regions of interest can also be determined, in pixels, and semantic information about the content can be associated with each region of interest.

In parallel, during step E3, the video stream is coded according to the H.264 coding standard. During coding, the zones detected as regions of interest are privileged: a smaller quantization step is applied to them.
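One way to picture the smaller quantization step is a per-macroblock map of quantization parameters (QP) in which the macroblocks inside a region of interest receive a lower value. The sketch below is an assumption-laden illustration, not the coder of FIG. 1: the function name, `base_qp` and `roi_delta` are hypothetical, and regions are given in macroblock units.

```python
from typing import List, Tuple

MbRect = Tuple[int, int, int, int]  # (x, y, width, height) in macroblocks


def qp_map(width_mbs: int, height_mbs: int, rois: List[MbRect],
           base_qp: int = 30, roi_delta: int = 6) -> List[List[int]]:
    """Return one QP per macroblock, lowered inside regions of interest.

    H.264 QP values range from 0 to 51 for 8-bit video, and a lower QP
    means a smaller quantization step. base_qp and roi_delta are
    illustrative values.
    """
    roi_qp = max(0, min(51, base_qp - roi_delta))
    grid = [[base_qp] * width_mbs for _ in range(height_mbs)]
    for x, y, w, h in rois:
        # Clip the region to the picture so out-of-range input is harmless.
        for row in range(max(0, y), min(height_mbs, y + h)):
            for col in range(max(0, x), min(width_mbs, x + w)):
                grid[row][col] = roi_qp
    return grid


grid = qp_map(width_mbs=4, height_mbs=3, rois=[(1, 1, 2, 1)])
```

In H.264 the quantization step roughly doubles for every increase of 6 in QP, so a delta of 6 corresponds to about half the step size inside the region.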

After step E2, during step E4, an SEI message is generated from the location and semantic information associated with the regions of interest. The SEI message thus generated conforms to the SEI message described above in Tables 1 and 2.

During step E5, the stream is assembled by inserting the SEI messages into it, yielding a coded stream conforming to the H.264 standard.

The video stream thus coded is transmitted during step E6, in real time or deferred, to decoding devices, which may be local or remote.

FIG. 3 shows a preferred embodiment of a decoding device according to the invention, conforming to the H.264/AVC coding standard.

Module 209 receives the SEI messages at its input and extracts the different SEI messages. The NAL units carrying the useful data are sent to the entropy decoding module 201.

The SEI messages are analyzed by module 210, which decodes the content of the SEI messages describing the regions of interest. The regions of interest of each picture are thus identified simply, at the level of the decoding device and before each picture is decoded, using the information contained in the macroblock_alignment field.

The macroblocks are sent to the reordering module 202 to obtain a set of coefficients. These coefficients undergo inverse quantization in module 203 and an inverse DCT in module 204, at whose output the macroblocks D'n are obtained, D'n being a distorted version of Dn. To reconstruct the macroblock uF'n, adder 205 adds the prediction block P to D'n. Block P is obtained either after motion compensation of the preceding decoded frame, performed by module 208, when the macroblock was coded in inter mode, or after intra prediction of the macroblock uF'n, performed by module 207, when it was coded in intra mode. Filter 206 is applied to the signal uF'n to reduce the effects of distortion, and the reconstructed frame F'n is built from the successive macroblocks.

Using the information about the regions of interest contained in the SEI messages, the blocks representing the regions of interest are detected in the stream. Before display, these blocks are identified and, depending on the user's choice, can be cropped and sent for display on a device such as a PDA or a mobile phone.

It is also possible to let the user choose which macroblocks to display, for example by entering semantic information. The user enters "ball", for example, and the regions of interest containing the ball are then displayed. If no region of interest is associated with this term, all of the regions of interest can be displayed. The different regions of interest can be displayed on the screen as a mosaic. When only one region of interest is displayed, it is enlarged to occupy the whole screen.
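The keyword-driven choice, including the fallback to all regions, can be sketched as follows. The tuple representation, the function name and the case-insensitive substring match are assumptions made for the example.

```python
from typing import List, Tuple

Region = Tuple[str, int]  # (semantic information, salience); hypothetical


def regions_to_display(regions: List[Region], keyword: str) -> List[Region]:
    """Return the regions whose semantic information matches the keyword.

    If nothing matches, fall back to every region, as the text allows.
    Matching is a case-insensitive substring search (an assumption).
    """
    needle = keyword.lower()
    matches = [region for region in regions if needle in region[0].lower()]
    return matches if matches else list(regions)


regions = [("ball", 9), ("goalkeeper", 5), ("referee", 3)]
with_ball = regions_to_display(regions, "Ball")
fallback = regions_to_display(regions, "coach")
```

A single match, as with "Ball" here, is the case in which the region would be enlarged to fill the screen.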

The decoding device thus decodes only the macroblocks likely to contain information of interest to the user. Decoding is therefore faster and requires fewer resources at the decoding device, and hence at reception. This is particularly useful when the receiving device is a mobile terminal with limited processing capacity.
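To decode only the useful macroblocks, the decoder needs the list of macroblock numbers covered by a region. A minimal sketch, assuming the Macroblock_alignment information gives a starting macroblock number plus a width and height in macroblocks, and assuming plain raster-scan numbering (the function and parameter names are hypothetical):

```python
from typing import List


def roi_macroblocks(start_mb: int, width_mbs: int, height_mbs: int,
                    picture_width_mbs: int) -> List[int]:
    """List the macroblock numbers covered by one region of interest.

    Assumption: macroblocks are numbered in raster-scan order, left to right
    and top to bottom, and start_mb is the region's top-left macroblock.
    """
    start_col = start_mb % picture_width_mbs
    if start_col + width_mbs > picture_width_mbs:
        raise ValueError("region extends past the right edge of the picture")
    return [
        start_mb + row * picture_width_mbs + col
        for row in range(height_mbs)
        for col in range(width_mbs)
    ]


# A QCIF picture (176x144 pixels) is 11 macroblocks wide.
wanted = roi_macroblocks(start_mb=13, width_mbs=3, height_mbs=2,
                         picture_width_mbs=11)
```

Any macroblock whose number is not in `wanted` could then be skipped, provided the coded data allows the region to be decoded independently (for instance when it is carried in its own slice group).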

FIG. 4 shows a decoding method according to the H.264/AVC coding standard, implementing a preferred embodiment of the invention.

Such a method can be implemented in a mobile terminal with limited display capacity.

During step S1, the type of display required is selected. The selection is made through a user interface provided on the mobile terminal. Either it is decided to operate in full-picture mode, in which case the video stream is displayed in its entirety as transmitted by the transmitter, or it is decided to display only the regions of interest of the picture. This latter mode constitutes the particularity of the invention. If it is decided to display the regions of interest, the method proceeds to step S2; otherwise it proceeds to step S8. For other applications, different types of SEI message may of course have been inserted into the video stream, in which case there may be a step of analyzing the SEI messages before or during step S8.

During step S2, the user selects how the regions of interest are to be used. In particular, the user can choose:

- the maximum number of regions of interest to display,

- the manner in which the various regions of interest are displayed on the screen, for example as a mosaic,

- the degree of zoom wanted on the regions of interest,

- the use of a keyword. The "semantic information" field of the regions of interest contains a keyword. In this case it is also possible to specify, for each picture, whether to display a single region of interest containing the keyword (then the one with the highest salience) or several regions of interest containing the keyword.

During step S3, the SEI messages present in the stream are analyzed as they are received. The SEI messages are used to code the locations of the regions of interest of a picture that were detected before the picture was coded. For each picture, there may thus be one or more regions of interest, determined from the visual characteristics of the picture, from its content, or from both. The SEI message is coded according to Tables 1 and 2 above. The information from the SEI messages is stored temporarily until the corresponding picture is displayed.

During step S4, all the pictures are decoded in accordance with the decoding standard.

During step S5, the decoded regions of interest are processed according to the choices the user made during step S2. If the user selected a zoom on the main region of interest of the picture, then during step S6 the region is enlarged to the maximum size of the display. If the user selected a mosaic of regions of interest, the picture is recomposed from the regions of interest, each enlarged according to the screen size and the number of regions of interest selected for display. If the user specified a keyword, the regions of interest containing that keyword are displayed and enlarged.
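The zoom and mosaic cases of steps S5 and S6 can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Rect` type and both function names are hypothetical, and the sketch assumes that the aspect ratio of a region is preserved when it is enlarged and that the mosaic uses a near-square grid.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixels (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int


def zoom_to_display(roi: Rect, display_w: int, display_h: int) -> float:
    """Largest uniform scale factor at which the region still fits the display."""
    return min(display_w / roi.width, display_h / roi.height)


def mosaic_cells(count: int, display_w: int, display_h: int) -> List[Rect]:
    """Split the display into a near-square grid with one cell per region."""
    if count <= 0:
        return []
    cols = math.ceil(math.sqrt(count))
    rows = math.ceil(count / cols)
    cell_w, cell_h = display_w // cols, display_h // rows
    # Cells are filled row by row, left to right.
    return [
        Rect((i % cols) * cell_w, (i // cols) * cell_h, cell_w, cell_h)
        for i in range(count)
    ]
```

In the mosaic case, each region would then be scaled into its cell with `zoom_to_display(roi, cell.width, cell.height)`.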

During step S7, the regions of interest are displayed on the screen of the mobile terminal according to the user's wishes.

During step S8, which follows when the user has not selected display of the regions of interest only, the entire video stream is decoded for display.

FIG. 5 shows a video indexing application of the invention.

FIG. 5 partially shows a device 500 of the PVR (personal video recorder) type. The PVR 500 receives a compressed video stream at its input. In the embodiment described, this video data stream conforms to the H.264 coding standard. The compressed video stream contains, in particular, SEI messages as described above in Tables 1 and 2.

On the one hand, this video data stream is sent to a recording support 503. The recording support may be a hard disk, a holographic support, a memory card or a Blu-ray disc. In other embodiments, the recording support may be remote.

On the other hand, the video data stream is sent to a decoder 501 to be decoded in real time, for display on a television set, for example. In known devices, the stream is sent to the decoder 501 only when the user wants to watch it in real time. Otherwise it is not decoded but simply recorded when recording is requested.

According to this aspect, the invention proposes to decode part of the video data stream even when real-time viewing is not requested. The part of the video stream in question is, in particular, the regions of interest or certain regions of interest.

When the decoder 501 receives a video stream for which recording has been requested, the data is sent to the recording support 503, which records it as it arrives. At the same time, the decoder 501 receives the video data stream and decodes the SEI messages as they arrive. The decoded regions of interest are sent to a video indexing module 502, which stores them temporarily before sending them to the recording support 503.

FIG. 6 shows the method implemented by the decoder 501 and the indexing module 502.

During step T1, the video data stream is received by the decoder 501. During step T2, the decoder 501 decodes the SEI messages present in the video data stream. These are the SEI messages described above in Tables 1 and 2. The decoder may also decode other SEI messages, but these are outside the scope of the invention. Each SEI message may describe one or more regions of interest per picture, as described above in Tables 1 and 2. During step T3, the decoder 501 analyzes each SEI message and decodes each picture. During this step, the weight indicated in the SEI message is used to select, for each picture, the region of interest to be recorded. In a preferred embodiment, the region of interest with the highest salience, that is, the highest weight, is kept.
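The per-picture selection of step T3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `RegionOfInterest` type and its field names are hypothetical stand-ins for the fields of Tables 1 and 2, and ties between equal weights are resolved in favor of the first region listed, which the patent does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RegionOfInterest:
    """One region described by an SEI message (hypothetical field names)."""
    x: int
    y: int
    width: int
    height: int
    weight: int          # relative importance; higher means more salient
    semantic: str = ""   # optional keyword describing the content


def select_roi(regions: Sequence[RegionOfInterest]) -> Optional[RegionOfInterest]:
    """Keep the region with the highest weight, or None if the picture has none."""
    if not regions:
        return None
    # max() returns the first maximal element, so ties keep the earliest region.
    return max(regions, key=lambda region: region.weight)
```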

Once the region of interest has been decoded, it is sent to the indexing module 502 during step T4. Recording one region of interest per picture, for every picture, is of little interest: it represents a large volume of information and does not allow the video to be indexed efficiently. The indexing module therefore determines which pictures are used to index the video. In the preferred embodiment described here, only about ten pictures are selected for a video lasting an hour and a half. In other embodiments, the number of pictures may be larger. These ten pictures are taken at regular intervals. The selected pictures are stored temporarily in a RAM-type memory (not shown) included in the indexing module 502. So that they can be displayed in the best way, the pictures are enlarged during step T5 so that they all have the same size. In a preferred embodiment, this size may be the size of the full picture. To do this, they are read from the temporary memory and stored again after enlargement. In another embodiment, the pictures are enlarged before being stored in the temporary memory.
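The regular-interval selection of step T4 can be sketched as below. The function name is hypothetical, and the sketch assumes the total number of pictures is known in advance; a device indexing a live stream would instead have to pick pictures on the fly, for example from elapsed time.

```python
from typing import List


def pick_index_pictures(picture_count: int, wanted: int = 10) -> List[int]:
    """Return the indices of `wanted` pictures spread at regular intervals."""
    if picture_count <= 0 or wanted <= 0:
        return []
    # Never ask for more pictures than the video contains.
    wanted = min(wanted, picture_count)
    step = picture_count / wanted
    return [int(i * step) for i in range(wanted)]
```

For a 90-minute video at an assumed 25 pictures per second (135,000 pictures), this selects one picture every 13,500 pictures, that is, one every nine minutes.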

In another embodiment, the pictures are presented as a mosaic on the display. In that case, instead of being enlarged, the pictures are reduced to a single size that is the same for all of them.

Once the entire video has been received, and hence recorded on the recording support 503, the indexing pictures are transferred during step T6 from the temporary memory to the recording support 503, where they are recorded in a file.

Subsequently, depending on the desired use, the regions of interest may be used for indexing, or for display on the PVR-type device when the user wants to consult the content of the database.

According to another aspect of the invention, the location data of the regions of interest may also be encrypted when the SEI messages are coded. Only users who hold the decryption key can then access the regions of interest, and hence the visualization of the regions of interest or the indexing of the video streams that relies on the location information. With reference to FIG. 2, this encryption step is a step E4' (not shown) inserted after step E4.

Obtaining the decryption key may, for example, be subject to payment for a service from a program broadcaster.

To do this, the SEI messages relating to the regions of interest are encapsulated in RTP (Real-time Transport Protocol) packets and sent on a port different from that of the video. CTS-type timestamps link the SEI messages relating to the regions of interest to the corresponding pictures. Advantageously, this transmission mode makes it possible to encrypt only the RTP packets containing the SEI messages, and not the video.

Decryption is performed at the receiving terminal.

In the case of MPEG-2 TS encapsulation, the encryption standard used is DVB-CSA, and the SEI messages relating to the regions of interest are encapsulated in a PID different from that of the video. These SEI messages are linked to the corresponding pictures through the PTS (presentation timestamp) in the PES packet header. This transmission mode makes it possible to encrypt only the PID carrying the SEI messages relating to the regions of interest, and not the video PID.
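Whether the SEI messages travel in separate RTP packets or in a separate PID, the receiver has to pair them with pictures by timestamp. A minimal Python sketch of that pairing follows; the function name is hypothetical, and the sketch assumes the SEI messages have already been demultiplexed and decrypted and that timestamps are integers that match exactly between the two streams.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def link_sei_to_pictures(
    picture_timestamps: Iterable[int],
    sei_messages: Iterable[Tuple[int, bytes]],
) -> Dict[int, List[bytes]]:
    """Map each picture timestamp to the SEI payloads carrying the same timestamp."""
    by_timestamp: Dict[int, List[bytes]] = defaultdict(list)
    for timestamp, payload in sei_messages:
        by_timestamp[timestamp].append(payload)
    # Pictures without a matching SEI message get an empty list.
    return {ts: by_timestamp.get(ts, []) for ts in picture_timestamps}
```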

In another embodiment, the video data stream is coded according to the H.264/AVC coding standard using FMO (Flexible Macroblock Ordering), which allows different parts of a picture to be coded, and therefore decoded, independently. The FMO mode uses "slice groups", which are defined in the standard. In this embodiment, the regions of interest are coded in slice groups different from those of the rest of the picture. A NAL unit of PPS type contains the map of the slice groups. SEI messages such as those described below are inserted to indicate in which slice groups the regions of interest are coded.

The following tables illustrate the format of the SEI messages used in this embodiment:

uuid_iso_iec_11578: a single 128-bit word that identifies our message type to the decoder.

user_data_payload_byte: 8 bits containing part of the SEI message.

Typically, in this case:

payloadSize = 17 (bytes): 16 for the UUID and 1 for the proprietary data.

user_data_payload_byte:

- Slice_group(i)_id: if slice_group_id is "1", the slice group represents a region of interest; if it is "0", the slice group represents the rest of the picture.

For each slice group representing a region of interest, semantic information, a relative weight and the macroblocks concerned may be specified.

Because the regions of interest are identified and coded independently, it is possible on reception to decode only the macroblocks corresponding to the regions of interest.
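As an illustration, the 17-byte payload described above could be read as in the following Python sketch. Several points are assumptions rather than statements of the patent: the UUID value is a placeholder, the function names are hypothetical, and the single proprietary byte is assumed to hold one Slice_group(i)_id flag per bit, least significant bit first, which accommodates the eight slice groups that H.264 allows per picture.

```python
import uuid
from typing import List, Optional

# Placeholder value: the patent does not give the actual UUID.
ROI_SEI_UUID = uuid.UUID("12345678-1234-5678-1234-567812345678")
PAYLOAD_SIZE = 17  # 16 bytes of UUID + 1 byte of proprietary data


def parse_roi_flags(
    payload: bytes, expected: uuid.UUID = ROI_SEI_UUID
) -> Optional[List[bool]]:
    """Return one flag per slice group, or None if the UUID is not ours."""
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError(f"expected {PAYLOAD_SIZE} bytes, got {len(payload)}")
    if uuid.UUID(bytes=payload[:16]) != expected:
        return None  # some other user-data SEI message
    flags = payload[16]
    # Assumed layout: bit i (least significant first) is Slice_group(i)_id.
    return [bool((flags >> i) & 1) for i in range(8)]


def slice_groups_to_decode(flags: List[bool]) -> List[int]:
    """Indices of the slice groups that carry a region of interest."""
    return [i for i, is_roi in enumerate(flags) if is_roi]
```

A receiver that only needs the regions of interest would pass the result of `slice_groups_to_decode` to its decoder and skip the remaining slice groups.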

Claims

1. A method of indexing a coded video data stream, the video data stream comprising information regarding the location of regions of interest of each image, the method comprising the steps of:
receiving (T1) the coded video stream,
recording the coded video stream on a recording support,
decoding (T2) the location information of the regions of interest,
selecting (T3) a region of interest for each image,
decoding (T3) the video data,
selecting (T4) a predetermined number of regions of interest for the video data stream from among the regions of interest selected per image, and
recording (T6) the selected regions of interest.

2. The method of claim 1, wherein, during the recording step:
the selected regions of interest are recorded in a temporary memory as they are selected and decoded, and
when the selected regions of interest have been recorded in the temporary memory, they are transferred to a permanent memory support (503).

3. The indexing method according to claim 1 or 2, wherein, prior to recording the regions of interest, the regions of interest are formatted to obtain a homogeneous size for all of the selected regions of interest.

4. The method of claim 1, comprising encrypting the location of the regions of interest by means of an encryption key.

5. The method of claim 4 including obtaining a decryption key in accordance with a payment by a user.

6. The indexing method according to any one of claims 1 to 5, wherein the video data stream is coded according to the H.264/AVC coding standard and the location information is included in a Supplemental Enhancement Information (SEI) type message.

7. The method of claim 5 or 6, wherein the SEI messages are encapsulated in real-time protocol (RTP) packets, and the RTP packets are encrypted.

8. The indexing method according to claim 5 or 6, wherein said Supplemental Enhancement Information (SEI) type messages relating to region of interest location information are inserted into said coded data before or after each image they refer to.

9. The method according to any one of claims 1 to 8, wherein the location information comprises information selected from:
the number of regions of interest of each image,
the coordinates of each region of interest for each image dimension,
the surface of each region of interest,
a weight relating to the importance of the region of interest with respect to the other regions of interest of the image,
information about the content of each region of interest,
and any combination of this information.

10. The indexing method according to any one of claims 1 to 9, wherein in selecting the region of interest for each image (T3), the region of interest is selected according to a weight related to the importance of the region of interest.

11. The method of claim 6, wherein the video coding standard uses Flexible Macroblock Ordering (FMO), the regions of interest are coded in slice groups independently of the other image data, and the location information of the regions of interest includes the numbers of the slice groups in which the regions of interest are coded.

12. The method of claim 11, wherein a Supplemental Enhancement Information (SEI) message includes an identifier indicating for each slice group whether it is related to one region of interest.

13. The method of claim 12, comprising a further step of reading the SEI messages, wherein the step of decoding the video data (T3) decodes only the slice groups containing the regions of interest.

14. A device for indexing a coded video data stream, the video data stream comprising information regarding the location of regions of interest of each image, the device comprising:
means for receiving the coded video stream,
means for recording the coded video stream on a recording support (503),
means (501) for decoding the location information of the regions of interest,
means (501) for decoding the video data,
means (502) for selecting a region of interest per image,
means (502) for selecting a predetermined number of regions of interest for the video data stream from among the regions of interest selected per image, and
means (503) for recording the selected regions of interest.