KR102142735B1

KR102142735B1 - Transmitting device, transmitting method, receiving device and receiving method

Info

Publication number: KR102142735B1
Application number: KR1020157004909A
Authority: KR
Inventors: 이쿠오 츠카고시
Original assignee: 소니 주식회사
Priority date: 2012-09-07
Filing date: 2013-09-02
Publication date: 2020-08-07
Also published as: US20190373277A1; JP2016213888A; EP2894861B1; CN104604242A; US20210243463A1; JP6038380B1; DK2894861T3; JP6258206B2; US11700388B2; JP6038381B1; US10951910B2; HUE047946T2; RU2015107001A; JPWO2014038522A1; US20150172690A1; BR112015004393A2; CN104604242B; EP2894861A4; JP6038379B1; JP2016213887A

Abstract

When image data of an ultra-high-definition service is transmitted without scalable encoding, a receiver that does not support the service can still easily acquire image data at a resolution matching its own display capability. A container of a predetermined format having a video stream containing encoded image data is transmitted. Auxiliary information for downscaling the spatial and/or temporal resolution of the image data is inserted into the video stream. For example, the auxiliary information indicates a precision limit on the motion vectors included in the encoded image data. As another example, the auxiliary information identifies the pictures to be selected when the temporal resolution is downscaled at a predetermined ratio.

Description

Transmitting device, transmitting method, receiving device and receiving method {TRANSMITTING DEVICE, TRANSMITTING METHOD, RECEIVING DEVICE AND RECEIVING METHOD}

The present invention relates to a transmission device, a transmission method, a receiving device, and a receiving method, and more particularly to a transmission device and the like that transmit image data of spatially or temporally ultra-high-resolution images.

For example, in addition to HD images with 1920×1080 effective pixels, services for spatially ultra-high-resolution images such as 4K and 8K, which have two and four times as many effective pixels horizontally and vertically, respectively, are under consideration (see, for example, Patent Document 1). Likewise, in addition to images with a frame frequency of 30 Hz, services for temporally ultra-high-resolution images with frame frequencies such as 60 Hz and 120 Hz are under consideration. These ultra-high-resolution image services are referred to below, where appropriate, as ultra-high-definition services.

Japanese Patent Application Laid-Open No. 2011-057069

When the image data of the ultra-high-definition service described above is scalably encoded, even a receiver that does not support the service can easily acquire image data at a resolution matching its own display capability. When the image data is not scalably encoded, however, it is difficult for such a receiver to acquire image data at a resolution matching its display capability.

An object of the present invention is to make it easy for a receiver that does not support an ultra-high-definition service to acquire image data at a resolution matching its own display capability when the image data of that service is transmitted without scalable encoding.

The concept of the present invention resides in a transmission device including:

a transmission unit that transmits a container of a predetermined format having a video stream containing encoded image data; and

an auxiliary information insertion unit that inserts, into the video stream, auxiliary information for downscaling the spatial and/or temporal resolution of the image data.

In the present invention, the transmission unit transmits a container of a predetermined format having a video stream containing encoded image data. The encoded image data has been encoded with, for example, MPEG4-AVC (MVC), MPEG2video, or HEVC. The container may be, for example, a transport stream (MPEG-2 TS) as adopted in digital broadcasting standards. The container may also be, for example, MP4 as used for Internet distribution, or a container of some other format.

The auxiliary information insertion unit inserts, into the video stream, auxiliary information for downscaling the spatial and/or temporal resolution of the image data. For example, the auxiliary information may indicate a precision limit on the motion vectors included in the encoded image data. As another example, the auxiliary information may identify the pictures to be selected when the temporal resolution is downscaled at a predetermined ratio.

Thus, in the present invention, auxiliary information for downscaling the spatial and/or temporal resolution of the image data is inserted into the video stream. Consequently, when image data of an ultra-high-definition service is transmitted without scalable encoding, a receiver that does not support the service can easily acquire image data at a resolution matching its own display capability.

The present invention may further include, for example, an identification information insertion unit that inserts, into a layer of the container, identification information indicating that the auxiliary information is inserted in the video stream. In this case, the receiver can tell that the auxiliary information is present without decoding the video stream, and can extract it appropriately.

For example, downscaling information indicating the ratios available for downscaling the spatial and/or temporal resolution may be added to this identification information. Spatial and/or temporal resolution information of the image data included in the video stream may also be added to the identification information. Further, for example, the container may be a transport stream, and the identification information insertion unit may insert the identification information into a descriptor under the video elementary loop of the program map table included in the transport stream.

The present invention may further include, for example, a resolution information insertion unit that inserts, into a layer of the container, spatial and/or temporal resolution information of the image data included in the video stream. In this case, when image data of an ultra-high-definition service is transmitted without scalable encoding, a receiver that does not support the service can determine the details of the downscaling process on the basis of this resolution information.

For example, identification information may be added to the resolution information to indicate whether the video stream provides support for low-capability decoders that cannot handle the spatial and/or temporal resolution of the image data. Further, for example, the container may be a transport stream, and the resolution information insertion unit may insert the resolution information into a descriptor under the event information table included in the transport stream.

Another concept of the present invention resides in a transmission device including:

an identification information insertion unit that inserts identification information into a layer of the container so that an ultra-high-definition service provided by the video stream can be identified at least on a per-program basis.

In the present invention, the transmission unit transmits a container of a predetermined format having a video stream containing image data. The container may be, for example, a transport stream (MPEG-2 TS) as adopted in digital broadcasting standards. The container may also be, for example, MP4 as used for Internet distribution, or a container of some other format.

The identification information insertion unit inserts identification information into a layer of the container so that an ultra-high-definition service provided by the video stream can be identified at least on a per-program basis. For example, the identification information may include spatial and/or temporal resolution information of the image data. For example, the container may be a transport stream, and the identification information insertion unit may insert the identification information into a descriptor under the event information table included in the transport stream.

Thus, in the present invention, identification information is inserted into a layer of the container so that an ultra-high-definition service provided by the video stream can be identified at least on a per-program basis. The receiver can therefore identify the ultra-high-definition service easily and, by comparing it with its own display capability, decide appropriately and immediately whether downscaling of the spatial and/or temporal resolution is needed and, if so, at what ratio.

In the present invention, for example, support information may be added to the identification information to indicate whether the video stream provides support for low-capability decoders that cannot handle the spatial and/or temporal resolution of the image data. In this case, the receiver can easily determine whether the video stream provides such support, for example whether auxiliary information for downscaling the spatial and/or temporal resolution has been inserted.

Yet another concept of the present invention resides in a receiving device including:

a receiving unit that receives a video stream containing encoded image data, auxiliary information for downscaling the spatial and/or temporal resolution of the image data being inserted in the video stream; and

a processing unit that applies downscaling of the spatial and/or temporal resolution to the encoded image data on the basis of the auxiliary information to obtain display image data of a desired resolution.

In the present invention, the receiving unit receives a video stream containing encoded image data. Auxiliary information for downscaling the spatial and/or temporal resolution of the image data is inserted in this video stream. The processing unit then applies downscaling of the spatial and/or temporal resolution to the encoded image data on the basis of the auxiliary information to obtain display image data of a desired resolution.

Thus, in the present invention, downscaling of the spatial and/or temporal resolution is applied to the encoded image data on the basis of the auxiliary information inserted in the video stream, and display image data of a desired resolution is obtained. The load of the downscaling process can therefore be reduced.

In the present invention, for example, the receiving unit may receive a container of a predetermined format that includes the video stream, downscaling information indicating the ratios available for downscaling the spatial and/or temporal resolution may be inserted in a layer of the container, and the processing unit may control the downscaling process for obtaining the display image data on the basis of this downscaling information.

In the present invention, for example, the receiving unit may receive a container of a predetermined format that includes the video stream, spatial and/or temporal resolution information of the image data included in the video stream may be inserted in a layer of the container, and the processing unit may control the downscaling process for obtaining the display image data on the basis of this resolution information.

According to the present invention, when image data of an ultra-high-definition service is transmitted without scalable encoding, a receiver that does not support the service can easily acquire image data at a resolution matching its own display capability.

FIG. 1 is a block diagram showing a configuration example of an image transmission/reception system as an embodiment.
FIG. 2 is a diagram for explaining downscaling of spatial resolution.
FIG. 3 is a block diagram showing a configuration example of a decoder of the receiver.
FIG. 4 is a diagram for explaining downscaling of spatial resolution.
FIG. 5 is a diagram for explaining a case where no limit is placed on the precision of the motion vector MV, for example, where the motion vector MV1 has 1/4-pixel (quarter-pixel) precision.
FIG. 6 is a diagram for explaining a case where a limit is placed on the precision of the motion vector MV, for example, where the motion vector MV2 has 1/2-pixel (half-pixel) precision.
FIG. 7 is a diagram for explaining downscaling of temporal resolution.
FIG. 8 is a block diagram showing a configuration example of a transmission data generation unit that generates the transport stream TS.
FIG. 9 is a diagram showing the access unit at the head of a GOP, into which an SEI message is inserted as auxiliary information, and access units other than the head.
FIG. 10 is a diagram showing a structural example (Syntax) of an SEI message (downscaling_spatial SEI message) that includes, as auxiliary information, information indicating the precision limit of the motion vector MV.
FIG. 11 is a diagram showing the contents of the main information in the structural example of the SEI message (downscaling_spatial SEI message).
FIG. 12 is a diagram showing a structural example (Syntax) of an SEI message (picture_temporal_pickup SEI message) that includes, as auxiliary information, information indicating the pictures to be selected when the temporal resolution is downscaled at a predetermined ratio.
FIG. 13 is a diagram showing the contents of the main information in the structural example of the SEI message (picture_temporal_pickup SEI message).
FIG. 14 is a diagram showing a structural example (Syntax) of the downscaling descriptor (downscaling_descriptor).
FIG. 15 is a diagram showing a modified structural example (Syntax) of the downscaling descriptor (downscaling_descriptor).
FIG. 16 is a diagram showing the contents of the main information in the structural example of the downscaling descriptor (downscaling_descriptor).
FIG. 17 is a diagram showing a structural example (Syntax) of the super high resolution descriptor (Super High resolution descriptor).
FIG. 18 is a diagram showing the contents of the main information in the structural example of the super high resolution descriptor (Super High resolution descriptor).
FIG. 19 is a diagram showing a configuration example of the transport stream TS.
FIG. 20 is a block diagram showing a configuration example of the receiver.

Specific details for carrying out the invention (hereinafter referred to as the "embodiment") are described below. The description proceeds in the following order.

1. Embodiment

2. Modifications

<1. Embodiment>

[Image transmission/reception system]

FIG. 1 shows a configuration example of an image transmission/reception system 10 as an embodiment. The image transmission/reception system 10 consists of a broadcasting station 100 and a receiver 200. The broadcasting station 100 transmits a transport stream TS, serving as the container, on a broadcast wave.

The transport stream TS has a video stream containing encoded image data. The transmitted image data includes data for a variety of image services. Possible image services include, for example, an HD image service with 1920×1080 effective pixels and, in addition, services for spatially ultra-high-resolution images (ultra-high-definition services) such as 4K and 8K, which have two and four times as many effective pixels horizontally and vertically, respectively. Possible image services also include, for example, an image service with a frame frequency of 30 Hz and, in addition, services for temporally ultra-high-resolution images (ultra-high-definition services) with frame frequencies such as 60 Hz and 120 Hz.

Image data of an ultra-high-definition service may be transmitted either with or without scalable encoding. Scalable encoding guarantees backward compatibility, so that even a receiver that does not support the ultra-high-definition service can easily acquire image data at a resolution matching its own display capability.

When image data of an ultra-high-definition service is transmitted, auxiliary information for downscaling the spatial and/or temporal resolution of the image data is inserted into the video stream. This auxiliary information is inserted, for example, into the user data area of the picture header or sequence header of the video stream.

The auxiliary information for downscaling the spatial resolution is, for example, information indicating a precision limit on the motion vectors included in the encoded image data. For example, where motion vectors would normally have 1/4-pixel precision, their precision is restricted to 1/2-pixel or 1-pixel precision in order to reduce the processing load of spatial downscaling on the receiver side.
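The effect of such a restriction can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the reasoning (a vector coded in steps of 1/4 pixel at full resolution lands on a 1/8-pixel grid once the picture is halved) is an interpretation of the passage above, not syntax taken from the patent.

```python
from fractions import Fraction


def required_precision(mv_precision: Fraction, downscale_ratio: Fraction) -> Fraction:
    """Sub-pixel step the receiver must interpolate in the downscaled picture.

    mv_precision    -- step of the coded motion vectors at full resolution,
                       e.g. Fraction(1, 4) for quarter-pixel vectors.
    downscale_ratio -- spatial ratio per dimension, e.g. Fraction(1, 2).
    """
    return mv_precision * downscale_ratio


def scale_motion_vector(mv_units, mv_precision, downscale_ratio):
    """Convert a vector given in coded units into pixels of the downscaled picture."""
    return tuple(Fraction(c) * mv_precision * downscale_ratio for c in mv_units)


if __name__ == "__main__":
    half = Fraction(1, 2)
    for label, precision in (("1/4", Fraction(1, 4)),
                             ("1/2", Fraction(1, 2)),
                             ("1", Fraction(1))):
        step = required_precision(precision, half)
        print(f"coded precision {label} pixel -> step {step} pixel after 1/2 downscaling")
```

Under these assumptions, a stream limited to 1/2-pixel vectors needs no finer than 1/4-pixel interpolation after halving, which is the granularity a decoder built for 1/4-pixel vectors already handles.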

The auxiliary information for downscaling the temporal resolution is information identifying the pictures to be selected when the temporal resolution is downscaled at a predetermined ratio. For example, this information marks every other picture (frame) as a picture to be selected when downscaling to 1/2. Likewise, it marks every fourth picture (frame) as a picture to be selected when downscaling to 1/4.
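The picture selection described above can be sketched as follows. This is a minimal illustration, assuming each picture carries two boolean flags; the flag names `pick_half` and `pick_quarter` and both functions are hypothetical and do not reproduce the SEI syntax of the patent.

```python
def mark_pictures(count: int) -> list:
    """Sender side: attach hypothetical pickup flags to each picture in order."""
    return [
        {
            "index": i,
            "pick_half": i % 2 == 0,     # every other picture survives 1/2 downscaling
            "pick_quarter": i % 4 == 0,  # every fourth picture survives 1/4 downscaling
        }
        for i in range(count)
    ]


def select_pictures(marked: list, flag: str) -> list:
    """Receiver side: keep only pictures whose flag is set, without analysing content."""
    return [p["index"] for p in marked if p[flag]]


if __name__ == "__main__":
    stream = mark_pictures(8)  # e.g. eight consecutive pictures of a 120 Hz stream
    print(select_pictures(stream, "pick_half"))     # [0, 2, 4, 6]
    print(select_pictures(stream, "pick_quarter"))  # [0, 4]
```

The receiver only reads the flags, so it can drop pictures before decoding them rather than deciding for itself which pictures are safe to skip.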

Because the auxiliary information is inserted as described above, when image data of an ultra-high-definition service is transmitted without scalable encoding, a receiver that does not support the service can easily acquire image data at a resolution matching its own display capability. The auxiliary information is described in detail later.

In addition, identification information indicating that the auxiliary information is inserted in the video stream is inserted into the layer of the transport stream TS. For example, this identification information is inserted under the video elementary loop (Video ES loop) of the program map table (PMT: Program Map Table) included in the transport stream TS. With this identification information, the receiving side can tell that the auxiliary information is inserted in the video stream without decoding the video stream, and can extract the auxiliary information appropriately.

Spatial and/or temporal resolution information of the image data included in the video stream may be added to this downscaling information. In this case, the receiving side can determine the spatial and/or temporal resolution of the image data without decoding the video stream. The downscaling information is described in detail later.

In addition, identification information is inserted into the layer of the transport stream TS so that an ultra-high-definition service provided by the video stream can be identified at least on a per-program basis. In this embodiment, for example, spatial and/or temporal resolution information of the image data included in the video stream is inserted into the layer of the transport stream TS. This resolution information is inserted, for example, under the event information table (EIT: Event Information Table) included in the transport stream TS. With this resolution information (identification information), the spatial and/or temporal resolution of the image data can be determined without decoding the video stream.

Identification information is added to this resolution information to indicate whether the video stream provides support for low-capability decoders that cannot handle the spatial and/or temporal resolution of the image data. In this case, the receiving side can easily determine whether the video stream provides such support, for example whether auxiliary information for downscaling the spatial and/or temporal resolution has been inserted. The resolution information is described in detail later.

The receiver 200 receives the transport stream TS sent from the broadcasting station 100 on a broadcast wave. This transport stream TS has a video stream containing encoded image data. The receiver 200 decodes the video stream to obtain display image data.

When image data of an ultra-high-definition service arrives without scalable encoding and the receiver 200 itself does not support that service, the receiver 200 applies downscaling of the spatial and/or temporal resolution to the encoded image data on the basis of the auxiliary information and obtains display image data of a desired resolution. In this case, the downscaling process is controlled by the resolution of the received image data and the available downscaling ratios.

For example, depending on the resolution of the received image data and the available downscaling ratios, it may not be possible to obtain display image data of the desired resolution; in that case, no downscaling is performed. When more than one downscaling ratio is available, a ratio is selected according to the resolution of the received image data so as to obtain display image data of the desired resolution.
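The control logic described above might look like the following sketch. It rests on assumptions not stated in the source: resolutions are (width, height) pairs, a ratio applies equally to both dimensions, and the receiver prefers the least aggressive ratio whose result fits its display. The function name `choose_downscaling` is hypothetical.

```python
from fractions import Fraction


def choose_downscaling(received, display, possible_ratios):
    """Pick a downscaling ratio, or return None when no offered ratio is usable.

    received        -- (width, height) of the received image data
    display         -- (width, height) the receiver is able to display
    possible_ratios -- iterable of Fraction ratios signalled as available
    """
    rx_w, rx_h = received
    max_w, max_h = display
    if rx_w <= max_w and rx_h <= max_h:
        return Fraction(1)  # already displayable, nothing to downscale
    # Try the largest ratio first so as little resolution as possible is discarded.
    for ratio in sorted(possible_ratios, reverse=True):
        if rx_w * ratio <= max_w and rx_h * ratio <= max_h:
            return ratio
    return None  # the desired resolution cannot be reached; skip downscaling


if __name__ == "__main__":
    ratios = [Fraction(1, 2), Fraction(1, 4)]
    print(choose_downscaling((7680, 4320), (3840, 2160), ratios))            # 1/2
    print(choose_downscaling((7680, 4320), (1920, 1080), ratios))            # 1/4
    print(choose_downscaling((7680, 4320), (1920, 1080), [Fraction(1, 2)]))  # None
```

Returning `None` corresponds to the case in which downscaling is not performed because none of the offered ratios yields a displayable resolution.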

[Resolution downscaling process]

The downscaling process performed in the receiver 200 will now be described, starting with downscaling of the spatial resolution. Consider, for example, the case where the received image data is 8K image data as shown in FIG. 2(a). In a receiver 200 with 4K display capability, downscaling that halves the spatial resolution both horizontally and vertically is applied to obtain 4K image data as shown in FIG. 2(b). In a receiver 200 with HD display capability, downscaling that reduces the spatial resolution to 1/4 both horizontally and vertically is applied to obtain HD image data as shown in FIG. 2(c).

Fig. 3 shows a configuration example of the decoder of the receiver 200. The received coded image data Ve undergoes entropy decoding in the entropy decoding unit 353a and inverse quantization in the inverse quantization unit 353b. The inverse-quantized data then undergoes an inverse spatial-frequency transform in the spatial frequency inverse transform unit 353c, yielding data D(n).

In this case, the inverse spatial-frequency transform is applied, for each N*N coded block, only to the frequency components in the region corresponding to the downscaling ratio (see the hatched region in Fig. 4(a)), so that downscaled image data is obtained as data D(n). The example of Fig. 4 shows the case where the downscaling ratio is 1/2.
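As a rough illustration of inverse-transforming only the low-frequency region of a block, the sketch below assumes the spatial-frequency transform is an orthonormal DCT-II. The text does not name the transform, so this is an assumption, and the helper names `idct_1d` and `reduced_idct_2d` are hypothetical.

```python
import math

def idct_1d(coeffs):
    """Orthonormal inverse DCT (DCT-III) of a list of coefficients."""
    m = len(coeffs)
    out = []
    for n in range(m):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m)
            total += scale * c * math.cos(math.pi * (2 * n + 1) * k / (2 * m))
        out.append(total)
    return out

def reduced_idct_2d(block, ratio_den):
    """Inverse-transform only the low-frequency corner of an N x N coefficient block.

    ratio_den = 2 keeps the (N/2) x (N/2) corner and yields an (N/2) x (N/2)
    pixel block, i.e. a 1/2 downscaled block.
    """
    n = len(block)
    m = n // ratio_den
    gain = m / n  # sqrt(m/n) per axis, so m/n over both axes keeps brightness
    low = [[block[r][c] * gain for c in range(m)] for r in range(m)]
    rows = [idct_1d(row) for row in low]  # horizontal pass
    cols = [idct_1d([rows[r][c] for r in range(m)]) for c in range(m)]  # vertical pass
    return [[cols[c][r] for c in range(m)] for r in range(m)]

# An 8 x 8 block holding only a DC coefficient of 800 (a flat block of value 100).
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 800.0
small = reduced_idct_2d(coeffs, 2)  # 4 x 4 block, still flat
```

Only a quarter of the coefficients enter the transform in the 1/2 case, which is where the saving in the reduced decode comes from.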

From the image data of the preceding frame recorded in the frame buffer 353d (see Fig. 4(b)), pixel data of the region indicated by the motion vector MV is read for each coded block and supplied to the interpolation filter 353e, which performs interpolation and generates an interpolated prediction block (see Fig. 4(c)). The adder 353f then adds the interpolated prediction block generated by the interpolation filter 353e to the data D(n) (see Fig. 4(d)), yielding downscaled image data Vd(n) of the current frame.

Here, let P be the pixel precision of the motion vector MV attached to the coded image data Ve. When the spatial frequency inverse transform unit 353c performs reduced decoding at, for example, 1/2, the pixel precision of the decoded image becomes 1/2 of the original precision P, that is, coarser. To perform motion compensation at the pixel precision P of the original motion vector MV, the image data in the frame buffer 353d must be interpolated to match the precision P.

For example, when the original motion vector MV is encoded with 1/4 pixel precision, the image data that has been reduction-decoded and stored in the frame buffer 353d has its pixel precision reduced to 1/2. To perform motion compensation at the precision of the original motion vector MV, the image data in the frame buffer 353d must therefore be interpolated by a factor of 1/(1/4*1/2), that is, by a factor of 8.
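The arithmetic above can be checked with a small sketch. The function name `interpolation_factor` is hypothetical, and the half-pixel and integer-pixel cases are included only to show how a coarser motion vector precision lowers the required factor.

```python
from fractions import Fraction

def interpolation_factor(mv_precision, decode_scale):
    """Interpolation factor needed on the reduction-decoded reference picture.

    mv_precision: motion vector precision in original pixels (1/4 = quarter pixel).
    decode_scale: reduced-decoding ratio per axis (1/2 = half size).
    """
    return 1 / (mv_precision * decode_scale)

quarter = interpolation_factor(Fraction(1, 4), Fraction(1, 2))  # case in the text
half = interpolation_factor(Fraction(1, 2), Fraction(1, 2))     # MV limited to 1/2 pixel
integer = interpolation_factor(Fraction(1, 1), Fraction(1, 2))  # MV limited to integer pixel
```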

Consequently, when the precision of the motion vector MV is not restricted, the range of prediction pixels subject to the interpolation filter operation is large and the interpolation filter has many taps, so the computational load is high. By contrast, when the precision of the motion vector MV is restricted, the range of prediction pixels is small and the interpolation filter has fewer taps, so the computational load is lower.

Fig. 5 shows a case where the precision of the motion vector MV is not restricted, for example where the motion vector MV1 has 1/4 pixel (quarter pixel) precision. In this case, obtaining interpolated pixels from adjacent prediction pixels requires a filter operation with enough phases to cover the precision of MV1. When interpolation is performed with a low-pass filter, securing a passband of at least a certain width and a steep roll-off near the cutoff frequency requires more taps in the interpolation filter, and accordingly more prediction pixels are involved.

Fig. 6 shows a case where the precision of the motion vector MV is restricted, for example where the motion vector MV2 has 1/2 pixel (half pixel) precision. In this case, obtaining interpolated pixels from adjacent prediction pixels requires a filter operation with enough phases to cover the precision of MV2. Because the precision of MV2 is coarser than that of MV1, fewer phases are needed. Compared with the unrestricted case described above, an equivalent passband can be secured with fewer interpolation filter taps and fewer prediction pixels.

For this reason, in the present embodiment, the transmission side encodes the motion vector MV with a precision restriction where appropriate, as with the motion vector MV2 described above. In that case, information on the precision restriction of the motion vector MV is inserted into the video stream as auxiliary information. When performing spatial resolution downscaling, the receiver 200 recognizes the precision restriction of the motion vector MV from this auxiliary information and can perform interpolation matched to that restriction, which reduces the processing load.

Next, downscaling of temporal resolution is described. Consider, for example, a case where the received image data is 120 fps image data as shown in Fig. 7(a). A half picture rate flag and a quarter picture rate flag are inserted into the video stream as auxiliary information.

The half picture rate flag is set to "1" on every other picture (frame), that is, with 1 picture skipped. This flag therefore identifies the pictures to select when downscaling the temporal resolution to 1/2. The quarter picture rate flag is set to "1" with 2 pictures (frames) skipped. This flag therefore identifies the pictures to select when downscaling the temporal resolution to 1/4.

For example, a receiver 200 with 60 fps display capability extracts and decodes only every other picture based on the half picture rate flag, as shown in Fig. 7(b), and obtains 60 fps image data. A receiver 200 with 30 fps display capability extracts and decodes only the pictures taken with three pictures skipped in between, based on the quarter picture rate flag, as shown in Fig. 7(c), and obtains 30 fps image data.
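A receiver-side selection based on these flags might look like the following sketch. The `Picture` container and the function `select_pictures` are hypothetical names introduced for illustration; the flag values are taken as given from the stream rather than derived here.

```python
from dataclasses import dataclass

@dataclass
class Picture:
    index: int                      # position in the received stream
    half_picture_rate_flag: int     # 1 -> keep when downscaling to 1/2
    quarter_picture_rate_flag: int  # 1 -> keep when downscaling to 1/4

def select_pictures(pictures, ratio_den):
    """Return the pictures to extract and decode for a temporal ratio of 1/ratio_den."""
    if ratio_den == 1:
        return list(pictures)
    if ratio_den == 2:
        return [p for p in pictures if p.half_picture_rate_flag == 1]
    if ratio_den == 4:
        return [p for p in pictures if p.quarter_picture_rate_flag == 1]
    raise ValueError("only the ratios 1, 1/2 and 1/4 are signalled by the flags")
```

Because the decision uses only the per-picture flags, the receiver can discard unselected pictures before decoding them.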

"Configuration example of transmission data generation unit"

Fig. 8 shows a configuration example of the transmission data generation unit 110 that generates the above-described transport stream TS in the broadcasting station 100. The transmission data generation unit 110 includes an image data output unit 111, a video encoder 112, an audio data output unit 115, an audio encoder 116, and a multiplexer 117.

The image data output unit 111 outputs image data corresponding to various image services. The image services include an HD image service with 1920×1080 effective pixels, and spatial ultra-high-resolution image services (ultra-high-definition services) such as 4K and 8K, whose effective pixel counts are respectively 2 times and 4 times that of HD both horizontally and vertically. The image services also include, for example, an image service with a frame frequency of 30 Hz and temporal ultra-high-resolution image services (ultra-high-definition services) with frame frequencies such as 60 Hz and 120 Hz. The image data output unit 111 consists of, for example, a camera that captures a subject and outputs image data, or an image data reading unit that reads image data from a storage medium and outputs it.

The video encoder 112 encodes the image data output from the image data output unit 111 using, for example, MPEG4-AVC (MVC), MPEG2video, or HEVC, and obtains coded image data. Using a stream formatter (not shown) provided at a subsequent stage, the video encoder 112 also generates a video stream (video elementary stream) containing this coded image data.

In this case, the image data of an ultra-high-definition service may, for example, be scalably coded to guarantee backward compatibility, but it may also be left without scalable coding. When it is not scalably coded, the video encoder 112 inserts auxiliary information for spatial and/or temporal resolution downscaling into the video stream for the benefit of receivers that do not support this ultra-high-definition service.

The audio data output unit 115 outputs audio data corresponding to the image data. The audio data output unit 115 consists of, for example, a microphone, or an audio data reading unit that reads audio data from a storage medium and outputs it. The audio encoder 116 encodes the audio data output from the audio data output unit 115 using MPEG-2 Audio, AAC, or the like, and generates an audio stream (audio elementary stream).

The multiplexer 117 packetizes and multiplexes the elementary streams generated by the video encoder 112, the graphics encoder 114, and the audio encoder 116, and generates the transport stream TS. In this case, a PTS (Presentation Time Stamp) is inserted into the header of each PES (Packetized Elementary Stream) for synchronized playback on the receiving side.

When transmitting the image data of an ultra-high-definition service without scalable coding, the multiplexer 117 inserts, into the layer of the transport stream TS, downscaling information indicating the ratios at which spatial and/or temporal resolution downscaling is possible. For example, this downscaling information is inserted under the video elementary loop (Video ES loop) of the program map table (PMT: Program Map Table) included in the transport stream TS.

The multiplexer 117 also inserts identification information into the layer of the transport stream TS so that the ultra-high-definition service provided by the video stream can be identified at least on a per-program basis. For example, in the present embodiment, the multiplexer 117 inserts spatial and/or temporal resolution information of the image data included in the video stream into the layer of the transport stream TS. For example, this resolution information is inserted under the event information table (EIT: Event Information Table) included in the transport stream TS.

The operation of the transmission data generation unit 110 shown in Fig. 8 is briefly described below. Image data corresponding to various image services output from the image data output unit 111 is supplied to the video encoder 112. The video encoder 112 encodes this image data using, for example, MPEG4-AVC (MVC), MPEG2video, or HEVC, and generates a video stream (video elementary stream) containing the coded image data. This video stream is supplied to the multiplexer 117.

In this case, the image data of an ultra-high-definition service may, for example, be scalably coded to guarantee backward compatibility, but it may also be left without scalable coding. When it is not scalably coded, the video encoder 112 inserts auxiliary information for spatial and/or temporal resolution downscaling into the video stream for the benefit of receivers that do not support this ultra-high-definition service.

The audio data corresponding to the image data, output from the audio data output unit 115, is supplied to the audio encoder 116. The audio encoder 116 encodes this audio data using MPEG-2 Audio, AAC, or the like, and generates an audio stream (audio elementary stream). This audio stream is supplied to the multiplexer 117.

The multiplexer 117 packetizes and multiplexes the elementary streams supplied from the encoders and generates the transport stream TS. In this case, a PTS is inserted into each PES header for synchronized playback on the receiving side. The multiplexer 117 also inserts, under the video elementary loop (Video ES loop) of the PMT, downscaling information indicating the ratios at which spatial and/or temporal resolution downscaling is possible. In addition, the multiplexer 117 inserts, under the EIT, spatial and/or temporal resolution information of the image data included in the video stream.

[Structure of auxiliary information, identification information, and resolution information, and TS configuration]

As described above, auxiliary information for spatial and/or temporal resolution downscaling of the image data is inserted into the video stream. For example, when the coding scheme is MPEG4-AVC, or a coding scheme with a similar coding structure such as NAL packets, for example HEVC, this auxiliary information is inserted as an SEI message in the "SEIs" portion of the access unit (AU).

In this case, the information indicating the precision restriction of the motion vector MV, as auxiliary information, is inserted as an SEI message (downscaling_spatial SEI message). The information indicating the pictures to select when downscaling the temporal resolution at a predetermined ratio, as auxiliary information, is inserted as an SEI message (picture_temporal_pickup SEI message). Fig. 9(a) shows the access unit at the head of a GOP (Group Of Pictures), and Fig. 9(b) shows an access unit other than the head of a GOP. Because SEI messages are coded earlier in the bit stream than the slices in which the pixel data is coded, the receiver can determine the subsequent decoding process by identifying the contents of the SEI.

Fig. 10(a) shows a structural example (Syntax) of the "downscaling_spatial SEI message". "uuid_iso_iec_11578" has a UUID value specified in "ISO/IEC 11578:1996 Annex A". "userdata_for_downscaling_spatial()" is inserted into the "user_data_payload_byte" field. Fig. 10(b) shows a structural example (Syntax) of "userdata_for_downscaling_spatial()". It contains the flags "constrained_to_half_pixel_MV_flag" and "constrained_to_integer_pixel_MV_flag". "userdata_id" is an identifier expressed as an unsigned 16-bit value.

As shown in Fig. 11, when the "constrained_to_half_pixel_MV_flag" flag is "1", it indicates that the precision of the motion vector MV is restricted to 1/2 pixel precision. Likewise, as shown in Fig. 11, when the "constrained_to_integer_pixel_MV_flag" flag is "1", it indicates that the precision of the motion vector MV is restricted to integer pixel precision.
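A receiver could map the two flags to a motion vector precision along the following lines. This is a sketch under assumptions not stated in the text: the function name `mv_precision_from_flags` is hypothetical, the unrestricted precision is assumed to be 1/4 pixel, and the coarser restriction is assumed to win if both flags are "1".

```python
from fractions import Fraction

def mv_precision_from_flags(constrained_to_half_pixel_MV_flag,
                            constrained_to_integer_pixel_MV_flag,
                            unrestricted=Fraction(1, 4)):
    """Return the motion vector precision, in pixels, signalled by the SEI flags.

    unrestricted: precision assumed when neither flag is set (assumption: 1/4 pixel).
    """
    if constrained_to_integer_pixel_MV_flag == 1:
        return Fraction(1, 1)   # integer pixel precision (assumed to take priority)
    if constrained_to_half_pixel_MV_flag == 1:
        return Fraction(1, 2)   # 1/2 pixel precision
    return unrestricted
```

The returned precision can then be used to choose an interpolation filter with only as many phases as the restriction requires.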

Fig. 12(a) shows a structural example (Syntax) of the "picture_temporal_pickup SEI message". "uuid_iso_iec_11578" has a UUID value specified in "ISO/IEC 11578:1996 Annex A". "userdata_for_picture_temporal()" is inserted into the "user_data_payload_byte" field. Fig. 12(b) shows a structural example (Syntax) of "userdata_for_picture_temporal()". It contains the flags "half picture rate flag" and "quarter picture rate flag". "userdata_id" is an identifier expressed as an unsigned 16-bit value.

As shown in Fig. 13, when the "half picture rate flag" flag is "1", it indicates that the picture is to be extracted and decoded by a decoder whose display capability is 1/2 of the temporal resolution. Likewise, as shown in Fig. 13, when the "quarter picture rate flag" flag is "1", it indicates that the picture is to be extracted and decoded by a decoder whose display capability is 1/4 of the temporal resolution.

As described above, identification information indicating that the above-described auxiliary information for spatial and/or temporal resolution downscaling of the image data is inserted in the video stream is inserted, for example, under the video elementary loop (Video ES loop) of the program map table (PMT) of the transport stream TS.

Fig. 14 shows a structural example (Syntax) of the downscaling descriptor (downscaling_descriptor) serving as this identification information. Fig. 15 shows a modified structural example (Syntax) of the downscaling descriptor (downscaling_descriptor). Fig. 16 shows the contents (Semantics) of the main information in these structural examples.

The 8-bit field "downscaling_descriptor_tag" indicates the descriptor type, here indicating a downscaling descriptor. The 8-bit field "downscaling_descriptor_length" indicates the length (size) of the descriptor, expressed as the number of bytes that follow.

The 2-bit field "downscaling_type" indicates the downscaling type. For example, "01" indicates temporal resolution downscaling, "10" indicates spatial resolution downscaling, and "11" indicates temporal and spatial resolution downscaling.

When "downscaling_type" is "01" or "11", the 2-bit field "temporal_downscaling_factor" is valid. This 2-bit field indicates the ratio (downscale) possible in temporal resolution downscaling. For example, "00" indicates that downscaling is not possible. "01" indicates that downscaling at a ratio of 1/2 is possible. "10" indicates that downscaling at a ratio of 1/4 is possible, and also that downscaling at a ratio of 1/2 is possible. A "temporal_downscaling_factor" of "01" or "10" also indicates that auxiliary information for temporal resolution downscaling is inserted in the video stream.

When "downscaling_type" is "10" or "11", the 2-bit field "spatial_downscaling_factor" is valid. This 2-bit field indicates the ratio (downscale) possible in spatial resolution downscaling. For example, "00" indicates that downscaling is not possible. "01" indicates that downscaling at a ratio of 1/2 horizontally and vertically is possible. "10" indicates that downscaling at a ratio of 1/4 horizontally and vertically is possible, and also that downscaling at a ratio of 1/2 is possible. A "spatial_downscaling_factor" of "01" or "10" also indicates that auxiliary information for spatial resolution downscaling is inserted in the video stream.

The 3-bit field "spatial resolution class type" indicates the class type of the spatial resolution of the transmitted image data. For example, "001" indicates 1920×1080, that is, HD resolution. "010" indicates 3840×2160, that is, 4K resolution. "011" indicates 7680×4320, that is, 8K resolution.

The 3-bit field "temporal resolution class type" indicates the class type of the temporal resolution of the transmitted image data. For example, "001" indicates 24 Hz, 25 Hz, 29.97 Hz, 30 Hz, and the like; "010" indicates 50 Hz, 59.94 Hz, 60 Hz, and the like; "011" indicates 100 Hz, 120 Hz, and the like; and "100" indicates 200 Hz, 240 Hz, and the like.
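The field semantics above can be summarized in a small lookup sketch. It works on field values that have already been extracted from the descriptor, because the bit layout of Figs. 14 and 15 is not reproduced in this text. The names `FACTOR_RATIOS`, `SPATIAL_CLASS`, and `possible_ratios` are hypothetical, and treating the undefined factor code "11" as "no ratio" is an assumption.

```python
from fractions import Fraction

# Ratios signalled by temporal_downscaling_factor / spatial_downscaling_factor.
FACTOR_RATIOS = {
    0b00: [],                                # downscaling not possible
    0b01: [Fraction(1, 2)],                  # 1/2 possible
    0b10: [Fraction(1, 4), Fraction(1, 2)],  # 1/4 possible, 1/2 also possible
}

# Spatial resolution class types listed in the text.
SPATIAL_CLASS = {0b001: (1920, 1080), 0b010: (3840, 2160), 0b011: (7680, 4320)}

def possible_ratios(downscaling_type, temporal_factor, spatial_factor):
    """Return the downscaling ratios signalled by already-extracted field values."""
    temporal_valid = downscaling_type in (0b01, 0b11)
    spatial_valid = downscaling_type in (0b10, 0b11)
    return {
        "temporal": FACTOR_RATIOS.get(temporal_factor, []) if temporal_valid else [],
        "spatial": FACTOR_RATIOS.get(spatial_factor, []) if spatial_valid else [],
    }

# Spatial-only downscaling with 1/4 (and therefore 1/2) possible.
example = possible_ratios(0b10, 0b00, 0b10)
```

A factor field is ignored whenever "downscaling_type" does not mark it as valid, matching the validity rule stated for each field.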

As described above, spatial and/or temporal resolution information of the image data included in the video stream is inserted, for example, under the event information table (EIT) of the transport stream TS. Fig. 17 shows a structural example (Syntax) of the Super High resolution descriptor serving as this resolution information. Fig. 18 shows the contents (Semantics) of the main information in that structural example.

The 3-bit field "Temporal resolution class type" indicates the class type of the temporal resolution of the transmitted image data. For example, "001" indicates 24 Hz, 25 Hz, 29.97 Hz, 30 Hz, and the like; "010" indicates 50 Hz, 59.94 Hz, 60 Hz, and the like; "011" indicates 100 Hz, 120 Hz, and the like; and "100" indicates 200 Hz, 240 Hz, and the like.

The 2-bit field "Backward_compatible_type" indicates whether backward compatibility is guaranteed for the transmitted image data. For example, "00" indicates that backward compatibility is not guaranteed. "01" indicates that backward compatibility is guaranteed with respect to spatial resolution; in this case, the transmitted image data is, for example, scalably coded with respect to spatial resolution. "10" indicates that backward compatibility is guaranteed with respect to temporal resolution; in this case, the transmitted image data is, for example, scalably coded with respect to temporal resolution.

The flag information "lower_capable_decoder_support_flag" indicates whether support is provided for lower-capability decoders that do not support the spatial and/or temporal resolution of the transmitted image data. For example, "0" indicates that no support is provided, and "1" indicates that support is provided. For example, when auxiliary information for spatial and/or temporal resolution downscaling of the image data is inserted in the video stream as described above, this flag information is set to "1".

Fig. 19 shows a configuration example of the transport stream TS. The transport stream TS includes the PES packet "PID1: video PES1" of the video elementary stream and the PES packet "PID2: Audio PES1" of the audio elementary stream. Auxiliary information for spatial and/or temporal resolution downscaling of the image data is inserted into this video elementary stream as SEI messages.

In this case, the information indicating the precision restriction of the motion vector MV, as auxiliary information, is inserted as an SEI message (downscaling_spatial SEI message) (see Fig. 10). The information indicating the pictures to select when downscaling the temporal resolution at a predetermined ratio, as auxiliary information, is inserted as an SEI message (picture_temporal_pickup SEI message) (see Fig. 12).

The transport stream TS also includes a PMT (Program Map Table) as PSI (Program Specific Information). The PSI describes which program each elementary stream included in the transport stream belongs to. The transport stream TS further includes an EIT (Event Information Table) as SI (Service Information) for management in units of events (programs).

The PMT contains an elementary loop holding information related to each elementary stream. In this configuration example, there is a video elementary loop (Video ES loop). In this video elementary loop, information such as the stream type and the packet identifier (PID) is arranged for the single video elementary stream described above, together with descriptors describing information related to that video elementary stream.

The downscaling descriptor (downscaling_descriptor) (see FIG. 14) is inserted under the video elementary loop (Video ES loop) of this PMT. As described above, this descriptor indicates that auxiliary information for downscaling the spatial and/or temporal resolution of the image data is inserted in the video stream.

In addition, the Super High resolution descriptor (see FIG. 17) is inserted under the EIT. As described above, this descriptor constitutes identification information for identifying an ultra-high-definition service provided by the video stream at least on a per-program basis. Specifically, this descriptor contains spatial and/or temporal resolution information of the transmitted image data.
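As a rough illustration of the layering described above, the following Python sketch models where each descriptor sits in the transport stream. The class names, the `find_descriptor` helper, and the PID values are hypothetical and chosen only for this example; the actual descriptor fields are those of FIG. 14 and FIG. 17 and are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Descriptor:
    name: str
    payload: dict = field(default_factory=dict)  # descriptor fields are not modeled here


@dataclass
class ElementaryLoopEntry:
    stream_type: str  # e.g. "video" or "audio" (simplified label, not a real stream_type code)
    pid: int          # packet identifier of the elementary stream
    descriptors: List[Descriptor] = field(default_factory=list)


@dataclass
class PMT:
    es_loops: List[ElementaryLoopEntry] = field(default_factory=list)


@dataclass
class EIT:
    descriptors: List[Descriptor] = field(default_factory=list)


def find_descriptor(descriptors: List[Descriptor], name: str) -> Optional[Descriptor]:
    """Return the first descriptor with the given name, or None if absent."""
    return next((d for d in descriptors if d.name == name), None)


# Hypothetical PID values; the source only labels the streams PID1 and PID2.
pmt = PMT(es_loops=[
    ElementaryLoopEntry("video", 0x0100, [Descriptor("downscaling_descriptor")]),
    ElementaryLoopEntry("audio", 0x0101),
])
eit = EIT(descriptors=[Descriptor("Super High resolution descriptor")])
```

In this model, a receiver would look for `downscaling_descriptor` in the video ES loop of the PMT and for the Super High resolution descriptor in the EIT, without touching the video stream itself.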

"Configuration example of receiver"

FIG. 20 shows a configuration example of the receiver 200. The receiver 200 includes a CPU 201, a flash ROM 202, a DRAM 203, an internal bus 204, a remote control receiving unit (RC receiving unit) 205, and a remote control transmitter (RC transmitter) 206.

The receiver 200 also includes an antenna terminal 211, a digital tuner 212, a transport stream buffer (TS buffer) 213, and a demultiplexer 214. The receiver 200 further includes a coded buffer 215, a video decoder 216, a decoded buffer 217, a video RAM 218, a coded buffer 241, an audio decoder 242, and a channel mixing unit 243.

The CPU 201 controls the operation of each part of the receiver 200. The flash ROM 202 stores control software and data. The DRAM 203 serves as the work area of the CPU 201. The CPU 201 loads software and data read from the flash ROM 202 into the DRAM 203, starts the software, and controls each part of the receiver 200. The RC receiving unit 205 receives a remote control signal (remote control code) transmitted from the RC transmitter 206 and supplies it to the CPU 201. The CPU 201 controls each part of the receiver 200 based on this remote control code. The CPU 201, the flash ROM 202, and the DRAM 203 are connected to one another by the internal bus 204.

The antenna terminal 211 is a terminal for inputting a television broadcast signal received by a receiving antenna (not shown). The digital tuner 212 processes the television broadcast signal input to the antenna terminal 211 and outputs a predetermined transport stream TS corresponding to the channel selected by the user. The transport stream buffer (TS buffer) 213 temporarily stores the transport stream TS output from the digital tuner 212. This transport stream TS includes a video elementary stream and an audio elementary stream.

The demultiplexer 214 extracts the video and audio streams (elementary streams) from the transport stream TS temporarily stored in the TS buffer 213. The demultiplexer 214 also extracts the above-described downscaling descriptor (downscaling_descriptor) and Super High resolution descriptor from this transport stream TS and sends them to the CPU 201.
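The stream extraction performed by the demultiplexer 214 can be sketched as follows. This is a minimal illustration, not the receiver's actual implementation: it only routes fixed-size MPEG-2 TS packets (188 bytes, sync byte 0x47) by their 13-bit PID. The function names are assumptions, and PES reassembly and PSI/SI table parsing are omitted.

```python
from collections import defaultdict
from typing import Dict, List

TS_PACKET_SIZE = 188  # MPEG-2 TS packets are 188 bytes long
SYNC_BYTE = 0x47      # every TS packet starts with this sync byte


def packet_pid(packet: bytes) -> int:
    """Extract the 13-bit PID from the TS packet header (bytes 1 and 2)."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid TS packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def demux_by_pid(ts: bytes) -> Dict[int, List[bytes]]:
    """Group the complete TS packets in `ts` by PID; a trailing partial packet is ignored."""
    streams: Dict[int, List[bytes]] = defaultdict(list)
    for offset in range(0, len(ts) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = ts[offset:offset + TS_PACKET_SIZE]
        streams[packet_pid(packet)].append(packet)
    return dict(streams)
```

A receiver would then hand the packets of the video PID to the coded buffer 215 and those of the audio PID to the coded buffer 241, while table sections carrying the descriptors go to the CPU 201.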

From the Super High resolution descriptor, the CPU 201 can determine the spatial and temporal resolution information of the received image data, whether the received image data is backward compatible, whether the received image data provides support for low-capability decoders, and so on. From the downscaling descriptor, the CPU 201 can determine whether auxiliary information for downscaling the spatial and/or temporal resolution is inserted in the video stream, and which ratios are available for downscaling the spatial and/or temporal resolution.

Based on this information, the CPU 201 controls processing such as decoding in the receiver 200. For example, when image data of an ultra-high-definition service that exceeds the receiver's own display capability is received and that data is not scalably coded, the CPU 201 causes downscaling of the spatial and/or temporal resolution to be performed based on the auxiliary information inserted in the video stream, so that display image data of the desired resolution is obtained.
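The control decision described above might be sketched as follows. This is only an illustration under stated assumptions: the `ServiceInfo` and `DisplayCapability` types, the `plan_decode` function, and the returned labels are hypothetical, and the handling of scalably coded streams is outside this passage and appears only as a placeholder branch.

```python
from dataclasses import dataclass


@dataclass
class ServiceInfo:
    """Information the CPU gathers from the container-layer descriptors (simplified)."""
    width: int
    height: int
    frame_rate: float
    scalable_coded: bool
    lower_capable_decoder_support: bool  # mirrors lower_capable_decoder_support_flag


@dataclass
class DisplayCapability:
    width: int
    height: int
    frame_rate: float


def plan_decode(service: ServiceInfo, display: DisplayCapability) -> str:
    """Choose a decode strategy from descriptor information alone."""
    exceeds = (
        service.width > display.width
        or service.height > display.height
        or service.frame_rate > display.frame_rate
    )
    if not exceeds:
        return "decode_as_is"
    if service.scalable_coded:
        return "scalable_decoding"  # placeholder: not covered by this passage
    if service.lower_capable_decoder_support:
        return "downscale_with_auxiliary_info"
    return "cannot_match_display"  # no auxiliary information to assist downscaling
```

The point of the design is that every input to `plan_decode` comes from the container layer, so the receiver can decide how to proceed before decoding any video.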

The coded buffer 215 temporarily stores the video elementary stream extracted by the demultiplexer 214. Under the control of the CPU 201, the video decoder 216 decodes the video stream stored in the coded buffer 215 to obtain display image data. Note that, depending on the content of the received image data, downscaling of the spatial and/or temporal resolution may not be possible, in which case display image data of a resolution suited to the receiver's own display capability cannot be obtained.

The video decoder 216 also extracts the SEI messages inserted in the video stream and sends them to the CPU 201. These SEI messages include the "downscaling_spatial SEI message" and the "picture_temporal_pickup SEI message". When the video decoder 216 performs downscaling of the spatial and/or temporal resolution, the CPU 201 causes it to do so based on the auxiliary information contained in these SEI messages.

Specifically, when spatial-resolution downscaling is performed, it is carried out based on the precision limit information for the motion vector MV contained in the "downscaling_spatial SEI message", which reduces the processing load. When temporal-resolution downscaling is performed, it is carried out based on the ratio-dependent picture selection information contained in the "picture_temporal_pickup SEI message", which likewise reduces the processing load.
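The two kinds of auxiliary information can be illustrated with a small Python sketch. It is not the decoder's actual algorithm. The `Picture` type and its `selected_for_half_rate` field are hypothetical stand-ins for the picture selection information, and `scale_mv_exact` only shows, as simple arithmetic, why a motion vector restricted to coarse precision can be scaled to a reduced-resolution picture without rounding. Quarter-pel motion vector units are an assumption of this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Picture:
    index: int
    selected_for_half_rate: bool  # hypothetical per-picture selection flag


def pick_pictures(pictures: List[Picture]) -> List[Picture]:
    """Temporal downscaling: keep only the pictures marked for the reduced rate."""
    return [p for p in pictures if p.selected_for_half_rate]


def scale_mv_exact(mv_qpel: int, factor: int) -> int:
    """Scale one motion vector component (quarter-pel units) to a 1/factor-size picture.

    The result is exact only if the component is a multiple of `factor`, which an
    encoder-side precision limit can guarantee (for example, half-pel precision
    for factor 2, integer-pel precision for factor 4).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if mv_qpel % factor != 0:
        raise ValueError("motion vector is finer than the precision limit allows")
    return mv_qpel // factor
```

For example, a component of 8 quarter-pel units (2 pixels) maps to 4 quarter-pel units (1 pixel) at half resolution, whereas a component of 3 quarter-pel units has no exact counterpart and would need extra interpolation or rounding.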

The decoded buffer 217 temporarily stores the display image data obtained by the video decoder 216. The video RAM 218 reads the display image data stored in the decoded buffer 217 and outputs it to the display at the appropriate timing.

The coded buffer 241 temporarily stores the audio stream extracted by the demultiplexer 214. The audio decoder 242 decodes the audio stream stored in the coded buffer 241 to obtain decoded audio data. From the audio data obtained by the audio decoder 242, the channel mixing unit 243 produces audio data for each channel, for example to realize 5.1-channel surround, and supplies it to the speakers.

The operation of the receiver 200 will now be described. The television broadcast signal input to the antenna terminal 211 is supplied to the digital tuner 212. The digital tuner 212 processes the television broadcast signal and outputs a predetermined transport stream TS corresponding to the channel selected by the user. This transport stream TS is temporarily stored in the TS buffer 213. It includes a video elementary stream and an audio elementary stream.

The demultiplexer 214 extracts the video and audio streams (elementary streams) from the transport stream TS temporarily stored in the TS buffer 213. The demultiplexer 214 also extracts the downscaling descriptor (downscaling_descriptor) and the Super High resolution descriptor from this transport stream TS and sends them to the CPU 201. The CPU 201 controls processing such as decoding in the receiver 200 based on the information contained in these descriptors.

The video stream extracted by the demultiplexer 214 is supplied to the coded buffer 215 and temporarily stored. Under the control of the CPU 201, the video decoder 216 decodes the video stream stored in the coded buffer 215 to obtain display image data suited to the receiver's own display capability.

In this case, the video decoder 216 extracts the SEI messages inserted in the basic video stream, including the "downscaling_spatial SEI message" and the "picture_temporal_pickup SEI message", and sends them to the CPU 201. When the video decoder 216 performs downscaling of the spatial and/or temporal resolution, the CPU 201 causes it to do so based on the auxiliary information contained in these SEI messages.

The display image data obtained by the video decoder 216 is temporarily stored in the decoded buffer 217. The video RAM 218 then reads the display image data stored in the decoded buffer 217 at the appropriate timing and outputs it to the display, where the image is displayed.

The audio stream extracted by the demultiplexer 214 is supplied to the coded buffer 241 and temporarily stored. The audio decoder 242 decodes the audio stream stored in the coded buffer 241 to obtain decoded audio data, which is supplied to the channel mixing unit 243. The channel mixing unit 243 generates audio data for each channel from this audio data, for example to realize 5.1-channel surround. This audio data is supplied to, for example, speakers, and audio is output in step with the image display.

As described above, in the image transmission/reception system 10 shown in FIG. 1, auxiliary information for downscaling the spatial and/or temporal resolution of the image data is inserted in the video stream and transmitted. Therefore, when the image data of an ultra-high-definition service is transmitted without scalable coding, a receiver 200 that does not support this ultra-high-definition service can easily acquire image data of a resolution suited to its own display capability.

<2. Modification>

The embodiment described above uses a transport stream (MPEG-2 TS) as the container. However, the present invention can equally be applied to systems in which content is distributed to receiving terminals over a network such as the Internet. Internet distribution often uses containers in MP4 or other formats. In other words, the container may be in any of various formats, such as the transport stream (MPEG-2 TS) adopted in digital broadcasting standards or MP4 used in Internet distribution.

The present invention can also be configured as follows.

[1] A transmitting device including:

a transmitting unit that transmits a container of a predetermined format having a video stream containing encoded image data; and

an auxiliary information inserting unit that inserts, into the video stream, auxiliary information for downscaling the spatial and/or temporal resolution of the image data.

[2] The transmitting device according to [1], wherein the auxiliary information is information indicating a precision limit of a motion vector included in the encoded image data.

[3] The transmitting device according to [1] or [2], wherein the auxiliary information is information identifying a picture to be selected when downscaling the temporal resolution at a predetermined ratio.

[4] The transmitting device according to any one of [1] to [3], further including an identification information inserting unit that inserts, into a layer of the container, identification information indicating that the auxiliary information is inserted in the video stream.

[5] The transmitting device according to [4], wherein downscaling information indicating the ratios available for downscaling the spatial and/or temporal resolution is added to the identification information.

[6] The transmitting device according to [4] or [5], wherein spatial and/or temporal resolution information of the image data included in the video stream is added to the identification information.

[7] The transmitting device according to any one of [4] to [6], wherein the container is a transport stream, and

the identification information inserting unit inserts the identification information into a descriptor under the video elementary loop of the program map table included in the transport stream.

[8] The transmitting device according to any one of [1] to [7], further including a resolution information inserting unit that inserts, into a layer of the container, spatial and/or temporal resolution information of the image data included in the video stream.

[9] The transmitting device according to [8], wherein

identification information identifying whether the video stream provides support for a low-capability decoder that does not support the spatial and/or temporal resolution of the image data is added to the resolution information.

[10] The transmitting device according to [8] or [9], wherein the container is a transport stream, and

the resolution information inserting unit inserts the resolution information into a descriptor under the event information table included in the transport stream.

[11] A transmitting method including:

a step of transmitting a container of a predetermined format having a video stream containing encoded image data; and

a step of inserting, into the video stream, auxiliary information for downscaling the spatial and/or temporal resolution of the image data.

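[12] A transmitting device including:

a transmitting unit that transmits a container of a predetermined format having a video stream containing encoded image data; and

an identification information inserting unit that inserts identification information into a layer of the container so that an ultra-high-definition service provided by the video stream can be identified at least on a per-program basis.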

[13] The transmitting device according to [12], wherein the identification information includes spatial and/or temporal resolution information of the image data.

[14] The transmitting device according to [12] or [13], wherein

support information indicating whether the video stream provides support for a low-capability decoder that does not support the spatial and/or temporal resolution of the image data is added to the identification information.

[15] The transmitting device according to any one of [12] to [14], wherein the container is a transport stream, and

the identification information inserting unit inserts the identification information into a descriptor under the event information table included in the transport stream.

[16] A transmitting method including:

a step of transmitting a container of a predetermined format having a video stream containing image data; and

a step of inserting identification information into a layer of the container so that an ultra-high-definition service provided by the video stream can be identified at least on a per-program basis.

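[17] A receiving device including:

a receiving unit that receives a video stream containing encoded image data, auxiliary information for downscaling the spatial and/or temporal resolution of the image data being inserted in the video stream; and

a processing unit that performs downscaling of the spatial and/or temporal resolution on the encoded image data based on the auxiliary information to obtain display image data of a desired resolution.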

[18] The receiving device according to [17], wherein the receiving unit receives a container of a predetermined format including the video stream,

downscaling information indicating the ratios available for downscaling the spatial and/or temporal resolution is inserted in a layer of the container, and

the processing unit controls the downscaling for obtaining the display image data based on the downscaling information.

[19] The receiving device according to [17] or [18], wherein the receiving unit receives a container of a predetermined format including the video stream,

spatial and/or temporal resolution information of the image data included in the video stream is inserted in a layer of the container, and

the processing unit controls the downscaling for obtaining the display image data based on the resolution information.

[20] A receiving method including:

a step of receiving a video stream that contains encoded image data and in which auxiliary information for downscaling the spatial and/or temporal resolution of the image data is inserted; and

a step of performing downscaling of the spatial and/or temporal resolution on the encoded image data based on the auxiliary information to obtain display image data of a desired resolution.

The main feature of the present invention is that auxiliary information (SEI messages) for downscaling the spatial and/or temporal resolution of the image data is inserted in the video stream and transmitted, which makes it possible to reduce the load of downscaling on the receiving side (see FIG. 19). A further main feature is that identification information is inserted into a layer of the container (transport stream) so that an ultra-high-definition service provided by the video stream can be identified at least on a per-program basis, which allows the receiving side to identify the ultra-high-definition service without decoding the video stream (see FIG. 19).

10: image transmission/reception system
100: broadcasting station
110: transmission data generation unit
111: image data output unit
112: video encoder
115: audio data output unit
116: audio encoder
117: multiplexer
200: receiver
201: CPU
212: digital tuner
213: transport stream buffer (TS buffer)
214: demultiplexer
215: coded buffer
216: video decoder
217: decoded buffer
218: video RAM
241: coded buffer
242: audio decoder
243: channel mixing unit

Claims

A transmitting device comprising:
a transmitting unit that transmits a container of a predetermined format having a video stream containing encoded image data; and
an auxiliary information inserting unit that inserts, into the video stream, auxiliary information for downscaling the spatial and/or temporal resolution of the image data,
wherein the auxiliary information includes capability information that identifies whether the video stream provides support for a low-capability decoder that does not support the spatial and/or temporal resolution of the image data, the capability information being added to the resolution information.

The transmitting device according to claim 1,
wherein the auxiliary information is information indicating a precision limit of a motion vector included in the encoded image data.

The transmitting device according to claim 1,
wherein the auxiliary information is information identifying a picture to be selected when downscaling the temporal resolution at a predetermined ratio.

The transmitting device according to claim 1,
further comprising an identification information inserting unit that inserts, into a layer of the container, identification information indicating that the auxiliary information is inserted in the video stream.

The transmitting device according to claim 4,
wherein downscaling information indicating the ratios available for downscaling the spatial and/or temporal resolution is added to the identification information.

The transmitting device according to claim 4,
wherein spatial and/or temporal resolution information of the image data included in the video stream is added to the identification information.

The transmitting device according to claim 4,
wherein the container is a transport stream, and
the identification information inserting unit inserts the identification information into a descriptor under the video elementary loop of the program map table included in the transport stream.

The transmitting device according to claim 1,
further comprising a resolution information inserting unit that inserts, into a layer of the container, spatial and/or temporal resolution information of the image data included in the video stream.

(Deleted)

The transmitting device according to claim 8,
wherein the container is a transport stream, and
the resolution information inserting unit inserts the resolution information into a descriptor under the event information table included in the transport stream.

A transmitting method comprising:
a step of transmitting a container of a predetermined format having a video stream containing encoded image data; and
a step of inserting, into the video stream, auxiliary information for downscaling the spatial and/or temporal resolution of the image data,
wherein the auxiliary information includes capability information that identifies whether the video stream provides support for a low-capability decoder that does not support the spatial and/or temporal resolution of the image data, the capability information being added to the resolution information.

A transmitting device comprising:
a transmitting unit that transmits a container of a predetermined format having a video stream containing encoded image data; and
an identification information inserting unit that inserts identification information into a layer of the container so that an ultra-high-definition service provided by the video stream can be identified at least on a per-program basis.

The transmitting device according to claim 12,
wherein the identification information includes spatial and/or temporal resolution information of the image data.

The transmitting device according to claim 12,
wherein support information indicating whether the video stream provides support for a low-capability decoder that does not support the spatial and/or temporal resolution of the image data
is added to the identification information.

The transmitting device according to claim 12,
wherein the container is a transport stream, and
the identification information inserting unit inserts the identification information into a descriptor under the event information table included in the transport stream.

A transmitting method comprising:
a step of transmitting a container of a predetermined format having a video stream containing image data; and
a step of inserting identification information into a layer of the container so that an ultra-high-definition service provided by the video stream can be identified at least on a per-program basis.

A receiving device comprising:
a receiving unit that receives a video stream containing encoded image data,
auxiliary information for downscaling the spatial and/or temporal resolution of the image data being inserted in the video stream; and
a processing unit that performs downscaling of the spatial and/or temporal resolution on the encoded image data based on the auxiliary information to obtain display image data of a desired resolution,
wherein the auxiliary information includes capability information that identifies whether the video stream provides support for a low-capability decoder that does not support the spatial and/or temporal resolution of the image data, the capability information being added to the resolution information.

The receiving device according to claim 17,
wherein the receiving unit receives a container of a predetermined format including the video stream,
downscaling information indicating the ratios available for downscaling the spatial and/or temporal resolution is inserted in a layer of the container, and
the processing unit controls the downscaling for obtaining the display image data based on the downscaling information.

The receiving device according to claim 17,
wherein the receiving unit receives a container of a predetermined format including the video stream,
spatial and/or temporal resolution information of the image data included in the video stream is inserted in a layer of the container, and
the processing unit controls the downscaling for obtaining the display image data based on the resolution information.

A receiving method comprising:
a step of receiving a video stream that contains encoded image data and in which auxiliary information for downscaling the spatial and/or temporal resolution of the image data is inserted; and
a step of performing downscaling of the spatial and/or temporal resolution on the encoded image data based on the auxiliary information to obtain display image data of a desired resolution,
wherein the auxiliary information includes capability information that identifies whether the video stream provides support for a low-capability decoder that does not support the spatial and/or temporal resolution of the image data, the capability information being added to the resolution information.