WO2014109321A1

WO2014109321A1 - Transmission device, transmission method, receiving device, and receiving method

Info

Publication number: WO2014109321A1
Application number: PCT/JP2014/050092
Authority: WO
Inventors: Yasuaki Yamagishi (山岸 靖明); Ikuo Tsukagoshi (塚越 郁夫)
Original assignee: Sony Corporation (ソニー株式会社)
Priority date: 2013-01-09
Filing date: 2014-01-07
Publication date: 2014-07-17

Abstract

An objective of the present invention is to allow a legacy 2-D receiving device to acquire superposition information data satisfactorily. Another objective of the present invention is to allow a 3-D receiving device to efficiently and appropriately acquire the superposition information data and the disparity information corresponding thereto. In response to a request from a receiving side, a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image is transmitted, together with either a first private data stream including superposition information data such as subtitles or a second private data stream including the superposition information data and disparity information.

Description

Transmitting apparatus, transmitting method, receiving apparatus, and receiving method

The present technology relates to a transmission device, a transmission method, a reception device, and a reception method, and in particular to a transmission device and the like that transmits, in response to a request from the reception side, a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image.

Conventionally, a method of transmitting stereoscopic image data using television broadcast radio waves has been proposed (see, for example, Patent Document 1). In this transmission method, stereoscopic image data having left-eye image data and right-eye image data is transmitted, and stereoscopic image display using binocular parallax is performed.

FIG. 52 shows the relationship between the display positions of the left and right images of an object on the screen and the playback position of its stereoscopic image in stereoscopic image display using binocular parallax. For example, for object A, whose left image La is displayed shifted to the right and whose right image Ra is displayed shifted to the left on the screen as shown in the figure, the left and right lines of sight intersect in front of the screen surface, so the playback position of its stereoscopic image is in front of the screen surface. DPa represents the horizontal disparity vector for object A.

Further, for object B, whose left image Lb and right image Rb are displayed at the same position on the screen as shown in the figure, the left and right lines of sight intersect on the screen surface, so the playback position of its stereoscopic image is on the screen surface. For object C, whose left image Lc is displayed shifted to the left and whose right image Rc is displayed shifted to the right on the screen as shown in the figure, the left and right lines of sight intersect behind the screen surface, so the playback position of its stereoscopic image is behind the screen surface. DPc represents the horizontal disparity vector for object C.
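The geometry of objects A, B, and C can be sketched numerically. The following is only an illustration and is not taken from the disclosure: it assumes a viewer whose eyes are separated by `eye_sep` and who sits `view_dist` from the screen, and it defines the horizontal disparity as the on-screen position of the right image minus that of the left image. The function names, the sign convention, and the default values are all assumptions made for the example.

```python
def perceived_distance(x_left, x_right, eye_sep=65.0, view_dist=2000.0):
    """Distance (mm) from the viewer at which the two lines of sight intersect.

    x_left and x_right are the horizontal on-screen positions (mm) of the
    left and right images of one object. Parameter names and defaults are
    illustrative assumptions, not values from the source text.
    """
    disparity = x_right - x_left
    if disparity >= eye_sep:
        raise ValueError("lines of sight do not converge in front of the viewer")
    # Similar triangles between the eye baseline and the screen plane:
    # Z = D * e / (e - d)
    return view_dist * eye_sep / (eye_sep - disparity)


def playback_position(x_left, x_right):
    """Classify the playback position relative to the screen surface."""
    disparity = x_right - x_left
    if disparity < 0:
        return "in front of screen"  # like object A: La to the right, Ra to the left
    if disparity == 0:
        return "on screen"           # like object B: La and Ra coincide
    return "behind screen"           # like object C: Lc to the left, Rc to the right
```

With this convention a negative disparity corresponds to object A, zero to object B, and a positive disparity to object C.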

Also, an IPTV (Internet Protocol Television) distribution system using a network such as the Internet has conventionally been proposed (see, for example, Patent Document 2). Recently, standardization of Internet streaming such as IPTV has been under way. For example, methods applied to VoD (Video on Demand) streaming and live streaming by HTTP (Hypertext Transfer Protocol) streaming are being standardized.

In particular, DASH (Dynamic Adaptive Streaming over HTTP), which is being standardized by ISO/IEC MPEG, is attracting attention. In DASH, a client terminal acquires and plays streaming data based on a metafile called an MPD (Media Presentation Description) and the addresses (URLs) of chunked media data described therein. The media data in this case is audio, video, subtitle data, and the like.

Patent Document 1: Japanese Patent Laid-Open No. 2005-6114
Patent Document 2: JP 2011-193058 A

As described above, in stereoscopic image display, a viewer usually perceives the perspective of a stereoscopic image using binocular parallax. Superimposition information superimposed on an image, such as subtitles, is expected to be rendered in conjunction with the stereoscopic image display not only in two-dimensional space but also with a three-dimensional sense of depth. For example, when subtitles are superimposed (overlaid) on an image, viewers may perceive an inconsistency in perspective unless the subtitles are displayed in front of the nearest object in the image.

Therefore, it is conceivable to transmit the parallax information between the left eye image and the right eye image together with the superimposition information data, and to give the parallax between the left eye superimposition information and the right eye superimposition information on the receiving side. In this manner, the parallax information is meaningful information in the receiving apparatus that can display a stereoscopic image. On the other hand, in a legacy 2D (two-dimensional) compatible receiving apparatus, this disparity information is unnecessary. In this 2D-compatible receiving apparatus, it is necessary to take some measures so that transmission of the parallax information does not hinder normal reception processing.

There is also a demand to apply TTML (Timed Text Markup Language) to live broadcasting and terrestrial IP broadband retransmission. TTML, however, does not define a method for storing the disparity information applied to 3D subtitles. This is a problem when TTML is applied to DASH-based IPTV streaming.

The purpose of the present technology is to enable a legacy 2D-compatible receiving apparatus to obtain superimposition information data satisfactorily. In addition, an object of the present technology is to enable a 3D-compatible receiving apparatus to efficiently and accurately acquire parallax information corresponding to superimposition information data.

The concept of this technology resides in a transmission device including:
an image data output unit that outputs left-eye image data and right-eye image data constituting a stereoscopic image;
a superimposition information data output unit that outputs superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data;
a parallax information output unit that outputs parallax information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give parallax; and
a data transmission unit that, in response to a request from the receiving side, transmits a video data stream including the image data and also transmits a first private data stream including the superimposition information data or a second private data stream including the superimposition information data and the disparity information.

In the present technology, the image data output unit outputs left eye image data and right eye image data constituting a stereoscopic image. The superimposition information data output unit outputs the superimposition information data to be superimposed on the image based on the left eye image data and the right eye image data. Here, the superimposition information is information such as subtitles, graphics, and text superimposed on the image. The disparity information output unit outputs disparity information for giving disparity by shifting the superimposition information to be superimposed on the image based on the left eye image data and the right eye image data.

In response to a request from the receiving side, the data transmission unit transmits a video data stream including the image data, and also transmits a first private data stream including the superimposition information data or a second private data stream including the superimposition information data and the disparity information.

For example, the data transmission unit may have a distribution server and distribute each data stream to the receiving side through a network. In this case, the transmission device may further include a metafile generation unit that generates a metafile having information for the receiving side to acquire each data stream, and a metafile transmission unit that transmits the metafile to the receiving side via the network in response to a request from the receiving side. For example, each data stream may be an MPEG-DASH-based data stream, the metafile may be an MPD file, and the network may be a CDN.

In this case, first identification information may be added to the first metafile corresponding to the first private data stream, and second identification information different from the first identification information may be added to the second metafile corresponding to the second private data stream. The first identification information corresponding to the first private data stream and the second identification information corresponding to the second private data stream may have a unique relationship.

In this case, first type information indicating a first type may be further added to the first metafile, and second type information indicating a second type different from the first type may be further added to the second metafile. Also, first language information indicating a predetermined language may be further added to the first metafile, and second language information indicating a non-language may be further added to the second metafile.

Thus, in the present technology, the first private data stream or the second private data stream is transmitted in response to a request from the receiving side. Therefore, a legacy 2D-compatible receiving device on the receiving side can obtain only the superimposition information data by having the first private data stream sent. In addition, a 3D-compatible receiving device can efficiently and accurately acquire the parallax information corresponding to the superimposition information data by having the second private data stream sent.
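The request-driven choice between the two private data streams can be sketched as follows. This is a minimal illustration under stated assumptions: the `PrivateStream` record, the identifier strings, and the example URLs are hypothetical and do not come from the disclosure, which only states that the two streams carry different identification information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateStream:
    stream_id: str        # identification information (hypothetical value)
    has_disparity: bool   # True only for the second private data stream
    url: str              # hypothetical address of the stream


# Hypothetical catalogue entries for the first and second private data streams.
FIRST = PrivateStream("subtitle-2d", False, "http://example.com/subtitle-2d")
SECOND = PrivateStream("subtitle-3d", True, "http://example.com/subtitle-3d")


def choose_private_stream(receiver_is_3d, streams=(FIRST, SECOND)):
    """Return the private data stream a receiver would request.

    A legacy 2D receiver asks for the stream without disparity information;
    a 3D receiver asks for the stream that also carries it.
    """
    for stream in streams:
        if stream.has_disparity == receiver_is_3d:
            return stream
    raise LookupError("no suitable private data stream offered")
```

Because the choice is made on the receiving side, the legacy receiver never has to parse disparity information at all.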

Another concept of this technology resides in a reception device including:
a data receiving unit that makes a request to the transmission side and receives a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image, and also receives a first private data stream including superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data, or a second private data stream including the superimposition information data and parallax information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give parallax;
a first decoding unit that decodes the video data stream; and
a second decoding unit that decodes the first private data stream or the second private data stream.

In the present technology, a request to the transmission side is made by the data receiving unit, and the video data stream is received, and the first private data stream or the second private data stream is received. The video data stream includes left eye image data and right eye image data constituting a stereoscopic image. The first private data stream includes superimposition information data to be superimposed on an image based on left-eye image data and right-eye image data. In addition to the superimposition information data, the second private data stream includes disparity information for shifting the superimposition information to give parallax.

For example, the data receiving unit may receive each data stream from a distribution server on the transmission side through a network. In this case, the reception device may further include a metafile receiving unit that receives a metafile having information for acquiring each data stream, and the data receiving unit may make the request to the transmission side based on the metafile.

In this case, first identification information may be added to the first metafile corresponding to the first private data stream, and second identification information different from the first identification information may be added to the second metafile corresponding to the second private data stream. Each data stream may be an MPEG-DASH-based data stream, the metafile may be an MPD file, and the network may be a CDN.

The video data stream is decoded by the first decoding unit. Further, the first private data stream or the second private data stream is decoded by the second decoding unit. Here, when the data receiving unit receives the first private data stream, data of superimposition information is acquired. In addition, when the data reception unit receives the second private data stream, the superimposition information data and the disparity information are acquired.

Thus, in the present technology, a request is made to the transmission side, and the first private data stream or the second private data stream is received. Therefore, a legacy 2D-compatible receiving device on the receiving side can obtain only the superimposition information data by having the first private data stream sent. In addition, a 3D-compatible receiving device can efficiently and accurately acquire the parallax information corresponding to the superimposition information data by having the second private data stream sent.

Another concept of this technology resides in a transmission device including:
an image data output unit that outputs left-eye image data and right-eye image data constituting a stereoscopic image;
a superimposition information data output unit that outputs superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data;
a parallax information output unit that outputs parallax information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give parallax; and
a data transmission unit that, in response to a request from the receiving side, transmits a video data stream including the image data and a private data stream including the superimposition information data and the disparity information,
wherein, in the private data stream, first identification information is added to the superimposition information data, and second identification information different from the first identification information is added to the parallax information.

The data transmission unit transmits a video data stream including image data and a private data stream including superimposition information data and disparity information in response to a request from the reception side. Here, in this private data stream, the first identification information is added to the superimposition information data, and the second identification information different from the first identification information is added to the parallax information.

For example, the data transmission unit may have a distribution server and distribute each data stream to the receiving side through a network. In this case, the transmission device may further include a metafile generation unit that generates a metafile having information for the receiving side to acquire each data stream, and a metafile transmission unit that transmits the metafile to the receiving side via the network in response to a request from the receiving side.

As described above, in the present technology, the private data stream includes the superimposition information data and the disparity information, and identification information is added to them. Therefore, in the legacy 2D-compatible receiving device on the receiving side, it is possible to skip parallax information based on the identification information and obtain only the superimposition information data satisfactorily. That is, it is possible to prevent the transmission of disparity information from interfering with the reception process of the legacy 2D-compatible receiving device. In addition, the 3D-compatible receiving apparatus can efficiently and appropriately acquire the parallax information corresponding to the superimposition information data from the private data stream.
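The skipping behaviour described above can be sketched as a small decoder loop. This is a sketch under stated assumptions: the private data stream is modelled as a list of `(identification, payload)` pairs, and the numeric identifier values are hypothetical placeholders, since the text here only requires that the two kinds of identification information differ.

```python
SUBTITLE_ID = 1   # hypothetical value of the first identification information
DISPARITY_ID = 2  # hypothetical value of the second identification information


def decode_private_stream(units, supports_3d):
    """Split a private data stream into subtitle data and disparity data.

    `units` is an iterable of (identification, payload) pairs. A legacy 2D
    receiver (supports_3d=False) skips every unit tagged as disparity
    information, so that data never reaches its normal subtitle processing.
    """
    subtitles, disparities = [], []
    for ident, payload in units:
        if ident == SUBTITLE_ID:
            subtitles.append(payload)
        elif ident == DISPARITY_ID and supports_3d:
            disparities.append(payload)
        # Any other unit, including disparity data on a 2D receiver, is skipped.
    return subtitles, disparities
```

For example, `decode_private_stream([(1, "Hello"), (2, -8)], supports_3d=False)` yields the subtitle text and an empty disparity list.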

Another concept of this technology resides in a reception device including:
a data receiving unit that makes a request to the transmission side and receives a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image, and a private data stream including superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data and disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give parallax;
a first decoding unit that decodes the video data stream; and
a second decoding unit that decodes the private data stream,
wherein, in the private data stream, first identification information is added to the superimposition information data, and second identification information different from the first identification information is added to the parallax information, and
the second decoding unit acquires the superimposition information data, or the superimposition information data and the disparity information, from the private data stream based on the first identification information and the second identification information.

In this technology, the data receiving unit makes a request to the transmission side and receives a video data stream and a private data stream. The video data stream includes left-eye image data and right-eye image data constituting a stereoscopic image. The private data stream includes superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data, and parallax information for shifting the superimposition information to be superimposed on that image to give parallax.

For example, the data receiving unit may receive each data stream from a distribution server on the transmission side through a network, the reception device may further include a metafile receiving unit that receives a metafile having information for acquiring each data stream, and the data receiving unit may make the request to the transmission side based on the metafile.

The video data stream is decoded by the first decoding unit. The private data stream is decoded by the second decoding unit. Here, in the private data stream, the first identification information is added to the superimposition information data, and the second identification information different from the first identification information is added to the parallax information. In the second decoding unit, the superimposition information data, or the superimposition information data and the disparity information are acquired from the private data stream based on the identification information.

According to the present technology, a legacy 2D-compatible receiving apparatus can acquire superimposition information data satisfactorily, and a 3D-compatible receiving apparatus can efficiently and accurately acquire the parallax information corresponding to the superimposition information data.

FIG. 1 is a block diagram showing a configuration example of a stream distribution system as an embodiment.
FIG. 2 is a diagram showing the hierarchical structure of an MPD file.
FIG. 3 is a diagram showing an example of the structures included in an MPD file arranged on a time axis.
FIG. 4 is a diagram showing an example of the relationship between the structures arranged hierarchically in an MPD file.
FIG. 5 is a diagram showing an example of the relationship between a period (Period), a representation (Representation), and a segment (Segment).
FIG. 6 is a diagram showing an example of the flow of generating DASH segments and a DASH MPD file from content.
FIG. 7 is a diagram showing a configuration example of an IPTV client.
FIG. 8 is a diagram showing a general DASH-based stream distribution system.
FIG. 9 is a block diagram showing a configuration example of a stream distribution system.
FIG. 10 is a block diagram showing a configuration example of a transmission data generation unit in a broadcast station.
FIG. 11 is a diagram showing image data in a 1920 × 1080 pixel format.
FIG. 12 is a diagram for explaining the "Top & Bottom", "Side By Side", and "Frame Sequential" methods, which are transmission methods for stereoscopic image data (3D image data).
FIG. 13 is a diagram for explaining an example of detecting the parallax vector of a right-eye image with respect to a left-eye image.
FIG. 14 is a diagram for explaining how a parallax vector is obtained by a block matching method.
FIG. 15 is a diagram showing an example of an image obtained when the value of the parallax vector for each pixel is used as the luminance value of that pixel.
FIG. 16 is a diagram showing an example of parallax vectors for each block (Block).
FIG. 17 is a diagram for explaining the downsizing process performed in the parallax information creation unit of the transmission data generation unit.
FIG. 18 is a diagram showing an example of a region defined on the screen and the subregions defined in that region in subtitle data.
FIG. 19 is a diagram showing a configuration example of the subtitle data stream included in a Fragmented MP4 stream.
FIG. 20 is a diagram showing a configuration example of a Fragmented MP4 stream.
FIG. 21 is a diagram showing an example of extending the original DASH-MPD schema to introduce a subtitling type.
FIG. 22 is a diagram for explaining the information (Component_type = 0x15, 0x25) indicating the format of 3D subtitles.
FIG. 23 is a diagram showing an excerpt of the ISO language code (ISO 639-2 Code) list.
FIG. 24 is a diagram showing a configuration example of the adaptation sets corresponding to the first and second subtitle data streams.
FIG. 25 is a diagram showing an example of updating disparity information using an interval period (Interval period), in which the interval period is fixed and equal to the update period.
FIG. 26 is a diagram showing an example of updating disparity information using an interval period (Interval period), in which the interval period is a short period.
FIG. 27 is a diagram showing a configuration example of a subtitle data stream.
FIG. 28 is a diagram showing an example of updating disparity information when TTML-DSS segments are transmitted sequentially.
FIG. 29 is a diagram showing an example of updating disparity information (disparity) in which each update frame interval is expressed as a multiple of the interval period (ID: Interval Duration) serving as a unit period.
FIG. 30 is a diagram showing a display example of subtitles in which the page area (Area for_Page_default) serving as the caption display area contains two regions (Region).
FIG. 31 is a diagram showing an example of the disparity information curves of each region and the page in the case where the TTML-DSS segment includes, as disparity information (Disparity) sequentially updated during the caption display period, both disparity information in units of regions and disparity information in units of the page including all regions.
FIG. 32 is a diagram showing the structure in which the disparity information of the page and each region is sent.
FIG. 33 is a diagram showing an example of a TTML-DSS document corresponding to the data structure of the disparity information of the page and each region.
FIG. 34 is a diagram showing the schema of TTML defined by the W3C.
FIG. 35 is a diagram for explaining the extension of part of the W3C TTML specification in order to describe parameters relating to disparity information (Disparity).
FIG. 36 is a diagram for explaining the extension of part of the W3C TTML specification in order to describe parameters relating to disparity information (Disparity).
FIG. 37 is a diagram showing the schema definition (ttaf1-dfxp-du-attribs.xsd) newly added for the extension of the W3C TTML specification.
FIG. 38 is a diagram showing the schema definition (ttaf1-dfxp-du.xsd) newly added for the extension of the W3C TTML specification.
FIG. 39 is a diagram showing the broadcast reception concept when the set top box and the television receiver are 3D-compatible devices.
FIG. 40 is a diagram showing the broadcast reception concept when the set top box and the television receiver are legacy 2D-compatible devices.
FIG. 41 is a diagram collectively showing the broadcast reception concept when the receiver is a legacy 2D-compatible device (2D Receiver) and when it is a 3D-compatible device (3D Receiver) (in the case of SBS).
FIG. 42 is a diagram collectively showing the broadcast reception concept when the receiver is a legacy 2D-compatible device (2D Receiver) and when it is a 3D-compatible device (3D Receiver) (in the case of MVC).
FIG. 43 is a diagram showing a display example of a subtitle (graphics information) on an image, and the perspective of the background, a foreground object, and the subtitle.
FIG. 44 is a diagram showing a display example of a subtitle on an image, and the left-eye subtitle LGI and the right-eye subtitle RGI for displaying the subtitle.
FIG. 45 is a block diagram showing a configuration example of the set top box constituting the stream distribution system.
FIG. 46 is a block diagram showing a configuration example (3D-compatible device) of the bit stream processing unit constituting the set top box.
FIG. 47 is a block diagram showing another configuration example (2D-compatible device) of the bit stream processing unit constituting the set top box.
FIG. 48 is a block diagram showing a configuration example of the television receiver constituting the stream distribution system.
FIG. 49 is a diagram showing a configuration example of the subtitle data stream included in a Fragmented MP4 stream.
FIG. 50 is a diagram showing a configuration example of a Fragmented MP4 stream.
FIG. 51 is a block diagram showing another configuration example of a stream distribution system.
FIG. 52 is a diagram for explaining the relationship between the display positions of the left and right images of an object on the screen and the playback position of its stereoscopic image in stereoscopic image display using binocular parallax.

Hereinafter, modes for carrying out the invention (hereinafter referred to as “embodiments”) will be described. The description will be given in the following order.
1. Embodiment
2. Modified example

<1. Embodiment>
[Stream distribution system]
FIG. 1 shows a configuration example of a stream distribution system 10 as an embodiment. The stream distribution system 10 is an MPEG-DASH-based stream distribution system. In this stream distribution system 10, N IPTV clients 13-1, 13-2, ..., 13-N are connected to a DASH segment streamer 11 and a DASH MPD server 12 via a CDN (Content Delivery Network) 14.

The DASH segment streamer 11 generates stream segments conforming to the DASH specification (hereinafter referred to as "DASH segments") based on media data (video data, audio data, caption data, etc.) of predetermined content, and sends the segments in response to HTTP requests from the IPTV clients. The DASH segment streamer 11 is a web server.

In this embodiment, the DASH segment streamer 11 generates DASH segments of a video data stream based on the left-eye image data and the right-eye image data constituting a stereoscopic image. As the DASH segments of the video data stream, the DASH segment streamer 11 generates DASH segments of video data streams at a plurality of rates.

Further, in response to a request for a segment of a predetermined stream sent from an IPTV client 13 (13-1, 13-2, ..., 13-N) via the CDN 14, the DASH segment streamer 11 sends the segment of that stream to the requesting IPTV client 13 via the CDN 14. In this case, the IPTV client 13 refers to the rate values described in the MPD (Media Presentation Description) file, selects the stream with the optimum rate according to the state of the network environment in which the client is placed, and makes the request.

The DASH MPD server 12 is a server that generates an MPD file for acquiring a DASH segment generated in the DASH segment streamer 11. The MPD file is generated based on the content metadata from the content management server (not shown in FIG. 1) and the segment address (url) generated in the DASH segment streamer 11.

In the MPD format, the attributes of each stream, such as video and audio, are described using an element called "Representation". For example, for a plurality of video data streams having different rates, the MPD file describes a separate representation, with its own attributes, for each stream. By referring to the rate values, the IPTV client 13 can select the optimum stream according to the state of the network environment in which it is placed, as described above.

The MPD file has a hierarchical structure as shown in FIG. 2. In this MPD file, information such as the compression method, encoding speed, image size, and language of the moving images stored in the DASH segment streamer 11 is described hierarchically in XML format. The MPD file hierarchically includes structures such as the period (Period), adaptation set (AdaptationSet), representation (Representation), segment info (SegmentInfo), initialization segment (Initialization Segment), and media segment (Media Segment).

The period structure has information on the program (a set of synchronized video and audio data). The adaptation set structure included in the period structure groups the stream selection range (a group of representations). The representation structure included in the adaptation set structure has information such as the encoding speed of the moving image and audio, and the image size of the moving image.

Also, the segment info structure included in the representation structure has information related to video and audio segments. The initialization segment structure included in the segment info structure has initialization information such as a data compression method. Further, the media segment structure included in the segment info structure has information such as an address for acquiring a moving image or audio segment.
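To make the hierarchy concrete, the following sketch walks a small hand-written MPD with Python's standard library. The sample document, its URLs, and the function name are assumptions made for illustration. The sample uses the `SegmentList`, `Initialization`, and `SegmentURL` elements of the published MPEG-DASH schema, which are assumed here to correspond to the segment info, initialization segment, and media segment structures described above.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hand-written sample; all identifiers and URLs are hypothetical.
SAMPLE_MPD = """\
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period start="PT0S">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="video-low" bandwidth="1000000">
        <SegmentList>
          <Initialization sourceURL="video-low/init.mp4"/>
          <SegmentURL media="video-low/seg1.m4s"/>
          <SegmentURL media="video-low/seg2.m4s"/>
        </SegmentList>
      </Representation>
      <Representation id="video-high" bandwidth="4000000">
        <SegmentList>
          <Initialization sourceURL="video-high/init.mp4"/>
          <SegmentURL media="video-high/seg1.m4s"/>
          <SegmentURL media="video-high/seg2.m4s"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
"""


def list_representations(mpd_xml):
    """Walk Period > AdaptationSet > Representation and collect segment URLs."""
    root = ET.fromstring(mpd_xml)
    result = []
    for period in root.findall("mpd:Period", NS):
        for adaptation_set in period.findall("mpd:AdaptationSet", NS):
            for rep in adaptation_set.findall("mpd:Representation", NS):
                init = rep.find("mpd:SegmentList/mpd:Initialization", NS)
                media = rep.findall("mpd:SegmentList/mpd:SegmentURL", NS)
                result.append({
                    "period_start": period.get("start"),
                    "mime_type": adaptation_set.get("mimeType"),
                    "id": rep.get("id"),
                    "bandwidth": int(rep.get("bandwidth")),
                    "init_url": init.get("sourceURL") if init is not None else None,
                    "media_urls": [m.get("media") for m in media],
                })
    return result
```

Each returned entry carries the attributes a client needs to pick a representation and the addresses it then requests.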

FIG. 3 shows an example of each structure included in the MPD file described above side by side on the time axis. In this example, the MPD file includes two periods, and each period includes two segments. In this example, each period includes two adaptation sets, and each adaptation set includes two representations related to streams of the same content with different stream attributes.

FIG. 4 shows an example of the relationship between the structures arranged hierarchically in the MPD file described above. As shown in FIG. 4A, a media presentation (Media Presentation) as an entire MPD file includes a plurality of periods (Periods) separated by time intervals. For example, the first period starts from 0 seconds, the next period starts from 100 seconds, and so on.

As shown in FIG. 4B, there are a plurality of representations in the period. The plurality of representations include a group of representations related to video data streams having the same content with different stream attributes, for example, rates, grouped by the above-described adaptation set (AdaptationSet).

As shown in FIG. 4C, the representation includes segment info (SegmentInfo). In this segment info, as shown in FIG. 4D, there are an initialization segment (Initialization Segment) and a plurality of media segments (Media Segment), in which information for each segment (Segment) obtained by further dividing the period is described. The media segment holds information on the address (URL) for actually acquiring segment data such as video and audio.

Note that stream switching can be freely performed between a plurality of representations grouped in the adaptation set. This makes it possible to select an optimal rate stream according to the state of the network environment where the IPTV client is placed, and to enable continuous video distribution.

FIG. 5 shows an example of the relationship between a period, a representation, and a segment. In this example, the MPD file includes two periods, and each period includes two segments. In this example, each period includes a plurality of representations related to the same media content.

FIG. 6 shows an example of a flow from generation of content to a DASH segment or DASH MPD file. Content is sent from the content management server 15 to the DASH segment streamer 11. The DASH segment streamer 11 generates a DASH segment for each data stream based on video data, audio data, and the like that constitute the content.

Also, the DASH segment streamer 11 sends the DASH segment address (url) information of the generated data stream to the DASH MPD server 12. The content management server 15 sends the metadata of the content to the DASH MPD server 12. The DASH MPD server 12 generates a DASH MPD file based on the address information of the DASH segment of each data stream and the content metadata.

FIG. 7 shows a configuration example of the IPTV client 13 (13-1 to 13-N). The IPTV client 13 includes a streaming data control unit 131, an HTTP access unit 132, and a moving image playback unit 133. The streaming data control unit 131 acquires an MPD file from the DASH MPD server 12 and analyzes the content.

The HTTP access unit 132 requests the DASH segment streamer 11 for a moving image or audio segment used for moving image reproduction. At this time, considering the screen size of the IPTV client 13 and the state of the transmission path, etc., a stream having the optimum image size and encoding speed is selected. For example, in a first stage, a segment of a stream having a low encoding rate (rate) is requested, and when a communication condition is good, a segment of a stream having a high encoding rate (rate) is requested.
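The rate selection described above might be sketched as follows. The function name, the safety margin, and the example rates are assumptions made for illustration; the actual selection criteria of the HTTP access unit 132 are not limited to this.

```python
def select_representation(bandwidths_bps, measured_bps, safety=0.8):
    """Pick the highest encoding rate that fits within the measured throughput.

    Falls back to the lowest rate when nothing fits, for example in the
    first stage, when no measurement exists yet and measured_bps is 0.
    """
    usable = measured_bps * safety          # hypothetical safety margin
    fitting = [b for b in bandwidths_bps if b <= usable]
    return max(fitting) if fitting else min(bandwidths_bps)
```

With rates of 0.5, 2, and 6 Mbps, this sketch starts at 0.5 Mbps and moves to a higher rate as the measured throughput improves.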

The HTTP access unit 132 sends the received video or audio segment to the video playback unit 133. The moving image reproduction unit 133 performs decoding processing on each segment sent from the HTTP access unit 132 to obtain one moving image content, and reproduces the moving image and the sound. The processing of each unit of the IPTV client 13 is performed by software, for example.

FIG. 8 shows a general DASH-based stream distribution system. Both the DASH MPD file and the DASH segment are distributed via a CDN (Content Delivery Network) 14. The CDN 14 has a configuration in which a plurality of cache servers (DASH cache servers) are arranged in a network.

The cache server receives an HTTP request for acquiring an MPD file from the IPTV client 13. If the MPD file is in the local MPD cache, the cache server returns it to the IPTV client 13 as an HTTP response. If the MPD file is not in the local MPD cache, the cache server transfers the request to the DASH MPD server 12 or a higher-level cache server. The cache server then receives the HTTP response in which the MPD file is stored, transfers the HTTP response to the IPTV client 13, and performs cache processing.

In addition, the cache server receives an HTTP request for acquiring a DASH segment from the IPTV client 13. If the DASH segment is in the local segment cache, the cache server returns it to the IPTV client 13 as an HTTP response. If the DASH segment is not in the local segment cache, the cache server transfers the request to the DASH segment streamer 11 or a higher-level cache server. The cache server then receives the HTTP response in which the DASH segment is stored, transfers the HTTP response to the IPTV client 13, and performs cache processing.

In the CDN 14, the DASH segment delivered to the IPTV client 13-1 that first issued the HTTP request is temporarily cached in the cache servers along the path, and the cached DASH segment is delivered in response to a subsequent HTTP request from another IPTV client 13-2. Therefore, it is possible to improve the delivery efficiency of HTTP streaming for a large number of IPTV clients.
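The cache server behavior described above can be sketched as follows. The class name and the `upstream` callable are hypothetical; a real cache server would handle HTTP requests and responses rather than plain function calls.

```python
class CacheServer:
    """Toy DASH cache server: serve from the local cache, else forward upstream."""

    def __init__(self, upstream):
        # upstream: callable url -> bytes (origin server or higher-level cache)
        self.upstream = upstream
        self.cache = {}
        self.upstream_requests = 0

    def get(self, url):
        if url in self.cache:            # hit: answer from the local cache
            return self.cache[url]
        self.upstream_requests += 1      # miss: transfer the request upstream
        body = self.upstream(url)
        self.cache[url] = body           # cache processing
        return body
```

Because `upstream` may itself be the `get` method of another `CacheServer`, the same sketch also covers forwarding to a higher-level cache server. A second client requesting the same segment is served from the cache without a further upstream request.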

The CDN 14 has a predetermined number of cache management servers in addition to a plurality of cache servers. The cache management server creates a cache control policy based on an index relating to the cache of the DASH segment of each video data stream included in the MPD file, and distributes the cache control policy to each cache server. Each cache server performs caching processing of the DASH segment of each video data stream based on this cache control policy.

FIG. 9 shows the stream distribution system 10 shown in FIG. 1 in another form. The stream distribution system 10 includes a broadcasting station 100, a set top box (STB) 200, and a television receiver (TV) 300. The broadcasting station 100 includes the DASH segment streamer 11 and the DASH MPD server 12 of the stream distribution system 10 of FIG. 1. The set top box 200 and the television receiver 300 constitute the IPTV client 13 (13-1 to 13-N) of the stream distribution system 10 of FIG. 1.

The set top box 200 and the television receiver 300 are connected by a digital interface, for example, HDMI (High Definition Multimedia Interface). The set top box 200 and the television receiver 300 are connected using an HDMI cable 400. The set top box 200 is provided with an HDMI terminal 202. The television receiver 300 is provided with an HDMI terminal 302. One end of the HDMI cable 400 is connected to the HDMI terminal 202 of the set top box 200, and the other end of the HDMI cable 400 is connected to the HDMI terminal 302 of the television receiver 300.

[Description of broadcasting station]
The broadcasting station 100 transmits a Fragmented MP4 stream to the set top box (STB) 200 via the CDN (Content Delivery Network) 14 (see FIG. 1). The broadcasting station 100 includes a transmission data generation unit 110 that generates the Fragmented MP4 stream. This Fragmented MP4 stream includes image data, audio data, superimposition information data, disparity information, and the like. Here, the image data is stereoscopic image data of a predetermined transmission method including left eye image data and right eye image data constituting a stereoscopic image. The stereoscopic image data has a predetermined transmission format. The superimposition information is generally subtitles, graphics information, text information, and the like, but in this embodiment it is a subtitle (caption).

"Configuration example of transmission data generator"
FIG. 10 shows a configuration example of the transmission data generation unit 110 in the broadcast station 100. The transmission data generation unit 110 transmits disparity information (disparity vector) with a data structure that can be easily linked to the DVB (Digital Video Broadcasting) method, which is one of existing broadcasting standards. The transmission data generation unit 110 includes a data extraction unit 111, a video encoder 112, and an audio encoder 113. Further, the transmission data generation unit 110 includes a subtitle generation unit 114, a disparity information creation unit 115, a subtitle processing unit 116, a subtitle encoder 118, and a multiplexer 119.

A data recording medium 111a is detachably attached to the data extraction unit 111, for example. In this data recording medium 111a, audio data and parallax information are recorded in association with left-eye image data and right-eye image data constituting a stereoscopic image. The data extraction unit 111 extracts and outputs image data, audio data, parallax information, and the like from the data recording medium 111a. The data recording medium 111a is a disk-shaped recording medium, a semiconductor memory, or the like.

The left eye image data and right eye image data extracted from the data extraction unit 111 are transmitted as stereoscopic image data (3D image data) of a predetermined transmission method. An example of a transmission method of stereoscopic image data will be described. Here, the following first to third transmission methods are listed, but other transmission methods may be used. As shown in FIG. 11, the case where the image data of the left eye (L) and the right eye (R) is image data of a predetermined resolution, for example, the 1920 * 1080 pixel format, is taken as an example.

The first transmission method is the top-and-bottom (Top & Bottom) method. As shown in FIG. 12A, the data of each line of the left eye image data is transmitted in the first half in the vertical direction, and the data of each line of the right eye image data is transmitted in the second half in the vertical direction. In this case, since the lines of the left eye image data and the right eye image data are thinned out to 1/2, the vertical resolution is halved with respect to the original signal.

The second transmission method is the side-by-side (Side By Side) method. As shown in FIG. 12B, the pixel data of the left eye image data is transmitted in the first half in the horizontal direction, and the pixel data of the right eye image data is transmitted in the second half in the horizontal direction. In this case, the pixel data in the horizontal direction of each of the left eye image data and the right eye image data is thinned out to 1/2, so the horizontal resolution is halved with respect to the original signal.

The third transmission method is the frame sequential method or L/R no-interleaving method. As shown in FIG. 12C, the left eye image data and the right eye image data are sequentially switched for each frame and transmitted. This method includes the full frame method and the service compatible method for the conventional 2D format.
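The first two transmission methods can be sketched as follows, with each image represented as a list of rows. Keeping the even-numbered lines or pixels is an assumption made for illustration; the description only states that the data is thinned out to 1/2.

```python
def top_and_bottom(left, right):
    """Keep every other line of each eye; left lines on top, right lines below."""
    return left[::2] + right[::2]


def side_by_side(left, right):
    """Keep every other pixel of each line; left pixels first, right pixels after."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]
```

In both cases the packed frame has the same pixel dimensions as one original eye image, which is why the vertical or horizontal resolution of each eye is halved.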

Further, the disparity information recorded in the data recording medium 111a is, for example, a disparity vector for each pixel (pixel) constituting the image. A detection example of a disparity vector will be described. Here, an example in which the parallax vector of the right eye image with respect to the left eye image is detected will be described. As illustrated in FIG. 13, the left eye image is a detection image, and the right eye image is a reference image. In this example, the disparity vectors at the positions (xi, yi) and (xj, yj) are detected.

A case where a disparity vector at the position of (xi, yi) is detected will be described as an example. In this case, for example, a 4 * 4, 8 * 8, or 16 * 16 pixel block (parallax detection block) Bi is set in the left eye image with the pixel at the position (xi, yi) at the upper left. Then, a pixel block matching the pixel block Bi is searched in the right eye image.

In this case, a search range centered on the position (xi, yi) is set in the right eye image, and each pixel in the search range is sequentially set as a pixel of interest. For each pixel of interest, a comparison block of, for example, 4 * 4, 8 * 8, or 16 * 16 pixels, the same size as the pixel block Bi described above, is sequentially set.

Between the pixel block Bi and each of the sequentially set comparison blocks, the sum of the absolute differences between corresponding pixels is obtained. Here, as shown in FIG. 14, when the pixel value of the pixel block Bi is L(x, y) and the pixel value of the comparison block is R(x, y), the sum of absolute differences between the pixel block Bi and a given comparison block is represented by Σ|L(x, y) − R(x, y)|.

When n pixels are included in the search range set in the right eye image, n sums S1 to Sn are finally obtained, and the minimum sum Smin is selected from among them. Then, the position (xi′, yi′) of the upper left pixel is obtained from the comparison block for which the sum Smin was obtained. Thus, the disparity vector at the position (xi, yi) is detected as (xi′ − xi, yi′ − yi). Although detailed description is omitted, the disparity vector at the position (xj, yj) is detected by the same process, with a 4 * 4, 8 * 8, or 16 * 16 pixel block Bj, for example, set in the left eye image with the pixel at the position (xj, yj) at the upper left.
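The detection process described above can be sketched as follows, with each image represented as a list of rows of pixel values. The function names, the default block size, and the search radius are hypothetical, and the search range is simply clipped at the image border.

```python
def sad(left, right, lx, ly, rx, ry, size):
    """Sum of absolute differences between two size*size blocks."""
    return sum(
        abs(left[ly + dy][lx + dx] - right[ry + dy][rx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def detect_disparity(left, right, xi, yi, size=4, search=3):
    """Return (xi' - xi, yi' - yi) for the best-matching block in the right image."""
    height, width = len(right), len(right[0])
    best = None
    # Each pixel of the (clipped) search range is the upper-left of one comparison block.
    for ry in range(max(0, yi - search), min(height - size, yi + search) + 1):
        for rx in range(max(0, xi - search), min(width - size, xi + search) + 1):
            s = sad(left, right, xi, yi, rx, ry, size)
            if best is None or s < best[0]:      # keep the minimum sum Smin
                best = (s, rx, ry)
    _, xr, yr = best
    return (xr - xi, yr - yi)
```

If the right eye image is the left eye image shifted two pixels to the left, this sketch returns a disparity vector of (-2, 0) for a block well inside the image.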

Referring back to FIG. 10, the video encoder 112 converts the left eye image data and right eye image data extracted from the data extraction unit 111 into stereoscopic image data of a predetermined transmission method. Then, the video encoder 112 performs encoding such as MPEG4-AVC, MPEG2, or VC-1 on the stereoscopic image data to generate a video data stream (video elementary stream). The audio encoder 113 performs encoding such as AC3 or AAC on the audio data extracted from the data extraction unit 111 to generate an audio data stream (audio elementary stream).

The subtitle generation unit 114 generates subtitle data that is DVB (Digital Video Broadcasting) subtitle data. This subtitle data is subtitle data for a two-dimensional image. The subtitle generation unit 114 constitutes a superimposition information data output unit.

The disparity information creating unit 115 performs a downsizing process on the disparity vector (horizontal disparity vector) for each pixel, or for each set of a plurality of pixels, extracted from the data extracting unit 111, and generates disparity information for each layer as described below. The disparity information does not necessarily have to be generated by the disparity information creating unit 115, and a configuration in which the disparity information is supplied separately from the outside is also possible.

FIG. 15 shows an example of data in the relative depth direction given as the luminance value of each pixel. Here, the data in the relative depth direction can be handled as a disparity vector for each pixel by a predetermined conversion. In this example, the luminance value of the person portion is high. This means that the value of the parallax vector of the person portion is large, so that in stereoscopic image display the person portion is perceived as standing out. In this example, the luminance value of the background portion is low. This means that the value of the parallax vector of the background portion is small, so that in stereoscopic image display the background portion is perceived as receding.

FIG. 16 shows an example of a disparity vector for each block. The block corresponds to an upper layer of pixels (picture elements) located at the lowermost layer. This block is configured by dividing an image (picture) region into a predetermined size in the horizontal direction and the vertical direction. The disparity vector of each block is obtained, for example, by selecting the disparity vector having the largest value from the disparity vectors of all pixels (pixels) existing in the block. In this example, the disparity vector of each block is indicated by an arrow, and the length of the arrow corresponds to the magnitude of the disparity vector.

FIG. 17 shows an example of the downsizing process performed by the disparity information creating unit 115. First, as shown in FIG. 17A, the disparity information creating unit 115 obtains a signed disparity vector for each block using the disparity vector for each pixel. As described above, a block corresponds to an upper layer of pixels located at the lowest layer, and is configured by dividing an image (picture) region into a predetermined size in the horizontal direction and the vertical direction. The disparity vector of each block is obtained, for example, by selecting the disparity vector having the smallest value, or the negative value having the largest absolute value, from the disparity vectors of all the pixels present in the block.

Next, as shown in FIG. 17B, the disparity information creating unit 115 obtains a disparity vector for each group (Group Of Block) using the disparity vector for each block. A group is an upper layer of a block, and is obtained by grouping a plurality of adjacent blocks together. In the example of FIG. 17B, each group is composed of four blocks bounded by a broken line frame. The disparity vector of each group is obtained, for example, by selecting the disparity vector having the smallest value, or the negative value having the largest absolute value, from the disparity vectors of all the blocks in the group.

Next, as shown in FIG. 17C, the disparity information creating unit 115 obtains a disparity vector for each partition (Partition) using the disparity vector for each group. A partition is an upper layer of a group and is obtained by grouping a plurality of adjacent groups together. In the example of FIG. 17C, each partition is composed of two groups bounded by a broken line frame. The disparity vector of each partition is obtained, for example, by selecting the disparity vector having the smallest value, or the negative value having the largest absolute value, from the disparity vectors of all the groups in the partition.

Next, as shown in FIG. 17D, the disparity information creating unit 115 obtains a disparity vector of the entire picture (entire image) located in the highest layer using the disparity vector for each partition. In the example of FIG. 17D, the entire picture includes four partitions bounded by a broken line frame. The disparity vector of the entire picture is obtained, for example, by selecting the disparity vector having the smallest value, or the negative value having the largest absolute value, from the disparity vectors of all the partitions included in the entire picture.

In this way, the disparity information creating unit 115 performs the downsizing process on the disparity vector for each pixel located in the lowest layer, and can obtain the disparity vector of each region in each of the block, group, partition, and entire picture layers. In the example of the downsizing process shown in FIG. 17, in addition to the pixel layer, disparity vectors of four layers, namely blocks, groups, partitions, and the entire picture, are finally obtained. However, the number of layers, the way regions are divided in each layer, and the number of regions are not limited to this.
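The downsizing process can be sketched as repeated pooling of a signed disparity grid. This sketch assumes that the minimum signed value is selected at every layer (the rule stated for groups), that the grid dimensions are exact multiples of the tile sizes, and that the tile sizes given as defaults are hypothetical; only the counts of four blocks per group, two groups per partition, and four partitions per picture follow the example of FIG. 17.

```python
def downsize(grid, fy, fx):
    """Merge fy*fx tiles of a 2-D grid, keeping the minimum signed value per tile."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            min(grid[y][x] for y in range(ty, ty + fy) for x in range(tx, tx + fx))
            for tx in range(0, cols, fx)
        ]
        for ty in range(0, rows, fy)
    ]


def build_layers(pixels, block=(2, 2), group=(2, 2), partition=(1, 2)):
    """Return disparity grids for the block, group, partition and picture layers."""
    blocks = downsize(pixels, *block)             # pixels -> blocks
    groups = downsize(blocks, *group)             # four blocks per group
    partitions = downsize(groups, *partition)     # two groups per partition
    picture = min(min(row) for row in partitions) # entire picture
    return blocks, groups, partitions, picture
```

A single strongly negative pixel (an object popping far out of the screen) propagates through every layer up to the entire picture, which is the behavior wanted when the disparity is later used to keep subtitles in front of the nearest object.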

Returning to FIG. 10, the subtitle processing unit 116 can define subregions within a region based on the subtitle data generated by the subtitle generating unit 114. Further, the subtitle processing unit 116 sets disparity information for shift adjustment of the display position of the superimposition information in the left eye image and the right eye image based on the disparity information created by the disparity information creating unit 115. This disparity information can be set for each subregion or region, or for each page.

FIG. 18 (a) shows an example of a region defined on the screen and subregions defined in this region in the subtitle data. In this example, two sub-regions “SubRegion 1” and “SubRegion 2” are defined in region 0 (Region 0) where “Region_Starting Position” is R0. The horizontal position (Horizontal Position) x of “SubRegion 1” is SR1, and the horizontal position (Horizontal Position) x of “SubRegion 2” is SR2. In this example, the disparity information “disparity 1” is set for the subregion “SubRegion 1”, and the disparity information “disparity 2” is set for the subregion “SubRegion 2”.

FIG. 18B shows an example of shift adjustment of the subregions in the left eye image based on the disparity information. Disparity information “disparity 1” is set for the subregion “SubRegion 1”. Therefore, for the subregion “SubRegion 1”, shift adjustment is performed so that the horizontal position (Horizontal Position) x becomes SR1 − disparity 1. Also, disparity information “disparity 2” is set for the subregion “SubRegion 2”. Therefore, for the subregion “SubRegion 2”, shift adjustment is performed so that the horizontal position (Horizontal Position) x becomes SR2 − disparity 2.

FIG. 18C shows an example of shift adjustment of the subregions in the right eye image based on the disparity information. Disparity information “disparity 1” is set for the subregion “SubRegion 1”. Therefore, for the subregion “SubRegion 1”, shift adjustment is performed in the direction opposite to that of the left eye image described above, so that the horizontal position (Horizontal Position) x becomes SR1 + disparity 1. Also, disparity information “disparity 2” is set for the subregion “SubRegion 2”. Therefore, for the subregion “SubRegion 2”, shift adjustment is performed in the direction opposite to that of the left eye image, so that the horizontal position (Horizontal Position) x becomes SR2 + disparity 2.
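The shift adjustment of FIG. 18 can be written compactly as follows. The function name and the numeric positions used in the usage note are hypothetical.

```python
def shift_subregions(subregions):
    """Map each subregion name to its (left-eye x, right-eye x) pair.

    subregions: name -> (horizontal position x, disparity).
    Left eye:  x - disparity.  Right eye: x + disparity.
    """
    return {
        name: (x - disparity, x + disparity)
        for name, (x, disparity) in subregions.items()
    }
```

For example, a subregion at x = 100 with a disparity of 8 would be drawn at x = 92 in the left eye image and x = 108 in the right eye image.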

The subtitle processing unit 116 outputs display control information such as region information and disparity information of the above-described subregion region together with the subtitle data generated by the subtitle generation unit 114. Note that disparity information can be set in units of subregions as described above, or in units of regions or pages.

The subtitle data is a segment of a TTML (Timed Text Markup Language) document (XML format). TTML is a markup language that can specify text display timing, display position (layout), and the like. In this embodiment, a TTML-DSS (Disparity Signaling Segment) segment is further defined. Display control information such as the above-described disparity information is inserted into the TTML-DSS segment as an XML document based on the TTML format.

Returning to FIG. 10, the subtitle encoder 118 generates a subtitle data stream (private data stream) including the TTML and TTML-DSS segments. The multiplexer 119 converts each data stream from the video encoder 112, the audio encoder 113, and the subtitle encoder 118 into a file, and generates a Fragmented MP4 stream as a file. This Fragmented MP4 stream has a video data stream, an audio data stream, and a subtitle data stream.

FIG. 19 shows a configuration example of the subtitle data stream included in the Fragmented MP4 stream. Corresponding to this subtitle data stream, an adaptation set / representation element is described in MPD. An ID attribute (AdaptationSet / @ id) is defined for each adaptation set element.

The ID attribute of the adaptation set element corresponding to the first subtitle data stream, which includes only the TTML segment, differs from the ID attribute of the adaptation set element corresponding to the second subtitle data stream, which includes the TTML-DSS segment in addition to the TTML segment. This indicates that the first subtitle data stream and the second subtitle data stream are separate services, and makes it possible to identify them.

In this embodiment, the value of the ID attribute of the adaptation set element corresponding to the second subtitle data stream is the value obtained by adding a predetermined value to the value of the ID attribute of the adaptation set element corresponding to the first subtitle data stream. Thereby, the first subtitle data stream and the second subtitle data stream are linked through the ID attribute of the adaptation set element.

The operation of the transmission data generation unit 110 shown in FIG. 10 will be briefly described. The left eye image data and right eye image data extracted from the data extraction unit 111 are supplied to the video encoder 112. In the video encoder 112, the left eye image data and the right eye image data are converted into stereoscopic image data of a predetermined transmission method (see FIGS. 12A to 12C). Then, the video encoder 112 performs encoding such as MPEG4-AVC, MPEG2, VC-1 on the stereoscopic image data, and generates a video data stream including the encoded video data. This video data stream is supplied to the multiplexer 119.

The audio data extracted by the data extraction unit 111 is supplied to the audio encoder 113. In the audio encoder 113, encoding such as MPEG-2 Audio AAC or MPEG-4 AAC is performed on the audio data, and an audio data stream including the encoded audio data is generated. This audio data stream is supplied to the multiplexer 119.

The subtitle generator 114 generates subtitle data for a two-dimensional image. This subtitle data is supplied to the disparity information creating unit 115 and the subtitle processing unit 116.

The disparity vector for each pixel (pixel) extracted from the data extracting unit 111 is supplied to the disparity information creating unit 115. In the disparity information creating unit 115, downsizing processing is performed on disparity vectors for each pixel or for a plurality of pixels, and disparity information (disparity) of each layer is created. This disparity information is supplied to the subtitle processing unit 116.

In the subtitle processing unit 116, based on the subtitle data generated by the subtitle generation unit 114, for example, a subregion region is defined in the region. In addition, the subtitle processing unit 116 sets disparity information for shift adjustment of the display position of the superimposition information in the left eye image and the right eye image based on the disparity information created by the disparity information creating unit 115. In this case, the disparity information is set for each subregion, each region, or each page.

The subtitle data and display control information output from the subtitle processing unit 116 are supplied to the subtitle encoder 118. The display control information includes area information of the sub-region area, parallax information, and the like. The subtitle encoder 118 generates a subtitle data stream (private data stream) including TTML segments of TTML and TTML-DSS.

As described above, each data stream from the video encoder 112, the audio encoder 113, and the subtitle encoder 118 is supplied to the multiplexer 119. In the multiplexer 119, each data stream is converted into a file, and a Fragmented MP4 stream as a file is generated. This Fragmented MP4 stream has a video data stream, an audio data stream, and a subtitle data stream (private data stream).

FIG. 20 shows a configuration example of a Fragmented MP4 stream. Each FragmentedMP4 stream includes FragmentedMP4 obtained by packetizing the elementary stream. In this figure, for the sake of simplification of the drawing, illustration of portions related to video and audio is omitted.

In this configuration example, a FragmentedMP4 stream of the first subtitle data stream including only the TTML segment and a FragmentedMP4 stream of the second subtitle data stream including the TTML-DSS segment in addition to the TTML segment are shown. The ID attribute of the adaptation set element corresponding to each stream is different from each other and can be identified.

For each Fragmented MP4 stream, a corresponding adaptation set / representation element is described in the MPD. The segments (Segment) listed (associated) under the representation element refer to the sequence of the styp box, the sidx box, and the Fragmented MP4 (moof and mdat) shown in the figure. A program unit is defined as a group of a plurality of adaptation sets.

In the adaptation set / representation element of the MPD corresponding to the subtitle data stream, information related to the subtitle data stream, such as a subtitle language code, is described. A subtitling type (subtitlingType) is introduced as one piece of information related to the subtitle data stream, and can be arranged as an attribute of the adaptation set element, such as “AdaptationSet / @ subtitlingType”. FIG. 21 shows an example in which the original DASH-MPD schema is extended to introduce the subtitling type.

The subtitling type (subtitling_type) corresponding to the first subtitle data stream (Fragmented MP4 stream) is a value indicating a 2D subtitle, for example, “0x14” or “0x24” (see “component_type” in FIG. 22). Furthermore, the ISO (International Organization for Standardization) language code corresponding to the first subtitle data stream is set in the lang attribute (AdaptationSet / @ lang in the illustrated example), which is an attribute of the adaptation set element, so as to indicate the language of the subtitle (caption). In the illustrated example, “eng” indicating English is set.

Also, the subtitling type (subtitling_type) corresponding to the second subtitle data stream (Fragmented MP4 stream) is a value indicating a 3D subtitle, for example, “0x15” or “0x25” (see “component_type” in FIG. 22). Furthermore, the ISO language code corresponding to the second subtitle data stream is set to “zxx” indicating a non-language, for example.

In the above description, the ISO language code corresponding to the second subtitle data stream is set to “zxx” indicating a non-language, for example. However, it may be possible to set the ISO language code corresponding to the second subtitle data stream so as to indicate the language of the subtitle (caption) in the same manner as the ISO language code corresponding to the first subtitle data stream.

As an ISO language code indicating a non-language, it is also conceivable to use one of the language codes included in the range “qaa” to “qrz” of the ISO language codes, or the language code “mis” or “und”. For reference, FIG. 23 shows an excerpt of the ISO language code (ISO 639-2 Code) list.

FIG. 24A shows a configuration example of an adaptation set corresponding to the first subtitle data stream. This example is a language service example of English “eng”. “AdaptationSet / @ id” is set to “A1”. Further, “AdaptationSet / @ subtitlingType” is a value indicating a 2D subtitle in association with “AdaptationSet / @ id = A1”. Further, “AdaptationSet / @ lang” is set to “eng” indicating English in correspondence with “AdaptationSet / @ id = A1”.

FIG. 24B shows a configuration example of an adaptation set corresponding to the second subtitle data stream. “AdaptationSet / @ id” is set to “A2”. Further, “AdaptationSet / @ subtitlingType” is a value indicating a 3D subtitle in association with “AdaptationSet / @ id = A2”. Further, “AdaptationSet / @ lang” is set to “zxx” indicating a non-language in association with “AdaptationSet / @ id = A2”.
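The way a receiver might use these attributes to choose a subtitle adaptation set can be sketched as follows. The XML snippet is a simplified, namespace-free stand-in for the MPD of FIG. 24, and the function name and the selection order are assumptions made for illustration; with several languages, a 3D receiver would additionally need the link on the ID attribute to find the matching second stream.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the adaptation sets of FIG. 24 (not a complete MPD).
MPD_SNIPPET = """
<Period>
  <AdaptationSet id="A1" subtitlingType="0x14" lang="eng"/>
  <AdaptationSet id="A2" subtitlingType="0x15" lang="zxx"/>
</Period>
"""

TYPES_2D = {"0x14", "0x24"}   # values indicating a 2D subtitle
TYPES_3D = {"0x15", "0x25"}   # values indicating a 3D subtitle


def select_subtitle_set(mpd_xml, is_3d_receiver, language="eng"):
    """Return the id of the subtitle adaptation set a receiver would request."""
    root = ET.fromstring(mpd_xml.strip())
    sets = root.findall("AdaptationSet")
    if is_3d_receiver:
        for s in sets:
            if s.get("subtitlingType") in TYPES_3D:
                return s.get("id")
    # A legacy 2D receiver only considers 2D subtitles in a language it knows.
    for s in sets:
        if s.get("subtitlingType") in TYPES_2D and s.get("lang") == language:
            return s.get("id")
    return None
```

In this sketch a legacy 2D receiver never selects the second subtitle data stream, because its subtitling type is not a 2D value and its language code “zxx” matches no requested language.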

[Update parallax information]
As described above, disparity information is transmitted by the TTML-DSS segment included in the subtitle data stream. The update of the parallax information will be described.

FIGS. 25 and 26 show examples of updating disparity information using an interval period (Interval period). FIG. 25 shows a case where the interval period is fixed and equal to the update period. That is, each of the update periods A-B, B-C, C-D, ... consists of one interval period.

FIG. 26 is a general example and shows an example of updating disparity information when the interval period (Interval period) is a short period (for example, a frame period may be used). In this case, the number of interval periods is M, N, P, Q, and R in each update period. In FIG. 25 and FIG. 26, “A” indicates the start frame (start point) of the caption display period, and “B” to “F” indicate subsequent update frames (update point).

When disparity information that is sequentially updated within the caption display period is sent to the receiving side (the set top box 200 or the like), the receiving side can perform interpolation processing on the disparity information of each update period to generate and use disparity information at an arbitrary frame interval, for example, at intervals of one frame.
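The interpolation on the receiving side might be sketched as follows. Linear interpolation between update points is an assumption made for illustration, as the interpolation method is not specified here; the function name and the representation of update points as (frame, disparity) pairs with distinct frame numbers are likewise hypothetical.

```python
def interpolate_disparity(update_points, frame):
    """Linearly interpolate disparity at `frame` from (frame, disparity) update points."""
    points = sorted(update_points)
    if frame <= points[0][0]:
        return points[0][1]              # before the start frame: hold the first value
    for (f0, d0), (f1, d1) in zip(points, points[1:]):
        if f0 <= frame <= f1:
            return d0 + (d1 - d0) * (frame - f0) / (f1 - f0)
    return points[-1][1]                 # after the last update: hold the last value
```

Calling this function for every frame of the caption display period yields disparity information at intervals of one frame from a small number of transmitted update points.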

FIG. 27 shows a configuration example of the subtitle data stream. FIG. 27A illustrates an example in which a plurality of pieces of disparity information sequentially updated in the caption display period are included in one TTML-DSS segment and transmitted. This TTML-DSS segment exists only in the second subtitle data stream for 3D, and does not exist in the first subtitle data stream for 2D.

The time information (PTS) is generated from the information of the moof header of Fragmented MP4. Control information such as subtitle strings to be displayed in the subtitle display period after the start of the PTS and their display timing and style are stored in one TTML file and stored in mdat of FragmentedMP4. Each TTML segment is collectively transmitted before the start of the caption display period.

Note that the plurality of pieces of disparity information that are sequentially updated in the caption display period may instead be divided among a plurality of TTML files, with each piece of disparity information included in its own TTML-DSS segment and sent to the receiving side (set top box 200, etc.). In this case, a TTML-DSS segment is inserted into the subtitle data stream at every update timing.

FIG. 27B shows a configuration example of the subtitle data stream in that case. In this case, first, control information such as the subtitle strings and style displayed from one PTS timing to the next PTS timing is stored in one TTML file. Thereafter, at each update timing, the moof header includes the parameters for generating the time information PTSn, PTSn+1, ... for that timing, and the TTML segment and the TTML-DSS segment are transmitted in mdat.

FIG. 28 illustrates an example of disparity information update in the case where TTML-DSS segments are sequentially transmitted as illustrated in FIG. 27B. In FIG. 28, “A” indicates the start frame (start point) of the caption display period, and “B” to “F” indicate subsequent update frames (update point).

Even when the TTML-DSS segments are sequentially transmitted and the disparity information sequentially updated within the caption display period is sent to the receiving side (such as the set top box 200), the receiving side can perform the same processing as described above. That is, in this case as well, the receiving side can generate and use disparity information at an arbitrary frame interval, for example, one frame interval, by performing interpolation processing on the disparity information for each update period.

FIG. 29 shows an example of updating disparity information (disparity) similar to FIG. 26 described above. The update frame interval is represented as a multiple of an interval period (ID: Interval Duration) serving as a unit period. For example, the update frame interval “Division Period 1” is represented by “ID * M”, the update frame interval “Division Period 2” is represented by “ID * N”, and the subsequent update frame intervals are represented similarly. In the example of updating disparity information shown in FIG. 29, the update frame interval is not fixed but is set according to the disparity information curve.

Also, in this update example of disparity information (disparity), on the receiving side, the start frame (start time) T1_0 of the caption display period is given by the PTS (Presentation Time Stamp) calculated from the parameters of the moof header of the Fragmented MP4 that includes this disparity information. Then, on the receiving side, each update time of the disparity information is obtained from the information on each update frame interval, namely the information on the interval period (unit period) and the information on the number of interval periods.

In this case, each update time is sequentially obtained from the start frame (start time) T1_0 of the caption display period based on the following equation (1). In this equation (1), “interval_count” indicates the number of interval periods, and is a value corresponding to M, N, P, Q, R, and S in FIG. In the equation (1), “interval_time” is a value corresponding to an interval period (ID: Interval Duration) in FIG.
Tm_n = Tm_(n-1) + (interval_time * interval_count)   (1)

For example, in the update example shown in FIG. 29, each update time is obtained as follows based on the equation (1). That is, the update time T1_1 is obtained as “T1_1 = T1_0 + (ID * M)” using the start time (T1_0), the interval period (ID), and the number (M). Further, the update time T1_2 is obtained as “T1_2 = T1_1 + (ID * N)” using the update time (T1_1), the interval period (ID), and the number (N). Each subsequent update time is obtained in the same manner.
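The recurrence in equation (1) can be illustrated with a minimal sketch. The function name `update_times` and the numeric values (a start time and an interval period expressed in 90 kHz ticks, and the interval counts standing in for M, N, P, Q, R) are assumptions chosen only for illustration and do not appear in the figures.

```python
def update_times(start_time, interval_time, interval_counts):
    """Apply equation (1): Tm_n = Tm_(n-1) + interval_time * interval_count."""
    times = [start_time]
    for count in interval_counts:
        # Each update time is the previous one plus a whole number of interval periods.
        times.append(times[-1] + interval_time * count)
    return times


# Hypothetical values, all in 90 kHz clock ticks.
T1_0 = 900_000           # start time of the caption display period
ID = 3_003               # interval period (Interval Duration)
counts = [2, 3, 1, 4, 2]  # stand-ins for M, N, P, Q, R

times = update_times(T1_0, ID, counts)
print(times)
```

Because each update time depends only on the previous update time and one pair of values, a receiver can compute the times incrementally as the interval counts are read.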

In the update example shown in FIG. 29, on the receiving side, interpolation processing is performed on the disparity information that is sequentially updated within the caption display period, and disparity information at an arbitrary frame interval, for example, one frame interval, within the caption display period is generated and used. For example, this interpolation processing is not simple linear interpolation but interpolation accompanied by low-pass filter (LPF) processing in the time direction (frame direction), so that the change in the time direction (frame direction) of the disparity information at the predetermined frame interval after interpolation becomes gentle. The broken line a in FIG. 29 shows an example of the LPF output.
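As a rough illustration of interpolation accompanied by low-pass filtering, the following sketch first produces one disparity value per frame between update points and then smooths the result in the time direction. The filter type is not specified in this description; the moving-average filter, the tap count, and the function names here are assumptions.

```python
def interpolate_per_frame(update_frames, disparities):
    """Produce one disparity value per frame between consecutive update points."""
    points = list(zip(update_frames, disparities))
    values = []
    for (f0, d0), (f1, d1) in zip(points, points[1:]):
        for f in range(f0, f1):
            values.append(d0 + (d1 - d0) * (f - f0) / (f1 - f0))
    values.append(float(disparities[-1]))
    return values


def moving_average(values, taps=3):
    """Simple low-pass filter in the frame direction (assumed filter type)."""
    half = taps // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical update points: frame numbers and disparity values.
per_frame = interpolate_per_frame([0, 2, 4], [0, 4, 0])
smoothed = moving_average(per_frame, taps=3)
print(per_frame)
print(smoothed)
```

The smoothed sequence has a lower peak than the per-frame sequence, which corresponds to the gentler change in the frame direction described above.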

FIG. 30 shows a display example of subtitles as captions. In this display example, the page area (Area for Page_default) includes two regions (Region 1 and Region 2) as subtitle display areas. A region includes one or more subregions. Here, it is assumed that each region includes one subregion, and that the area of the region and the area of the subregion are equal.

FIG. 31 shows an example of the disparity information curves of each region and of the page in the case where the TTML-DSS segment includes both disparity information in units of regions and disparity information in units of pages as the disparity information (Disparity) sequentially updated in the caption display period. Here, the disparity information curve of the page is configured to take the minimum value of the disparity information curves of the two regions.
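The relationship between the page curve and the region curves can be sketched as a frame-by-frame minimum. The sketch assumes that both region curves have already been sampled at the same frames; the function name and the sample values are hypothetical.

```python
def page_disparity_curve(region_curves):
    """Take, for each frame, the minimum disparity over all regions."""
    return [min(values) for values in zip(*region_curves)]


# Hypothetical per-frame disparity values for Region 1 and Region 2.
region1 = [-2, -4, -6]
region2 = [-3, -3, -8]
page = page_disparity_curve([region1, region2])
print(page)
```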

For region 1 (Region1), there are seven pieces of disparity information: that of the start time T1_0 and those of the subsequent update times T1_1, T1_2, T1_3, ..., T1_6. For region 2 (Region2), there are eight pieces of disparity information: that of the start time T2_0 and those of the subsequent update times T2_1, T2_2, T2_3, ..., T2_7. Further, for the page (Page_default), there are seven pieces of disparity information: that of the start time T0_0 and those of the subsequent update times T0_1, T0_2, T0_3, ....

FIG. 32 shows the data structure in which the disparity information of the page and of each region shown in FIG. 31 is sent. FIG. 33 shows an example of a TTML-DSS document corresponding to this data structure. Hereinafter, the data structure shown in FIG. 32 will be described. The elements and attributes of the corresponding TTML-DSS document are shown in [ ]. The correspondence between the data structure of FIG. 32 and the TTML-DSS document shown in FIG. 33 is indicated by circled numbers.

First, the page layer will be described. In this page layer, “page_default_disparity” [→ pageDefaultDisparityShift], which is a fixed value of disparity information, is arranged. For the disparity information sequentially updated in the caption display period, “interval_count” [→ intervalCount], which indicates the number of interval periods corresponding to the start time and each subsequent update time, and “disparity_page_update” [→ disparityShiftUpdateIntegerPart], which indicates disparity information, are arranged sequentially. The “interval_count” at the start time is set to “0”.

Next, the region layer will be described. For region 1 (subregion 1), “subregion_disparity_integer_part” [→ subregionDisparityShiftIntegerPart] and “subregion_disparity_fractional_part” [→ subregionDisparityShiftFractionPart], which are fixed values of disparity information, are arranged. Here, “subregion_disparity_integer_part” indicates an integer part of disparity information, and “subregion_disparity_fractional_part” indicates a decimal part of disparity information.

For the disparity information sequentially updated in the caption display period, “interval_count”, which indicates the number of interval periods corresponding to the start time and each subsequent update time, and “disparity_region_update_integer_part” [→ disparityShiftUpdateIntegerPart] and “disparity_region_update_fractional_part” [→ disparityShiftUpdateFractionPart], which indicate disparity information, are arranged sequentially. Here, “disparity_region_update_integer_part” indicates the integer part of the disparity information, and “disparity_region_update_fractional_part” indicates the fractional part of the disparity information. The “interval_count” at the start time is set to “0”.

Region 2 (subregion 2) is handled in the same way as region 1 described above: “subregion_disparity_integer_part” and “subregion_disparity_fractional_part”, which are fixed values of disparity information, are arranged. For the disparity information sequentially updated in the caption display period, “interval_count”, which indicates the number of interval periods corresponding to the start time and each subsequent update time, and “disparity_region_update_integer_part” and “disparity_region_update_fractional_part”, which indicate disparity information, are arranged sequentially.

In the example of the TTML-DSS document in FIG. 33, the value of [→DU.set/@dur] corresponding to the interval period “interval_duration” is “D”. This value designates the interval period (Interval Duration) (see FIG. 29), which serves as the unit period, in units of a 90 kHz clock. For example, this value is obtained by measuring the interval period (Interval Duration) with a 90 kHz clock and is expressed in a 24-bit length.

The interval period is 24 bits long, whereas the PTS calculated from the parameters of the fragment header of Fragmented MP4 is 33 bits long, for the following reasons. A 33-bit length can express a time exceeding 24 hours, but such a length is unnecessary for the interval period (Interval Duration) within the caption display period. In addition, using 24 bits reduces the data size and allows compact transmission. Further, 24 bits are 8 × 3 bits, so byte alignment is easy.
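The bit-length argument can be checked with simple arithmetic. The following sketch computes the longest duration that a counter of a given bit length can express at a 90 kHz clock; the function name is an assumption introduced for illustration.

```python
CLOCK_HZ = 90_000  # 90 kHz clock used for PTS and the interval period


def max_duration_seconds(bits, clock_hz=CLOCK_HZ):
    """Longest duration expressible by an unsigned counter of the given bit length."""
    return (2 ** bits - 1) / clock_hz


hours_33 = max_duration_seconds(33) / 3600
seconds_24 = max_duration_seconds(24)
print(f"33 bits: about {hours_33:.1f} hours")
print(f"24 bits: about {seconds_24:.1f} seconds")
```

A 33-bit value therefore covers more than 24 hours, while a 24-bit value covers roughly three minutes, which is ample for a single interval period within a caption display period.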

Note that when a region has a plurality of subregions divided in the horizontal direction, the TTML-DSS document includes tts:origin and tts:extent information for each of the subregions. The first value of tts:origin indicates the leftmost pixel position of the subregion. The first value of tts:extent indicates the horizontal extent of the subregion in pixels.

FIG. 34 shows a TTML schema defined by W3C. In this embodiment, as shown in FIGS. 35 and 36, in order to describe parameters related to disparity information (Disparity), the parts of Q1 and Q2 are extended from the TTML specification of W3C. FIG. 37 shows a schema definition (ttaf1-dfxp-du-attribs.xsd) newly added for extending the Q1 portion. FIG. 38 shows a schema definition (ttaf1-dfxp-du.xsd) newly added for extending the Q2 portion.

[Broadcast reception concept]
FIG. 39 shows a broadcast reception concept when the set-top box 200 and the television receiver 300 are 3D-compatible devices. In this case, in the broadcasting station 100, the sub-region “SR 00” is defined in the region “Region 0”, and the disparity information “Disparity 1” is set. Here, it is assumed that the region “Region 0” and the sub-region “SR 00” are the same region. In response to a request from the receiving side, a video data stream, a subtitle data stream (second subtitle data stream), and the like are transmitted from the broadcasting station 100 to the receiving side.

First, a description will be given of the case of reception by the set-top box 200 that is a 3D-compatible device. The set top box 200 requests the broadcast station 100 to transmit a subtitle data stream (second subtitle data stream) including a TTML-DSS segment based on the MPD file. The set top box 200 reads the data of each TTML segment constituting the subtitle data from the second subtitle data stream and reads and uses the data of the TTML-DSS segment including display control information such as disparity information.

In this case, the set top box 200 recognizes the adaptation set element corresponding to the second subtitle data stream in the MPD file based on the ID attribute or the like, and appropriately requests the broadcasting station 100 to transmit the second subtitle data stream. By using the subtitle type information and the language information in addition to the ID attribute, the set-top box 200 can identify more reliably that the adaptation set element corresponds to the second subtitle data stream.

The set top box 200 generates region display data for displaying the subtitle based on the subtitle data. Then, the set-top box 200 obtains output stereoscopic image data by superimposing the region display data on the left-eye image frame (frame0) portion and the right-eye image frame (frame1) portion constituting the stereoscopic image data.

At this time, the set top box 200 shifts and adjusts the position of the display data superimposed on each portion based on the disparity information. The set-top box 200 also changes the superimposition position, size, and the like as appropriate in accordance with the transmission format of the stereoscopic image data (side-by-side method, top-and-bottom method, frame-sequential method, or a format in which each view has the full screen size).

The set-top box 200 transmits the output stereoscopic image data obtained as described above to the 3D-compatible television receiver 300 through, for example, an HDMI digital interface. The television receiver 300 performs 3D signal processing on the stereoscopic image data sent from the set-top box 200, and generates left-eye image data and right-eye image data on which subtitles are superimposed. Then, the television receiver 300 displays binocular parallax images (a left-eye image and a right-eye image) for allowing the user to recognize a stereoscopic image on a display panel such as an LCD.

Next, a case where the television receiver 300 that is a 3D compatible device receives the signal will be described. The television receiver 300 requests the broadcast station 100 to transmit a subtitle data stream (second subtitle data stream) including a TTML-DSS segment based on the MPD file. The television receiver 300 reads the data of each TTML segment constituting the subtitle data from the second subtitle data stream, and reads and uses the data of the TTML-DSS segment including display control information such as disparity information.

In this case, similarly to the set top box 200 described above, the television receiver 300 recognizes the adaptation set element corresponding to the second subtitle data stream in the MPD file based on the ID attribute or the like, and appropriately requests the broadcasting station 100 to transmit the second subtitle data stream. Note that, by using the subtitle type information and the language information together with the ID attribute, the television receiver 300 can identify more reliably that the adaptation set element corresponds to the second subtitle data stream.

The television receiver 300 generates region display data for displaying the subtitle based on the subtitle data. Then, the television receiver 300 superimposes the region display data on the left-eye image data and the right-eye image data obtained by processing the stereoscopic image data according to the transmission format, and generates data of a left-eye image and a right-eye image on which the subtitle is superimposed. Then, the television receiver 300 displays binocular parallax images (a left-eye image and a right-eye image) for allowing the user to recognize a stereoscopic image on a display panel such as an LCD.

FIG. 40 shows a broadcast reception concept when the set-top box 200 and the television receiver 300 are legacy 2D-compatible devices. Also in this case, in the broadcasting station 100, the subregion “SR 00” is defined in the region “Region 0”, and the disparity information “Disparity 1” is set. In response to a request from the receiving side, a video data stream, a subtitle data stream (first subtitle data stream), and the like are transmitted from the broadcasting station 100 to the receiving side.

First, the case where the signal is received by the set top box 200 which is a legacy 2D-compatible device will be described. The set top box 200 requests the broadcast station 100 to transmit a subtitle data stream (first subtitle data stream) including only the TTML segment based on the MPD file. The set top box 200 reads and uses data of each TTML segment constituting the subtitle data from the first subtitle data stream.

In this case, the set top box 200 recognizes the adaptation set element corresponding to the first subtitle data stream in the MPD file based on the ID attribute or the like, and appropriately requests the broadcasting station 100 to transmit the first subtitle data stream. By using the subtitle type information and the language information in addition to the ID attribute, the set top box 200 can identify more reliably that the adaptation set element corresponds to the first subtitle data stream.

The set top box 200 generates region display data for displaying the subtitle based on the subtitle data. Then, the set-top box 200 obtains output two-dimensional image data by superimposing the region display data on the two-dimensional image data obtained by processing the stereoscopic image data according to the transmission format.

The set top box 200 transmits the output two-dimensional image data obtained as described above to the television receiver 300 through, for example, an HDMI digital interface. The television receiver 300 displays a two-dimensional image based on the two-dimensional image data sent from the set top box 200.

Next, the case where the signal is received by the television receiver 300 which is a legacy 2D compatible device will be described. The television receiver 300 requests the broadcast station 100 to transmit a subtitle data stream including only the TTML segment (first subtitle data stream) based on the MPD file. The television receiver 300 reads and uses the data of each TTML segment constituting the subtitle data from the first subtitle data stream.

In this case, similarly to the set top box 200 described above, the television receiver 300 recognizes the adaptation set element corresponding to the first subtitle data stream in the MPD file based on the ID attribute or the like, and appropriately requests the broadcasting station 100 to transmit the first subtitle data stream. Note that, by using the subtitle type information and the language information together with the ID attribute, the television receiver 300 can identify more reliably that the adaptation set element corresponds to the first subtitle data stream.

The television receiver 300 generates region display data for displaying the subtitle based on the subtitle data. Then, the television receiver 300 obtains output two-dimensional image data by superimposing the region display data on the two-dimensional image data obtained by processing the stereoscopic image data according to the transmission format. Then, the television receiver 300 displays a two-dimensional image based on the two-dimensional image data.

FIG. 41 shows a broadcast reception concept for the case where the above-described receiver (set top box 200, television receiver 300) is a legacy 2D-compatible device (2D Receiver) and the case where it is a 3D-compatible device (3D Receiver). In this figure, the transmission method of the stereoscopic image data (3D image data) is the side-by-side (Side By Side) method.

In 3D compatible devices (3D Receiver), 3D mode (3D mode) or 2D mode (2D mode) can be selected. When the 3D mode (3D mode) is selected by the user, it is as described above with reference to FIG.

On the other hand, when the 2D mode (2D mode) is selected by the user, the 3D-compatible device (3D Receiver) reads and uses only the data of each TTML segment constituting the subtitle data from the received second subtitle data stream, for example, based on the segment URL added to each TTML segment. Otherwise, the operation is the same as that of the 2D-compatible device (2D Receiver) described with reference to FIG. In this case, the segment URL serves as identification information for distinguishing the TTML segments from the TTML-DSS segment.

FIG. 42 shows another broadcast reception concept for the case where the above-described receiver (set top box 200, television receiver 300) is a legacy 2D-compatible device (2D Receiver) and the case where it is a 3D-compatible device (3D Receiver). This figure shows an example in which the stereoscopic image data (3D image data) is transmitted using the H.264/MVC (Multi-view Video Coding) scheme. In this case, for example, the left eye image data is transmitted as base view image data, and the right eye image data is transmitted as non-base view image data. Although detailed description is omitted, the operations of the legacy 2D-compatible device (2D Receiver) and the 3D-compatible device (3D Receiver) in this case are the same as the example shown in FIG.

The transmission data generation unit 110 shown in FIG. 10 transmits the first subtitle data stream or the second subtitle data stream as a subtitle data stream (Fragmented MP4 stream) in response to a request from the receiving side. The first subtitle data stream includes only the TTML segments constituting the subtitle data. The second subtitle data stream includes, together with the TTML segments constituting the subtitle data, a TTML-DSS segment including display control information such as disparity information.

Therefore, a legacy 2D-compatible receiving device on the receiving side can satisfactorily acquire just the subtitle data by receiving the first subtitle data stream. In addition, a 3D-compatible receiving device can efficiently and appropriately acquire the subtitle data and the disparity information corresponding to it by receiving the second subtitle data stream.

In this case, an ID attribute is defined as an attribute of the adaptation set (AdaptationSet) element described in the MPD corresponding to each subtitle data stream, and a subtitling type attribute and a language attribute are further defined. Therefore, a 2D-compatible receiving device or a 3D-compatible receiving device on the receiving side can appropriately recognize the adaptation set element that it needs based on these attributes, and can request the transmitting side to transmit the appropriate subtitle data stream.
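The selection of an adaptation set element by a receiver could be sketched as follows. This is only an illustration: the attribute name `subtitlingType`, its values “2D” and “3D”, the sample MPD, and the function name are hypothetical, since the exact attribute syntax is not given in this passage. Only the MPD namespace URI and the `id` and `lang` attributes follow the MPEG-DASH schema.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical MPD fragment; "subtitlingType" is an assumed attribute name.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1" lang="en" subtitlingType="2D"/>
    <AdaptationSet id="2" lang="en" subtitlingType="3D"/>
  </Period>
</MPD>"""


def select_subtitle_set(mpd_xml, want_3d, lang):
    """Return the id of the adaptation set matching the receiver type and language."""
    root = ET.fromstring(mpd_xml)
    wanted_type = "3D" if want_3d else "2D"
    for aset in root.iterfind(".//mpd:AdaptationSet", MPD_NS):
        if aset.get("subtitlingType") == wanted_type and aset.get("lang") == lang:
            return aset.get("id")
    return None  # no matching subtitle data stream is described


print(select_subtitle_set(SAMPLE_MPD, want_3d=True, lang="en"))
print(select_subtitle_set(SAMPLE_MPD, want_3d=False, lang="en"))
```

A 3D-compatible receiver would pass `want_3d=True` to obtain the adaptation set for the second subtitle data stream, while a legacy 2D-compatible receiver would pass `want_3d=False`.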

In addition, since the transmission data generation unit 110 illustrated in FIG. 10 can transmit a TTML-DSS segment including disparity information that is sequentially updated in the subtitle display period, the display positions of the left eye subtitle and the right eye subtitle can be dynamically controlled. As a result, on the receiving side, the parallax provided between the left eye subtitle and the right eye subtitle can be dynamically changed in conjunction with changes in the image content. In this case, the disparity information of the frame at each update frame interval is not an offset value from the previous disparity information but the disparity information itself. Therefore, even if an error occurs in the interpolation process on the receiving side, it is possible to recover from the error within a certain delay time.

[Description of Set Top Box]
Returning to FIG. 9, the set-top box 200 makes a request to the broadcasting station 100 and receives a Fragmented MP4 stream from the broadcasting station 100. This Fragmented MP4 stream includes stereoscopic image data and audio data including left eye image data and right eye image data. The Fragmented MP4 stream also includes subtitle data for displaying a subtitle (caption).

The received Fragmented MP4 stream has a video data stream, an audio data stream, and a subtitle data stream (private data stream). When the set top box 200 is a legacy 2D-compatible device, the subtitle data stream is the first subtitle data stream, which includes only the TTML segments constituting the subtitle data. On the other hand, when the set-top box 200 is a 3D-compatible device, the subtitle data stream is the second subtitle data stream, which includes, together with the TTML segments constituting the subtitle data, a TTML-DSS segment including display control information such as disparity information.

The set top box 200 has a bit stream processing unit 201. When the set-top box 200 is a 3D-compatible device (3D STB), the bit stream processing unit 201 acquires stereoscopic image data, audio data, and subtitle data (including display control information) from the Fragmented MP4 stream.

Then, the bit stream processing unit 201 uses the stereoscopic image data and the subtitle data (including display control information) to superimpose the subtitles on the left eye image frame (frame0) portion and the right eye image frame (frame1) portion, respectively. Output stereoscopic image data is generated (see FIG. 39). In this case, parallax can be given between the subtitle (left eye subtitle) superimposed on the left eye image and the subtitle (right eye subtitle) superimposed on the right eye image.

For example, as described above, the display control information sent from the broadcast station 100 includes disparity information, and parallax can be given between the left eye subtitle and the right eye subtitle based on this disparity information. By providing parallax between the left eye subtitle and the right eye subtitle in this manner, the user can perceive the subtitle (caption) in front of the image.

When the set-top box 200 determines that the service is a 3D service, it acquires the data of each TTML segment constituting the subtitle data from the second subtitle data stream, and also acquires the data of the TTML-DSS segment including display control information such as disparity information. Then, the set top box 200 uses the subtitle data and the disparity information to perform the processing (superimposition processing) for pasting the subtitle onto the background image as described above. When disparity information cannot be acquired, the bit stream processing unit 201 performs the processing (superimposition processing) for pasting the subtitle (caption) onto the background image according to the logic of the receiver.

For example, the set top box 200 determines that the service is a 3D service when MPD/AdaptationSet/Role/@schemeIdURI=“urn:mpeg:dash:14496:10:frame_packing_arrangement_type:2011” or MPD/AdaptationSet/Role/@schemeIdURI=“urn:mpeg:dash:13818:1:stereo_video_format_type:2011” is specified and a 3D format is indicated.
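A receiver-side check of this kind might look like the following minimal sketch. It assumes that the scheme identifier is carried in a Role element under each AdaptationSet, as stated above, and that the attribute is spelled `schemeIdUri` as in the MPEG-DASH schema. The sample MPD strings and the function name are hypothetical, and the sketch checks only the scheme identifier, not the value that indicates the particular 3D format.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Scheme identifiers that signal a stereoscopic (3D) video format.
SCHEMES_3D = {
    "urn:mpeg:dash:14496:10:frame_packing_arrangement_type:2011",
    "urn:mpeg:dash:13818:1:stereo_video_format_type:2011",
}

# Hypothetical MPD fragments for illustration.
MPD_3D = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period><AdaptationSet>
<Role schemeIdUri="urn:mpeg:dash:14496:10:frame_packing_arrangement_type:2011" value="3"/>
</AdaptationSet></Period></MPD>"""
MPD_2D = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period><AdaptationSet>
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
</AdaptationSet></Period></MPD>"""


def is_3d_service(mpd_xml):
    """Return True if any AdaptationSet/Role uses a 3D scheme identifier."""
    root = ET.fromstring(mpd_xml)
    for role in root.iterfind(".//mpd:AdaptationSet/mpd:Role", MPD_NS):
        if role.get("schemeIdUri") in SCHEMES_3D:
            return True
    return False


print(is_3d_service(MPD_3D), is_3d_service(MPD_2D))
```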

FIG. 43A shows a display example of a subtitle (caption) on an image. In this display example, captions are superimposed on an image composed of a background and a foreground object. FIG. 43B shows the perspective of the background, the foreground object, and the subtitle, and indicates that the subtitle is recognized at the forefront.

FIG. 44(a) shows a display example of subtitles (captions) on the same image as FIG. 43(a). FIG. 44(b) shows a left-eye caption LGI superimposed on the left-eye image and a right-eye caption RGI superimposed on the right-eye image. FIG. 44(c) shows that parallax is given between the left-eye caption LGI and the right-eye caption RGI so that the caption is recognized at the forefront.

When the set-top box 200 is a legacy 2D-compatible device (2D STB), the bit stream processing unit 201 acquires stereoscopic image data, audio data, and subtitle data (bitmap pattern data that does not include display control information) from the Fragmented MP4 stream. Then, the bit stream processing unit 201 uses the stereoscopic image data and the subtitle data to generate 2D image data on which the subtitle (caption) is superimposed (see FIG. 40).

[Configuration example of set-top box]
A configuration example of the set top box 200 will be described. FIG. 45 shows a configuration example of the set top box 200. The set-top box 200 includes a bit stream processing unit 201, an HDMI terminal 202, a network interface 204, a video signal processing circuit 205, an HDMI transmission unit 206, and an audio signal processing circuit 207. The set-top box 200 also includes a CPU 211, a flash ROM 212, a DRAM 213, an internal bus 214, a remote control receiver (RC receiver) 215, and a remote control transmitter (RC transmitter) 216.

The network interface 204 makes a request to the broadcast station 100 based on the MPD file, and receives a Fragmented MP4 stream (bit stream data) corresponding to the user's selected channel. Based on this Fragmented MP4 stream, the bit stream processing unit 201 outputs image data and audio data on which the subtitle is superimposed.

When the set top box 200 is a 3D-compatible device (3D STB), the bit stream processing unit 201 acquires stereoscopic image data, audio data, and subtitle data (including display control information) from the Fragmented MP4 stream. Then, the bit stream processing unit 201 generates output stereoscopic image data in which subtitles are respectively superimposed on the left eye image frame (frame0) portion and the right eye image frame (frame1) portion constituting the stereoscopic image data (see FIG. 39).

At this time, the bit stream processing unit 201 gives parallax between the subtitle (left eye subtitle) to be superimposed on the left eye image and the subtitle (right eye subtitle) to be superimposed on the right eye image based on the disparity information. That is, the bit stream processing unit 201 generates region display data for displaying a subtitle based on the subtitle data. Then, the bit stream processing unit 201 superimposes the region display data on the left-eye image frame (frame0) portion and the right-eye image frame (frame1) portion constituting the stereoscopic image data, and obtains output stereoscopic image data. At this time, the bit stream processing unit 201 shifts and adjusts the position of the display data to be superimposed on each portion based on the disparity information.
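The shift adjustment can be pictured with the following minimal sketch. The sign convention (a negative disparity places the subtitle in front of the screen), the equal split of the disparity between the two eyes, the halving of horizontal coordinates for the side-by-side format, and the function names are all assumptions made for illustration; this description does not fix these details.

```python
def eye_positions(x, disparity):
    """Horizontal positions of the left-eye and right-eye subtitles.

    Assumed convention: disparity = right_x - left_x, so a negative value
    moves the left-eye subtitle right and the right-eye subtitle left,
    which places the subtitle in front of the screen.
    """
    left_x = x - disparity / 2
    right_x = x + disparity / 2
    return left_x, right_x


def side_by_side_positions(x, disparity, frame_width):
    """Map both positions into a side-by-side frame (each view squeezed to half width)."""
    left_x, right_x = eye_positions(x, disparity)
    half_width = frame_width / 2
    # Left view occupies the left half (frame0), right view the right half (frame1).
    return left_x / 2, half_width + right_x / 2


print(eye_positions(100, -10))
print(side_by_side_positions(100, -10, 1920))
```

With a disparity of -10 pixels, the left-eye subtitle lands to the right of the right-eye subtitle, so the lines of sight cross in front of the screen surface.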

When the set top box 200 is a 2D-compatible device (2D STB), the bit stream processing unit 201 acquires stereoscopic image data, audio data, and subtitle data (not including display control information). The bit stream processing unit 201 uses the stereoscopic image data and the subtitle data to generate two-dimensional image data on which the subtitle is superimposed (see FIG. 40).

That is, the bit stream processing unit 201 generates region display data for displaying a subtitle based on the subtitle data. Then, the bit stream processing unit 201 superimposes the region display data on the two-dimensional image data obtained by processing the stereoscopic image data according to the transmission format, and obtains output two-dimensional image data.

The video signal processing circuit 205 performs image quality adjustment processing on the image data obtained by the bit stream processing unit 201 as necessary, and supplies the processed image data to the HDMI transmission unit 206. The audio signal processing circuit 207 performs sound quality adjustment processing or the like on the audio data output from the bit stream processing unit 201 as necessary, and supplies the processed audio data to the HDMI transmission unit 206.

The HDMI transmitting unit 206 transmits, for example, uncompressed image data and audio data from the HDMI terminal 202 by communication conforming to HDMI. In this case, since transmission is performed using an HDMI TMDS channel, image data and audio data are packed and output from the HDMI transmission unit 206 to the HDMI terminal 202.

The CPU 211 controls the operation of each part of the set top box 200. The flash ROM 212 stores control software and data. The DRAM 213 constitutes a work area for the CPU 211. The CPU 211 develops software and data read from the flash ROM 212 on the DRAM 213 to activate the software, and controls each part of the set top box 200.

The RC receiver 215 receives the remote control signal (remote control code) transmitted from the RC transmitter 216 and supplies it to the CPU 211. The CPU 211 controls each part of the set top box 200 based on the remote control code. The CPU 211, flash ROM 212 and DRAM 213 are connected to the internal bus 214.

The operation of the set top box 200 will be briefly described. The network interface 204 makes a request to the broadcast station 100 based on the MPD file, and receives a Fragmented MP4 stream (bit stream data) corresponding to the user's selected channel. This Fragmented MP4 stream is supplied to the bit stream processing unit 201. Based on this Fragmented MP4 stream, the bit stream processing unit 201 obtains image data and audio data on which subtitles are superimposed. In this case, output image data is generated as follows.

When the set top box 200 is a 3D compatible device (3D STB), the bit stream processing unit 201 acquires stereoscopic image data, audio data, and subtitle data (including display control information) from the Fragmented MP4 stream. The bit stream processing unit 201 generates output stereoscopic image data in which subtitles are superimposed on the left-eye image frame (frame0) portion and the right-eye image frame (frame1) portion constituting the stereoscopic image data. At this time, based on the parallax information, parallax is given between the left-eye subtitle superimposed on the left-eye image and the right-eye subtitle superimposed on the right-eye image.

If the set top box 200 is a 2D-compatible device (2D STB), the bit stream processing unit 201 acquires stereoscopic image data, audio data, and subtitle data (not including display control information). In the bit stream processing unit 201, two-dimensional image data on which the subtitle is superimposed is generated using the stereoscopic image data and the subtitle data.

The output image data obtained by the bit stream processing unit 201 is supplied to the video signal processing circuit 205. In the video signal processing circuit 205, image quality adjustment processing or the like is performed on the output image data as necessary. The processed image data output from the video signal processing circuit 205 is supplied to the HDMI transmission unit 206.

The audio data obtained by the bit stream processing unit 201 is supplied to the audio signal processing circuit 207. The audio signal processing circuit 207 performs processing such as sound quality adjustment processing on the audio data as necessary. The processed audio data output from the audio signal processing circuit 207 is supplied to the HDMI transmission unit 206. The image data and audio data supplied to the HDMI transmission unit 206 are transmitted from the HDMI terminal 202 to the HDMI cable 400 through the HDMI TMDS channel.

[Configuration example of bit stream processing unit]
FIG. 46 shows a configuration example of the bit stream processing unit 201 when the set top box 200 is a 3D-compatible device (3D STB). The bit stream processing unit 201 has a configuration corresponding to the transmission data generation unit 110 shown in FIG. The bit stream processing unit 201 includes a demultiplexer 221, a video decoder 222, and an audio decoder 229.

The bit stream processing unit 201 includes an encoded data buffer 223, a subtitle decoder 224, a pixel buffer 225, a disparity information interpolation unit 226, a position control unit 227, and a video superimposing unit 228. Here, the encoded data buffer 223 constitutes a decode buffer.

The demultiplexer 221 extracts the video data stream and audio data stream packets from the Fragmented MP4 stream, and sends them to each decoder for decoding. The demultiplexer 221 further extracts a subtitle data stream (second subtitle data stream) and temporarily stores it in the encoded data buffer 223.

The video decoder 222 performs processing opposite to that of the video encoder 112 of the transmission data generation unit 110 described above. That is, the video decoder 222 reconstructs a video data stream from the video packets extracted by the demultiplexer 221 and performs decoding processing to obtain stereoscopic image data including left eye image data and right eye image data. The transmission format of the stereoscopic image data is, for example, a side-by-side method, a top-and-bottom method, a frame-sequential method, or a video transmission format method in which each view occupies a full screen size.

The subtitle decoder 224 performs processing opposite to that of the subtitle encoder 125 of the transmission data generation unit 110 described above. That is, the subtitle decoder 224 performs a decoding process on the subtitle data stream stored in the encoded data buffer 223 to acquire data of the following segments. That is, the subtitle decoder 224 obtains data of each TTML segment constituting the subtitle data from the subtitle data stream and obtains data of a TTML-DSS segment including display control information such as disparity information.

The subtitle decoder 224 generates region display data (bitmap data) for displaying the subtitle, based on the data of each TTML segment constituting the subtitle data and the region information of the subregion. Here, a transparent color is assigned to an area in the region that is not surrounded by the sub-region. The pixel buffer 225 temporarily stores this display data.
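As a minimal sketch of the region display data described above, the following builds a region bitmap in which every pixel outside the subregion carries a transparent color. The color index `TRANSPARENT`, the function name, and the list-of-lists bitmap layout are assumptions made for illustration and do not come from the specification.

```python
TRANSPARENT = 0  # assumed color index meaning "fully transparent"


def region_display_data(region_w, region_h, sub_origin, sub_pixels):
    """Build a region bitmap; pixels outside the subregion stay transparent.

    sub_origin is the (x, y) position of the subregion inside the region and
    sub_pixels is a 2D list of color indices rendered for the subregion.
    """
    bitmap = [[TRANSPARENT] * region_w for _ in range(region_h)]
    sx, sy = sub_origin
    for dy, row in enumerate(sub_pixels):
        for dx, color in enumerate(row):
            x, y = sx + dx, sy + dy
            # Ignore any subregion pixel that would fall outside the region.
            if 0 <= x < region_w and 0 <= y < region_h:
                bitmap[y][x] = color
    return bitmap
```

The resulting bitmap plays the role of the display data held in the pixel buffer 225.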

The video superimposing unit 228 obtains output stereoscopic image data Vout. In this case, the video superimposing unit 228 superimposes the display data stored in the pixel buffer 225 on the left eye image frame (frame0) portion and the right eye image frame (frame1) portion of the stereoscopic image data obtained by the video decoder 222. The video superimposing unit 228 changes the superimposition position, size, and the like as appropriate according to the transmission method of the stereoscopic image data (side-by-side method, top-and-bottom method, frame-sequential method, MVC method, etc.). The video superimposing unit 228 outputs the output stereoscopic image data Vout to the outside of the bit stream processing unit 201.
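The dependence of superimposition position and size on the transmission method can be sketched as follows. This is only an illustration: it assumes that the left-eye view occupies the left half (side-by-side) or the top half (top-and-bottom) of the packed frame, and that the subtitle region is given in full-resolution coordinates. `Rect`, `overlay_rects`, and the format strings are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def overlay_rects(region: Rect, frame_w: int, frame_h: int, fmt: str) -> Tuple[Rect, Rect]:
    """Return the (frame0, frame1) target rectangles inside the packed frame."""
    if fmt == "side_by_side":
        # Each view is squeezed horizontally to half width.
        half = frame_w // 2
        left = Rect(region.x // 2, region.y, region.w // 2, region.h)
        right = Rect(half + region.x // 2, region.y, region.w // 2, region.h)
    elif fmt == "top_and_bottom":
        # Each view is squeezed vertically to half height.
        half = frame_h // 2
        left = Rect(region.x, region.y // 2, region.w, region.h // 2)
        right = Rect(region.x, half + region.y // 2, region.w, region.h // 2)
    elif fmt in ("frame_sequential", "mvc"):
        # Each view occupies a full frame, so the region is used unchanged.
        left = right = region
    else:
        raise ValueError("unknown transmission format: " + fmt)
    return left, right
```

For a 1920x1080 side-by-side frame, a 400-pixel-wide region becomes two 200-pixel-wide overlays, one in each half of the frame.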

The disparity information interpolation unit 226 sends the disparity information obtained by the subtitle decoder 224 to the position control unit 227, performing interpolation processing on it as necessary. The position control unit 227 shifts and adjusts the position of the display data superimposed on each frame based on the disparity information (see FIG. 39). In this case, the position control unit 227 shifts the display data (caption pattern data) superimposed on the left eye image frame (frame0) portion and the right eye image frame (frame1) portion in opposite directions based on the disparity information, thereby giving parallax.
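The opposite-direction shift can be illustrated with a short sketch. The split of the disparity into two halves and the sign convention (a positive value moves the left-eye subtitle to the right and the right-eye subtitle to the left, which places the subtitle in front of the screen) are assumptions for this example. The text above only states that the two shifts are in opposite directions.

```python
def shifted_positions(x: int, disparity: int):
    """Return (left_eye_x, right_eye_x) for a subtitle nominally placed at x.

    The disparity is split between the two views so that
    left_eye_x - right_eye_x == disparity holds exactly, even for odd values.
    """
    half = disparity // 2
    left_eye_x = x + (disparity - half)
    right_eye_x = x - half
    return left_eye_x, right_eye_x
```

With `x = 100` and a disparity of 7 pixels, the left-eye subtitle is drawn at 104 and the right-eye subtitle at 97.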

Note that the display control information includes disparity information that is commonly used within the caption display period. Further, the display control information may further include disparity information that is sequentially updated within the caption display period. As described above, the disparity information that is sequentially updated within the caption display period is composed of the disparity information of the first frame in the caption display period and the disparity information of the frame at each subsequent update frame interval.

The position control unit 227 uses the disparity information that is commonly used in the caption display period as it is. On the other hand, regarding the disparity information sequentially updated within the caption display period, the position control unit 227 uses information that has been subjected to interpolation processing as necessary by the disparity information interpolation unit 226. For example, the disparity information interpolation unit 226 generates disparity information at an arbitrary frame interval within the caption display period, for example, one frame interval.

As the interpolation processing, the disparity information interpolation unit 226 performs, for example, interpolation processing accompanied by low-pass filter (LPF) processing in the time direction (frame direction) rather than linear interpolation processing. As a result, the change in the time direction (frame direction) of the disparity information at the predetermined frame interval after the interpolation processing becomes gentle.
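One way to picture interpolation accompanied by LPF processing is sketched below. The specification does not fix a particular filter, so this example makes two assumptions: per-frame values are first filled in piecewise linearly between the transmitted update points, and a simple moving average then serves as the low-pass filter in the frame direction. All function names are hypothetical.

```python
def interpolate_disparity(keyframes, interval):
    """Expand disparity values sent every `interval` frames to one per frame."""
    out = []
    for a, b in zip(keyframes, keyframes[1:]):
        for k in range(interval):
            out.append(a + (b - a) * k / interval)
    out.append(float(keyframes[-1]))
    return out


def low_pass(values, taps=3):
    """Moving-average filter in the frame direction; edges repeat the end values."""
    radius = taps // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        window = [values[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def per_frame_disparity(keyframes, interval, taps=3):
    """Interpolate to one-frame intervals, then smooth the result."""
    return low_pass(interpolate_disparity(keyframes, interval), taps)
```

Linear interpolation alone leaves a sharp corner at each update point. The averaging step rounds those corners, which corresponds to the gentler change in the time direction described above.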

Also, the audio decoder 229 performs a process reverse to that of the audio encoder 113 of the transmission data generation unit 110 described above. That is, the audio decoder 229 reconstructs an audio elementary stream from the audio packet extracted by the demultiplexer 221 and performs a decoding process to obtain output audio data Aout. The audio decoder 229 outputs the output audio data Aout to the outside of the bit stream processing unit 201.

The operation of the bit stream processing unit 201 shown in FIG. 46 will be briefly described. The Fragmented MP4 stream received by the network interface 204 (see FIG. 45) is supplied to the demultiplexer 221. In the demultiplexer 221, a video data stream and an audio data stream are extracted from the Fragmented MP4 stream and supplied to each decoder. Also, in this demultiplexer 221, a subtitle data stream (second subtitle data stream) is extracted from the Fragmented MP4 stream and temporarily stored in the encoded data buffer 223.

The video decoder 222 performs a decoding process on the video data stream extracted by the demultiplexer 221 to obtain stereoscopic image data including left eye image data and right eye image data. The stereoscopic image data is supplied to the video superimposing unit 228.

In the subtitle decoder 224, the subtitle data stream is read from the encoded data buffer 223 and decoded. The subtitle decoder 224 generates region display data (bitmap data) for displaying the subtitle based on the data of each TTML segment constituting the subtitle data and the region information of the subregion. This display data is temporarily stored in the pixel buffer 225.

In the video superimposing unit 228, the display data stored in the pixel buffer 225 is superimposed on the left eye image frame (frame0) portion and the right eye image frame (frame1) portion of the stereoscopic image data obtained by the video decoder 222. In this case, the superimposition position, the size, and the like are appropriately changed according to the transmission method of the stereoscopic image data (side-by-side method, top-and-bottom method, frame-sequential method, MVC method, etc.). The output stereoscopic image data Vout obtained by the video superimposing unit 228 is output to the outside of the bit stream processing unit 201.

Also, the disparity information obtained by the subtitle decoder 224 is sent to the position control unit 227 through the disparity information interpolation unit 226. In the disparity information interpolation unit 226, interpolation processing is performed as necessary. For example, with respect to disparity information at several-frame intervals that is sequentially updated within the caption display period, the disparity information interpolation unit 226 performs interpolation processing as necessary, and disparity information at an arbitrary frame interval, for example, a one-frame interval, is generated.

In the position control unit 227, shift adjustment is performed so that the display data (caption pattern data) superimposed on the left eye image frame (frame0) portion and the right eye image frame (frame1) portion by the video superimposing unit 228 is shifted in opposite directions based on the disparity information. Thereby, parallax is provided between the left eye subtitle displayed in the left eye image and the right eye subtitle displayed in the right eye image. Therefore, 3D display of the subtitle (caption) according to the content of the stereoscopic image is realized.

Also, in the audio decoder 229, the audio elementary stream extracted by the demultiplexer 221 is decoded, and the audio data Aout corresponding to the display stereoscopic image data Vout is obtained. The audio data Aout is output to the outside of the bit stream processing unit 201.

FIG. 47 shows a configuration example of the bit stream processing unit 201 when the set top box 200 is a 2D-compatible device (2D STB). In FIG. 47, portions corresponding to those in FIG. 46 are denoted by the same reference numerals, and detailed description thereof is omitted. Hereinafter, for convenience of description, the bit stream processing unit 201 illustrated in FIG. 46 is referred to as a 3D-compatible bit stream processing unit 201, and the bit stream processing unit 201 illustrated in FIG. 47 is referred to as a 2D-compatible bit stream processing unit 201.

In the 3D-compatible bit stream processing unit 201 illustrated in FIG. 46, the video decoder 222 performs a decoding process on the video data stream extracted by the demultiplexer 221 to obtain stereoscopic image data including left-eye image data and right-eye image data. On the other hand, in the 2D-compatible bit stream processing unit 201 shown in FIG. 47, the video decoder 222 obtains the stereoscopic image data, then cuts out the left-eye image data or the right-eye image data and performs scaling processing or the like as necessary to obtain two-dimensional image data.

Also, in the 3D-compatible bit stream processing unit 201 shown in FIG. 46, the subtitle decoder 224 reads the subtitle data stream (second subtitle data stream) from the encoded data buffer 223 and decodes it. Thereby, the subtitle decoder 224 acquires data of each TTML segment constituting the subtitle data and also acquires data of a TTML-DSS segment including display control information such as disparity information.

On the other hand, in the 2D-compatible bit stream processing unit 201 shown in FIG. 47, the subtitle decoder 224 reads and decodes the subtitle data stream (first subtitle data stream). Thereby, the subtitle decoder 224 acquires only data of each TTML segment constituting the subtitle data. Then, the subtitle decoder 224 generates region display data (bitmap data) for displaying the subtitle based on the data of each TTML segment, and temporarily stores it in the pixel buffer 225.

In the 3D-compatible bit stream processing unit 201 shown in FIG. 46, the video superimposing unit 228 obtains the output stereoscopic image data Vout and outputs it to the outside of the bit stream processing unit 201. In this case, the output stereoscopic image data Vout is obtained by superimposing the display data accumulated in the pixel buffer 225 on the left eye image frame (frame0) portion and the right eye image frame (frame1) portion of the stereoscopic image data obtained by the video decoder 222. Then, the position control unit 227 shifts the display data in opposite directions based on the disparity information, so that parallax is given between the left eye subtitle displayed in the left eye image and the right eye subtitle displayed in the right eye image.

On the other hand, in the 2D-compatible bit stream processing unit 201 shown in FIG. 47, the video superimposing unit 228 superimposes the display data accumulated in the pixel buffer 225 on the two-dimensional image data obtained by the video decoder 222 to obtain output two-dimensional image data Vout. Then, the video superimposing unit 228 outputs the output two-dimensional image data Vout to the outside of the bit stream processing unit 201.

The operation of the 2D bitstream processing unit 201 shown in FIG. 47 will be briefly described. The operation of the audio system is the same as that of the 3D bit stream processing unit 201 shown in FIG.

The Fragmented MP4 stream received by the network interface 204 (see FIG. 45) is supplied to the demultiplexer 221. In the demultiplexer 221, a video data stream and an audio data stream are extracted from the Fragmented MP4 stream and supplied to each decoder. Also, in this demultiplexer 221, a subtitle data stream (first subtitle data stream) is extracted from the Fragmented MP4 stream and temporarily stored in the encoded data buffer 223.

The video decoder 222 performs a decoding process on the video data stream extracted by the demultiplexer 221 to obtain stereoscopic image data including left eye image data and right eye image data. In the video decoder 222, left-eye image data or right-eye image data is further cut out from the stereoscopic image data, and subjected to scaling processing or the like as necessary to obtain two-dimensional image data. The two-dimensional image data is supplied to the video superimposing unit 228.
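The cut-out and scaling step for a 2D-compatible device can be sketched as follows. A frame is modeled as a list of pixel rows, the left-eye view is assumed to sit in the left or top half of the packed frame, and scaling is done by simple pixel repetition. All three choices are illustrative assumptions; a real decoder would use a proper scaler.

```python
def extract_2d(frame, fmt, use_left=True):
    """Cut one view out of a packed stereoscopic frame and restore full size."""
    height = len(frame)
    width = len(frame[0])
    if fmt == "side_by_side":
        half = width // 2
        x0 = 0 if use_left else half
        view = [row[x0:x0 + half] for row in frame]
        # Double every pixel horizontally to return to the full width.
        return [[p for p in row for _ in range(2)] for row in view]
    if fmt == "top_and_bottom":
        half = height // 2
        y0 = 0 if use_left else half
        view = frame[y0:y0 + half]
        # Double every row vertically to return to the full height.
        return [list(row) for row in view for _ in range(2)]
    raise ValueError("unsupported transmission format: " + fmt)
```

The returned frame has the same dimensions as the packed input and contains only one eye's view, which is what the video superimposing unit 228 receives in the 2D case.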

In the subtitle decoder 224, the subtitle data stream is read from the encoded data buffer 223 and decoded. Then, in the subtitle decoder 224, region display data (bitmap data) for displaying the subtitle is generated based on the data of each TTML segment. This display data is temporarily stored in the pixel buffer 225.

The video superimposing unit 228 superimposes subtitle display data (bitmap data) accumulated in the pixel buffer 225 on the two-dimensional image data obtained by the video decoder 222, and obtains output two-dimensional image data Vout. The output two-dimensional image data Vout is output to the outside of the bit stream processing unit 201.

In the set top box 200 shown in FIG. 45, a request is made to the transmission side, and the first subtitle data stream or the second subtitle data stream is received as the subtitle data stream (Fragmented MP4 stream). That is, when the set-top box 200 is a legacy 2D-compatible receiving device, the set-top box 200 can receive the first subtitle data stream including only each TTML segment constituting the subtitle data, and can satisfactorily obtain only the subtitle data.

Further, when the set-top box 200 is a 3D-compatible receiving device, the set-top box 200 can receive the second subtitle data stream, which includes a TTML-DSS segment including display control information such as disparity information together with each TTML segment constituting the subtitle data. Therefore, the set top box 200 can efficiently and accurately acquire the subtitle data and the disparity information corresponding to it.

When the set top box 200 shown in FIG. 45 is a 3D-compatible receiving apparatus, the second subtitle data stream received by the network interface 204 along with the stereoscopic image data includes display control information in addition to the subtitle data. This display control information includes region information of the subregion, disparity information, and the like. Therefore, parallax can be given to the display positions of the left-eye subtitle and the right-eye subtitle, and in the display of the subtitle (caption), the consistency of perspective with each object in the image can be maintained in an optimal state.

In the set top box 200 shown in FIG. 45, the display control information acquired by the subtitle decoder 224 of the 3D-compatible bitstream processing unit 201 (see FIG. 46) includes disparity information that is sequentially updated within the caption display period. In this case, the display positions of the left eye subtitle and the right eye subtitle can be dynamically controlled. Thereby, the parallax provided between the left eye subtitle and the right eye subtitle can be dynamically changed in conjunction with the change of the image content.

In the set top box 200 shown in FIG. 45, the disparity information interpolation unit 226 of the 3D bitstream processing unit 201 (see FIG. 46) performs interpolation processing on the disparity information of the plurality of frames constituting the disparity information that is sequentially updated within the caption display period (a predetermined number of frame periods). In this case, even when disparity information is transmitted from the transmission side only at every update frame interval, the disparity provided between the left eye subtitle and the right eye subtitle can be controlled at fine intervals, for example, for each frame.

Also, in the set top box 200 shown in FIG. 45, the interpolation processing in the disparity information interpolation unit 226 of the 3D bitstream processing unit 201 (see FIG. 46) can be accompanied by, for example, low-pass filter processing in the time direction (frame direction). Therefore, even when disparity information is transmitted from the transmission side only at every update frame interval, the change in the time direction of the disparity information after the interpolation processing can be made gentle. This suppresses the sense of incongruity caused when the disparity provided between the left eye subtitle and the right eye subtitle changes discontinuously at every update frame interval.

Although not described above, a configuration in which the set-top box 200 is a 3D-compatible device and the user can select the 2D display mode or the 3D display mode is also conceivable. In this case, when the three-dimensional display mode is selected, the bit stream processing unit 201 has the same configuration and operation as the 3D-compatible bit stream processing unit 201 (see FIG. 46) described above.

When the 2D display mode is selected, the bitstream processing unit 201 has substantially the same configuration and operation as the 2D-compatible bitstream processing unit 201 (see FIG. 47) described above. In this case, the bit stream processing unit 201 reads and uses only the data of each TTML segment constituting the subtitle data from the received second subtitle data stream, for example, based on the segment URL added to each TTML segment.
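A rough sketch of selecting only the TTML segments by segment URL is given below. How segment URLs are attached and what they look like is not specified here, so the `Segment` type, the URL strings, and the idea of holding the TTML segment URLs in a set are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class Segment(NamedTuple):
    url: str        # segment URL associated with this segment (assumed field)
    payload: bytes  # undecoded segment data


def ttml_segments_only(segments: Iterable[Segment], ttml_urls: Set[str]) -> List[Segment]:
    """Keep the segments whose URL identifies them as TTML segments.

    Any other segment in the second subtitle data stream, such as a TTML-DSS
    segment, is skipped so that no display control information is used.
    """
    return [seg for seg in segments if seg.url in ttml_urls]
```

The filtered list can then be decoded in the same way as the first subtitle data stream.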

[Description of TV receiver]
Returning to FIG. 1, when the television receiver 300 is a 3D-compatible device, the television receiver 300 receives stereoscopic image data sent from the set-top box 200 via the HDMI cable 400. The television receiver 300 includes a 3D signal processing unit 301. The 3D signal processing unit 301 performs processing (decoding processing) corresponding to the transmission format on the stereoscopic image data to generate left-eye image data and right-eye image data.

[Configuration example of TV receiver]
A configuration example of the 3D-compatible television receiver 300 will be described. FIG. 48 illustrates a configuration example of the television receiver 300. The television receiver 300 includes a 3D signal processing unit 301, an HDMI terminal 302, an HDMI receiving unit 303, a network interface 305, and a bit stream processing unit 306.

The television receiver 300 includes a video / graphic processing circuit 307, a panel drive circuit 308, a display panel 309, an audio signal processing circuit 310, an audio amplification circuit 311, and a speaker 312. The television receiver 300 also includes a CPU 321, a flash ROM 322, a DRAM 323, an internal bus 324, a remote control receiver (RC receiver) 325, and a remote control transmitter (RC transmitter) 326.

The network interface 305 makes a request to the broadcast station 100 based on the MPD file, and receives a Fragmented MP4 stream (bit stream data) corresponding to the user's selected channel. Based on this Fragmented MP4 stream, the bit stream processing unit 306 outputs image data and audio data on which the subtitle is superimposed.

The bit stream processing unit 306 is not described in detail, but has the same configuration as the 3D-compatible bit stream processing unit 201 (see FIG. 46) of the set-top box 200 described above, for example. The bit stream processing unit 306 combines the display data of the left eye subtitle and the right eye subtitle with the stereoscopic image data, and generates and outputs output stereoscopic image data on which the subtitle is superimposed.

Note that, for example, when the transmission format of the stereoscopic image data is the side-by-side method or the top-and-bottom method, the bit stream processing unit 306 performs scaling processing and outputs full-resolution left-eye image data and right-eye image data. The bit stream processing unit 306 also outputs audio data corresponding to the image data.

The HDMI receiving unit 303 receives uncompressed image data and audio data supplied to the HDMI terminal 302 via the HDMI cable 400 by communication conforming to HDMI. The HDMI receiving unit 303 has a version of, for example, HDMI 1.4a, and can handle stereoscopic image data.

The 3D signal processing unit 301 performs a decoding process on the stereoscopic image data received by the HDMI receiving unit 303 to generate full-resolution left-eye image data and right-eye image data. The 3D signal processing unit 301 performs a decoding process corresponding to the TMDS transmission data format. Note that the 3D signal processing unit 301 does nothing with the full-resolution left-eye image data and right-eye image data obtained by the bit stream processing unit 306.

The video / graphic processing circuit 307 generates image data for displaying a stereoscopic image based on the left eye image data and right eye image data generated by the 3D signal processing unit 301. The video / graphic processing circuit 307 performs image quality adjustment processing on the image data as necessary.

Also, the video / graphic processing circuit 307 synthesizes superimposition information data such as a menu and a program guide with the image data as necessary. The panel drive circuit 308 drives the display panel 309 based on the image data output from the video / graphic processing circuit 307. The display panel 309 includes, for example, an LCD (Liquid Crystal Display), a PDP (Plasma Display Panel), an organic EL display (organic electroluminescence display), and the like.

The audio signal processing circuit 310 performs necessary processing such as D / A conversion on the audio data received by the HDMI receiving unit 303 or obtained by the bit stream processing unit 306. The audio amplification circuit 311 amplifies the audio signal output from the audio signal processing circuit 310 and supplies the amplified audio signal to the speaker 312.

The CPU 321 controls the operation of each unit of the television receiver 300. The flash ROM 322 stores control software and data. The DRAM 323 constitutes a work area for the CPU 321. The CPU 321 develops software and data read from the flash ROM 322 on the DRAM 323 to activate the software, and controls each unit of the television receiver 300.

The RC receiver 325 receives the remote control signal (remote control code) transmitted from the RC transmitter 326 and supplies it to the CPU 321. The CPU 321 controls each part of the television receiver 300 based on the remote control code. The CPU 321, flash ROM 322, and DRAM 323 are connected to the internal bus 324.

The operation of the television receiver 300 will be briefly described. The HDMI receiving unit 303 receives stereoscopic image data and audio data transmitted from the set top box 200 connected to the HDMI terminal 302 via the HDMI cable 400. The stereoscopic image data received by the HDMI receiving unit 303 is supplied to the 3D signal processing unit 301. The audio data received by the HDMI receiving unit 303 is supplied to the audio signal processing circuit 310.

The network interface 305 makes a request to the broadcast station 100 based on the MPD file, and receives a Fragmented MP4 stream (bit stream data) corresponding to the user's selected channel. This Fragmented MP4 stream is supplied to the bit stream processing unit 306.

The bit stream processing unit 306 obtains output stereoscopic image data and audio data on which a subtitle is superimposed based on a video data stream, an audio data stream, and a subtitle data stream. In this case, the display data of the left eye subtitle and the right eye subtitle is combined with the stereoscopic image data, and output stereoscopic image data (full resolution left eye image data and right eye image data) on which the subtitle is superimposed is generated. The output stereoscopic image data is supplied to the video / graphic processing circuit 307 through the 3D signal processing unit 301.

In the 3D signal processing unit 301, the stereoscopic image data received by the HDMI receiving unit 303 is decoded, and full-resolution left-eye image data and right-eye image data are generated. The left eye image data and right eye image data are supplied to the video / graphic processing circuit 307. In the video / graphic processing circuit 307, image data for displaying a stereoscopic image is generated based on the left eye image data and the right eye image data, and image quality adjustment processing and synthesis of superimposition information data such as an OSD (on-screen display) are performed as necessary.

The image data obtained by the video / graphic processing circuit 307 is supplied to the panel drive circuit 308. Therefore, a stereoscopic image is displayed on the display panel 309. For example, the left eye image based on the left eye image data and the right eye image based on the right eye image data are alternately displayed on the display panel 309 in a time division manner. For example, by wearing shutter glasses in which the left eye shutter and the right eye shutter open alternately in synchronization with the display on the display panel 309, the viewer sees only the left eye image with the left eye and only the right eye image with the right eye, and can perceive a stereoscopic image.
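The time-division display with shutter glasses can be pictured with the following sketch. It assumes one left-eye and one right-eye frame per stereoscopic picture, shown in that order. The labels "L" and "R" and the function name are illustrative only.

```python
def time_division_sequence(left_frames, right_frames):
    """Yield (open_shutter, frame) pairs in display order.

    The shutter that is open always matches the eye whose image is on the
    panel, so each eye sees only its own view.
    """
    for left, right in zip(left_frames, right_frames):
        yield ("L", left)
        yield ("R", right)
```

For two stereoscopic pictures this yields the order left 0, right 0, left 1, right 1, so the panel runs at twice the picture rate.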

Also, the audio data obtained by the bit stream processing unit 306 is supplied to the audio signal processing circuit 310. In the audio signal processing circuit 310, necessary processing such as D / A conversion is performed on the audio data received by the HDMI receiving unit 303 or the audio data obtained by the bit stream processing unit 306. The audio data is amplified by the audio amplification circuit 311 and then supplied to the speaker 312. Therefore, sound corresponding to the display image on the display panel 309 is output from the speaker 312.

Note that FIG. 48 shows the 3D-compatible television receiver 300 as described above. Although detailed description is omitted, the legacy 2D-compatible television receiver has almost the same configuration. However, in the case of a legacy 2D-compatible television receiver, the bit stream processing unit 306 has the same configuration and operation as the 2D-compatible bit stream processing unit 201 shown in FIG. 47 described above. Further, in the case of a legacy 2D-compatible television receiver, the 3D signal processing unit 301 is not necessary.

Also, a configuration in which the user can select the 2D display mode or the 3D display mode in the 3D-compatible television receiver 300 is also conceivable. In that case, when the three-dimensional display mode is selected, the bit stream processing unit 306 has the same configuration and operation as described above.

On the other hand, when the two-dimensional display mode is selected, the bit stream processing unit 306 has substantially the same configuration and operation as the 2D-compatible bit stream processing unit 201 (see FIG. 47) described above. In this case, the bit stream processing unit 306 reads and uses only the data of each TTML segment constituting the subtitle data from the received second subtitle data stream, for example, based on the segment URL added to each TTML segment.

<2. Modification>
In the above-described embodiment, an example in which only one language service of English “eng” exists is shown (see FIG. 24). However, it goes without saying that the present technology can be applied to multilingual services as well. Hereinafter, an example will be described in which there are two language services, for example, a first language service (1st Language Service) of English “eng” and a second language service (2nd Language Service) of German “ger”.

FIG. 49 shows a configuration example of the subtitle data stream included in the Fragmented MP4 stream. Corresponding to this subtitle data stream, an adaptation set / representation element is described in the MPD. An ID attribute (AdaptationSet/@id) is defined for each adaptation set element.

The ID attribute of the adaptation set element corresponding to the first 2D subtitle data stream including only the TTML segment related to the first language service (English “eng”) is “@ id = PID1-1”. Further, the ID attribute of the adaptation set element corresponding to the second 2D subtitle data stream including only the TTML segment related to the second language service (German “ger”) is “@ id = PID2-1”.

The ID attribute of the adaptation set element corresponding to the first 3D subtitle data stream including the TTML segment and the TTML-DSS segment related to the first language service (English “eng”) is “@ id = PID1_2”. Further, the ID attribute of the adaptation set element corresponding to the second 3D subtitle data stream including the TTML segment and the TTML-DSS segment related to the second language service (German “ger”) is “@ id = PID2_2”.

The receiving side can identify these adaptation set elements in the MPD file based on the ID attribute, and can request reception of a necessary subtitle data stream. For example, if the receiving side is a 2D-compatible device and the first language service (English “eng”) is selected, the first 2D subtitle data stream can be received by making a reception request using the segment information related to the adaptation set element whose ID attribute is “@ id = PID1-1”. Also, for example, if the receiving side is a 3D-compatible device and the second language service (German “ger”) is selected, the second 3D subtitle data stream can be received by making a reception request using the segment information related to the adaptation set element whose ID attribute is “@ id = PID2_2”.

Here, the ID attribute value of the adaptation set element corresponding to the second 2D subtitle data stream is a value obtained by adding a predetermined value to the ID attribute value of the adaptation set element corresponding to the first 2D subtitle data stream. As a result, the first and second 2D subtitle data streams are linked through the ID attribute of the adaptation set element. Similarly, the ID attribute value of the adaptation set element corresponding to the second 3D subtitle data stream is a value obtained by adding a predetermined value to the ID attribute value of the adaptation set element corresponding to the first 3D subtitle data stream. As a result, the first and second 3D subtitle data streams are linked through the ID attribute of the adaptation set element.
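The selection by ID attribute described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical MPD fragment: the ID values follow the text, while the representation IDs, the BaseURL values, the lookup table STREAM_IDS, and the function names are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of the MPEG-DASH MPD schema.
NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical MPD fragment; only the id values come from the description.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="PID1-1">
      <Representation id="r1"><BaseURL>sub/2d/eng/</BaseURL></Representation>
    </AdaptationSet>
    <AdaptationSet id="PID2-1">
      <Representation id="r2"><BaseURL>sub/2d/ger/</BaseURL></Representation>
    </AdaptationSet>
    <AdaptationSet id="PID1_2">
      <Representation id="r3"><BaseURL>sub/3d/eng/</BaseURL></Representation>
    </AdaptationSet>
    <AdaptationSet id="PID2_2">
      <Representation id="r4"><BaseURL>sub/3d/ger/</BaseURL></Representation>
    </AdaptationSet>
  </Period>
</MPD>"""

# Assumed mapping from (language service, 3D-compatible?) to the ID attribute.
STREAM_IDS = {
    ("eng", False): "PID1-1",
    ("ger", False): "PID2-1",
    ("eng", True): "PID1_2",
    ("ger", True): "PID2_2",
}


def find_adaptation_set(mpd_xml, language, is_3d):
    """Return the AdaptationSet element matching the receiver's needs, or None."""
    wanted = STREAM_IDS[(language, is_3d)]
    root = ET.fromstring(mpd_xml)
    for aset in root.iterfind(".//dash:AdaptationSet", NS):
        if aset.get("id") == wanted:
            return aset
    return None


def base_url(aset):
    """Return the (hypothetical) base URL used to build the reception request."""
    return aset.findtext("dash:Representation/dash:BaseURL", namespaces=NS)
```

A 3D-compatible receiver with German selected would call find_adaptation_set(MPD_XML, "ger", True) and then issue its reception request against the URL returned by base_url.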

FIG. 50 shows a configuration example of a Fragmented MP4 stream. Each FragmentedMP4 stream includes FragmentedMP4 obtained by packetizing the elementary stream. In this figure, for the sake of simplification of the drawing, illustration of portions related to video and audio is omitted.

This configuration example shows the Fragmented MP4 streams of the first and second 2D subtitle data streams, which include only the TTML segment, and the Fragmented MP4 streams of the first and second 3D subtitle data streams, which include the TTML-DSS segment in addition to the TTML segment. The ID attributes of the adaptation set elements corresponding to the respective streams differ from each other and can be identified as described above with reference to FIG.

The subtitling type (subtitlingType) is introduced as one item of information related to the subtitle data stream, and “AdaptationSet / @ subtitlingType” is arranged as an attribute of the adaptation set element. The subtitling type (subtitling_type) corresponding to the first and second 2D subtitle data streams (Fragmented MP4 streams) is a value indicating a 2D subtitle, for example, “0x14” or “0x24” (see “component_type” in FIG. 22). Also, the subtitling type (subtitling_type) corresponding to the first and second 3D subtitle data streams (Fragmented MP4 streams) is a value indicating a 3D subtitle, for example, “0x15” or “0x25” (see “component_type” in FIG. 22).

Furthermore, the lang attribute, which is an attribute of the adaptation set element and carries the ISO (International Organization for Standardization) language code corresponding to the subtitle data stream, is set so as to indicate the language of the subtitle (caption). The lang attribute corresponding to the first 2D subtitle data stream is set to “eng” indicating English. The lang attribute corresponding to the second 2D subtitle data stream is set to “ger” indicating German.

Note that the ISO language code corresponding to the first and second 3D subtitle data streams is set to, for example, “zxx” indicating a non-language. However, it is also possible to use, as the ISO language code indicating a non-language, any of the language codes included in the space “qaa” to “qrz” of the ISO language code, or the language code “mis” or “und” (see FIG. 23). It is also conceivable to set the ISO language code corresponding to the first and second 3D subtitle data streams so as to indicate the language of the subtitle (caption) as in the first and second 2D subtitle data streams.
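As a rough illustration of how a receiver might interpret the subtitling type and the lang attribute described above, the following sketch classifies adaptation set descriptions. The SubtitleSet class, the sample list SETS, and the function names are hypothetical; the type values and language codes are those given in the text.

```python
from dataclasses import dataclass

# Example values from the description.
SUBTITLING_2D = {0x14, 0x24}
SUBTITLING_3D = {0x15, 0x25}
NON_LANGUAGE_CODES = {"zxx", "mis", "und"}


@dataclass(frozen=True)
class SubtitleSet:
    """Hypothetical summary of one adaptation set element."""
    set_id: str
    subtitling_type: int
    lang: str


def is_non_language(code):
    """True for "zxx", "mis", "und", or a code in the space "qaa" to "qrz"."""
    return code in NON_LANGUAGE_CODES or (len(code) == 3 and "qaa" <= code <= "qrz")


def select_2d(sets, language):
    """Pick the 2D subtitle stream whose lang attribute matches the language."""
    for s in sets:
        if s.subtitling_type in SUBTITLING_2D and s.lang == language:
            return s
    return None


def list_3d(sets):
    """Return all adaptation sets whose subtitling type indicates a 3D subtitle."""
    return [s for s in sets if s.subtitling_type in SUBTITLING_3D]


# Sample data mirroring the two-language example (assumed type values).
SETS = [
    SubtitleSet("PID1-1", 0x14, "eng"),
    SubtitleSet("PID2-1", 0x14, "ger"),
    SubtitleSet("PID1_2", 0x15, "zxx"),
    SubtitleSet("PID2_2", 0x15, "zxx"),
]
```

Because the 3D streams in this example carry a non-language code, the lang attribute alone does not tell a 3D-compatible receiver which language a 3D stream belongs to; in this sketch that decision is left to the ID attribute.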

In the above embodiment, it was explained that when the receiving side is a 2D-compatible device, a 2D subtitle data stream including only a TTML segment is received as a subtitle data stream based on the MPD file, and the data of each TTML segment is extracted from this stream and used.

However, even when the receiving side is a 2D-compatible device, it is also conceivable to receive a 3D subtitle data stream including a TTML segment and a TTML-DSS segment as a subtitle data stream, and to extract and use the data of each TTML segment from this stream. In this case, only the data of each TTML segment constituting the subtitle data is read from the 3D subtitle data stream and used, for example, based on the segment URL added to each TTML segment.
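The following sketch illustrates a 2D-compatible device keeping only the TTML segments of a 3D subtitle data stream. The text does not specify how the segment URLs of TTML and TTML-DSS segments differ, so the marker "/dss/" and the example URLs are purely assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# A segment is modeled as (segment URL, payload); this shape is an assumption.
Segment = Tuple[str, bytes]


def is_dss_segment(url: str, dss_marker: str = "/dss/") -> bool:
    """Assume TTML-DSS segments can be recognized by a marker in their URL."""
    return dss_marker in url


def ttml_only(segments: Iterable[Segment], dss_marker: str = "/dss/") -> List[Segment]:
    """Keep only the TTML segments, discarding the TTML-DSS (disparity) segments."""
    return [(url, data) for url, data in segments if not is_dss_segment(url, dss_marker)]
```

A 3D-compatible device would instead keep both kinds of segments and pass the TTML-DSS payloads to its disparity processing.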

In the above-described embodiment, the stream distribution system 10 includes the broadcasting station 100, the set-top box 200, and the television receiver 300 (see FIG. 9). However, the television receiver 300 includes a bit stream processing unit 306 that functions in the same manner as the bit stream processing unit 201 in the set top box 200, as shown in FIG. Therefore, as shown in FIG. 51, a stream distribution system 10A including a broadcasting station 100 and a television receiver 300 is also conceivable.

In the above embodiment, the set-top box 200 and the television receiver 300 are connected via an HDMI digital interface. However, even when these are connected by a digital interface similar to the HDMI digital interface (including wireless as well as wired), the present technology can be similarly applied.

Further, in the above-described embodiment, subtitles (captions) are handled as the superimposition information. However, the present technology can be similarly applied to other superimposition information such as graphics information and text information, and to information that is divided into a basic stream and an additional stream and encoded so that the two are output in association with each other.

Moreover, the present technology can also take the following configurations.
(1) A transmission device including:
an image data output unit that outputs left-eye image data and right-eye image data constituting a stereoscopic image;
a superimposition information data output unit that outputs superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data;
a disparity information output unit that outputs disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity; and
a data transmission unit that, in response to a request from the receiving side, transmits a video data stream including the image data, and transmits a first private data stream including the superimposition information data or a second private data stream including the superimposition information data and the disparity information.
(2) The transmission device according to (1), wherein the data transmission unit is a distribution server that distributes each of the data streams to the receiving side through a network.
(3) The transmission device according to (2), further including:
a metafile generation unit that generates a metafile having information for the receiving side to acquire each of the data streams; and
a metafile transmission unit that transmits the metafile to the receiving side through the network in response to a request from the receiving side.
(4) The transmission device according to (3), wherein first identification information is added to the first metafile corresponding to the first private data stream, and second identification information different from the first identification information is added to the second metafile corresponding to the second private data stream.
(5) The transmission device according to (4), wherein the first identification information corresponding to the first private data stream and the second identification information corresponding to the second private data stream have a unique relationship.
(6) The transmission device according to (4) or (5), wherein first type information indicating a first type is further added to the first metafile, and second type information indicating a second type different from the first type is further added to the second metafile.
(7) The transmission device according to any one of (4) to (6), wherein first language information indicating a predetermined language is further added to the first metafile, and second language information indicating a non-language is further added to the second metafile.
(8) The transmission device according to any one of (3) to (7), wherein
each of the data streams is an MPEG-DASH-based data stream,
the metafile is an MPD file, and
the network is a CDN.
(9) A transmission method including:
an image data output step of outputting left-eye image data and right-eye image data constituting a stereoscopic image;
a superimposition information data output step of outputting superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data;
a disparity information output step of outputting disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity; and
a data transmission step of, in response to a request from the receiving side, transmitting a video data stream including the image data, and transmitting a first private data stream including the superimposition information data or a second private data stream including the superimposition information data and the disparity information.
(10) A receiving device including:
a data receiving unit that makes a request to the transmitting side and receives a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image, together with a first private data stream including superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data, or a second private data stream including the superimposition information data and disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity;
a first decoding unit that decodes the video data stream; and
a second decoding unit that decodes the first private data stream or the second private data stream.
(11) The receiving device according to (10), wherein the data receiving unit receives each of the data streams through a network from a distribution server included in the transmitting side.
(12) The receiving device according to (11), further including a metafile receiving unit that receives a metafile having information for acquiring each of the data streams,
wherein the data receiving unit makes the request to the transmitting side based on the metafile.
(13) The receiving device according to (12), wherein first identification information is added to the first metafile corresponding to the first private data stream, and second identification information different from the first identification information is added to the second metafile corresponding to the second private data stream.
(14) The receiving device according to (12) or (13), wherein
each of the data streams is an MPEG-DASH-based data stream,
the metafile is an MPD file, and
the network is a CDN.
(15) A receiving method including:
a data receiving step of making a request to the transmitting side and receiving a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image, together with a first private data stream including superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data, or a second private data stream including the superimposition information data and disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity;
a first decoding step of decoding the video data stream; and
a second decoding step of decoding the first private data stream or the second private data stream.
(16) A transmission device including:
an image data output unit that outputs left-eye image data and right-eye image data constituting a stereoscopic image;
a superimposition information data output unit that outputs superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data;
a disparity information output unit that outputs disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity; and
a data transmission unit that, in response to a request from the receiving side, transmits a video data stream including the image data, and transmits a private data stream including the superimposition information data and the disparity information,
wherein, in the private data stream, first identification information is added to the superimposition information data, and second identification information different from the first identification information is added to the disparity information.
(17) The transmission device according to (16), wherein the data transmission unit is a distribution server that distributes each of the data streams to the receiving side through a network.
(18) The transmission device according to (17), further including:
a metafile generation unit that generates a metafile having information for the receiving side to acquire each of the data streams; and
a metafile transmission unit that transmits the metafile to the receiving side through the network in response to a request from the receiving side.
(19) A receiving device including:
a data receiving unit that makes a request to the transmitting side and receives a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image, together with a private data stream including superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data and disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity;
a first decoding unit that decodes the video data stream; and
a second decoding unit that decodes the private data stream,
wherein, in the private data stream, first identification information is added to the superimposition information data, and second identification information different from the first identification information is added to the disparity information, and
the second decoding unit acquires the superimposition information data, or the superimposition information data and the disparity information, from the private data stream based on the first identification information and the second identification information.
(20) The receiving device according to (19), further including a metafile receiving unit that receives a metafile having information for acquiring each of the data streams,
wherein the data receiving unit receives each of the data streams through a network from a distribution server on the transmitting side, and
the data receiving unit makes the request to the transmitting side based on the metafile.

The main feature of the present technology is that the adaptation set elements corresponding to the MPEG-DASH-based 2D and 3D subtitle data streams can be identified by the ID attribute in the MPD file, so that the receiving side can selectively receive the 2D subtitle data stream or the 3D subtitle data stream (see FIGS. 19 and 20).

DESCRIPTION OF SYMBOLS
10, 10A ... Stream distribution system
11 ... DASH segment streamer
12 ... DASH MPD server
13-1 to 13-N ... IPTV client
14 ... CDN
15 ... Content management server
100 ... Broadcasting station
111 ... Data extraction part
112 ... Video encoder
113 ... Audio encoder
114 ... Subtitle generation part
115 ... Disparity information preparation part
116 ... Subtitle processing unit
118 ... Subtitle encoder
119 ... Multiplexer
131 ... Streaming data control unit
132 ... HTTP access unit
133 ... Movie playback unit
200 ... Set top box (STB)
201 ... Bit stream processing part
202 ... HDMI terminal
204 ... Network interface
205 ... Video signal processing circuit
206 ... HDMI transmission part
207 ... Audio signal processing circuit
211 ... CPU
215 ... Remote control receiver
216 ... Remote control transmitter
221 ... Demultiplexer
222 ... Video decoder
223 ... Encoded data buffer
224 ... Subtitle decoder
225 ... Pixel buffer
226 ... Parallax information interpolation unit
227 ... Position control unit
228 ... Video superimposition unit
229 ... Audio decoder
300 ... Television receiver (TV)
301 ... 3D signal processing unit
302 ... HDMI terminal
303 ... HDMI receiving unit
305 ... Network interface
306 ... Bit stream processing unit
307 ... Video / graphic processing circuit
308 ... Panel drive circuit
309 ... Display panel
310 ... Audio signal processing circuit
311 ... Audio amplification circuit
312 ... Speaker
400 ... HDMI cable

Claims

A transmission device comprising:
an image data output unit that outputs left-eye image data and right-eye image data constituting a stereoscopic image;
a superimposition information data output unit that outputs superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data;
a disparity information output unit that outputs disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity; and
a data transmission unit that, in response to a request from the receiving side, transmits a video data stream including the image data, and transmits a first private data stream including the superimposition information data or a second private data stream including the superimposition information data and the disparity information.
The transmission device according to claim 1, wherein the data transmission unit is a distribution server that distributes each of the data streams to the receiving side through a network.
The transmission device according to claim 2, further comprising:
a metafile generation unit that generates a metafile having information for the receiving side to acquire each of the data streams; and
a metafile transmission unit that transmits the metafile to the receiving side through the network in response to a request from the receiving side.
The transmission device according to claim 3, wherein first identification information is added to the first metafile corresponding to the first private data stream, and second identification information different from the first identification information is added to the second metafile corresponding to the second private data stream.
The transmission apparatus according to claim 4, wherein the first identification information corresponding to the first private data stream and the second identification information corresponding to the second private data stream have a unique relationship.
The transmission device according to claim 4, wherein first type information indicating a first type is further added to the first metafile, and second type information indicating a second type different from the first type is further added to the second metafile.
The transmission device, wherein first language information indicating a predetermined language is further added to the first metafile, and second language information indicating a non-language is further added to the second metafile.
The transmission device according to claim 3, wherein
each of the data streams is an MPEG-DASH-based data stream,
the metafile is an MPD file, and
the network is a CDN.
A transmission method comprising:
an image data output step of outputting left-eye image data and right-eye image data constituting a stereoscopic image;
a superimposition information data output step of outputting superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data;
a disparity information output step of outputting disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity; and
a data transmission step of, in response to a request from the receiving side, transmitting a video data stream including the image data, and transmitting a first private data stream including the superimposition information data or a second private data stream including the superimposition information data and the disparity information.
A receiving device comprising:
a data receiving unit that makes a request to the transmitting side and receives a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image, together with a first private data stream including superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data, or a second private data stream including the superimposition information data and disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity;
a first decoding unit that decodes the video data stream; and
a second decoding unit that decodes the first private data stream or the second private data stream.
The receiving device according to claim 10, wherein the data receiving unit receives each of the data streams through a network from a distribution server included in the transmitting side.
The receiving device according to claim 11, further comprising a metafile receiving unit that receives a metafile having information for acquiring each of the data streams,
wherein the data receiving unit makes the request to the transmitting side based on the metafile.
The receiving device according to claim 12, wherein first identification information is added to the first metafile corresponding to the first private data stream, and second identification information different from the first identification information is added to the second metafile corresponding to the second private data stream.
The receiving device according to claim 12, wherein
each of the data streams is an MPEG-DASH-based data stream,
the metafile is an MPD file, and
the network is a CDN.
A receiving method comprising:
a data receiving step of making a request to the transmitting side and receiving a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image, together with a first private data stream including superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data, or a second private data stream including the superimposition information data and disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity;
a first decoding step of decoding the video data stream; and
a second decoding step of decoding the first private data stream or the second private data stream.
A transmission device comprising:
an image data output unit that outputs left-eye image data and right-eye image data constituting a stereoscopic image;
a superimposition information data output unit that outputs superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data;
a disparity information output unit that outputs disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity; and
a data transmission unit that, in response to a request from the receiving side, transmits a video data stream including the image data, and transmits a private data stream including the superimposition information data and the disparity information,
wherein, in the private data stream, first identification information is added to the superimposition information data, and second identification information different from the first identification information is added to the disparity information.
The transmission device according to claim 16, wherein the data transmission unit is a distribution server that distributes each of the data streams to the receiving side through a network.
The transmission device according to claim 17, further comprising:
a metafile generation unit that generates a metafile having information for the receiving side to acquire each of the data streams; and
a metafile transmission unit that transmits the metafile to the receiving side through the network in response to a request from the receiving side.
A receiving device comprising:
a data receiving unit that makes a request to the transmitting side and receives a video data stream including left-eye image data and right-eye image data constituting a stereoscopic image, together with a private data stream including superimposition information data to be superimposed on an image based on the left-eye image data and the right-eye image data and disparity information for shifting the superimposition information to be superimposed on the image based on the left-eye image data and the right-eye image data to give disparity;
a first decoding unit that decodes the video data stream; and
a second decoding unit that decodes the private data stream,
wherein, in the private data stream, first identification information is added to the superimposition information data, and second identification information different from the first identification information is added to the disparity information, and
the second decoding unit acquires the superimposition information data, or the superimposition information data and the disparity information, from the private data stream based on the first identification information and the second identification information.
The receiving device according to claim 19, further comprising a metafile receiving unit that receives a metafile having information for acquiring each of the data streams,
wherein the data receiving unit receives each of the data streams through a network from a distribution server on the transmitting side, and
the data receiving unit makes the request to the transmitting side based on the metafile.