JP2020025325A

JP2020025325A - Receiver, transmitter, and data processing method

Info

Publication number: JP2020025325A
Application number: JP2019197479A
Authority: JP
Inventors: Naohisa Kitazato (北里 直久); Yasuaki Yamagishi (山岸 靖明); Taketoshi Yamane (山根 武敏)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2019-10-30
Filing date: 2019-10-30
Publication date: 2020-02-13

Abstract

PROBLEM TO BE SOLVED: To provide a receiver capable of displaying captions at a desired timing. SOLUTION: The receiver acquires caption information relating to captions and control information, both transmitted on a broadcast wave of digital broadcasting; the control information includes selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the captions. The receiver controls display of the captions according to the caption information at a display timing corresponding to the specific mode, based on the selection information included in the control information. This technique is applicable to, for example, a television receiver conforming to ATSC 3.0. SELECTED DRAWING: Figure 1

Description

The present technology relates to a receiving device, a transmitting device, and a data processing method, and in particular to a receiving device, a transmitting device, and a data processing method that enable subtitles to be displayed at a desired timing.

TTML (Timed Text Markup Language), a markup language in which display timing, display position, and the like can be specified, is known as a scheme for displaying subtitles superimposed on video (see, for example, Patent Document 1). TTML is standardized by the W3C (World Wide Web Consortium).

Patent Document 1: JP 2012-169885 A

However, no technical scheme for displaying subtitles using TTML or the like has been established, and a proposal for displaying subtitles at a desired timing has been called for.

The present technology has been made in view of this situation and enables subtitles to be displayed at a desired timing.

A receiving device according to a first aspect of the present technology includes: a receiving unit that receives a broadcast wave of digital broadcasting; an acquisition unit that acquires subtitle information relating to subtitles and control information, both transmitted on the broadcast wave, the control information including selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles; and a control unit that controls display of the subtitles according to the subtitle information at a display timing according to the specific mode, based on the selection information included in the control information.

The receiving device according to the first aspect of the present technology may be an independent device or may be an internal block constituting a single device. The data processing method according to the first aspect of the present technology is a data processing method corresponding to the receiving device according to the first aspect described above.

In the receiving device and the data processing method according to the first aspect of the present technology, subtitle information relating to subtitles and control information, which includes selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles, are acquired from a broadcast wave of digital broadcasting, and display of the subtitles according to the subtitle information is controlled, based on the selection information included in the control information, at a display timing according to the specific mode.

A transmitting device according to a second aspect of the present technology includes: a generation unit that generates control information including selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of subtitles; and a transmission unit that transmits the control information, together with subtitle information relating to the subtitles, on a broadcast wave of digital broadcasting.

The transmitting device according to the second aspect of the present technology may be an independent device or may be an internal block constituting a single device. The data processing method according to the second aspect of the present technology is a data processing method corresponding to the transmitting device according to the second aspect described above.

In the transmitting device and the data processing method according to the second aspect of the present technology, control information is generated that includes selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of subtitles, the control information being transmitted on a broadcast wave of digital broadcasting together with subtitle information relating to the subtitles.

According to the first and second aspects of the present technology, subtitles can be displayed at a desired timing.

Note that the effects described here are not necessarily limiting, and the effect obtained may be any of the effects described in the present disclosure.

FIG. 1 is a diagram illustrating the configuration of an embodiment of a transmission system to which the present technology is applied.
FIG. 2 is a diagram illustrating the protocol stack of the present embodiment.
FIG. 3 is a diagram illustrating the structure of an MPD file.
FIG. 4 is a diagram illustrating a description example of an MPD file.
FIG. 5 is a diagram illustrating the relationship between the AdaptationSet element and the Representation element when a multiplexed stream is expressed.
FIG. 6 is a diagram illustrating the relationship among the AdaptationSet element, the Representation element, and the SubRepresentation element when a multiplexed stream is expressed.
FIG. 7 is a diagram illustrating examples of attributes and elements that can be included in the AdaptationSet element.
FIG. 8 is a diagram illustrating examples of attributes and elements that can be included in the Representation element.
FIG. 9 is a diagram illustrating the MP4 file format.
FIG. 10 is a diagram illustrating examples of the TTML processing modes.
FIG. 11 is a diagram illustrating a description example of an MPD file when operation in mode 1 is performed.
FIG. 12 is a diagram illustrating an example of subtitle display timing when operation in mode 1 is performed.
FIG. 13 is a diagram illustrating a description example of an MPD file when operation in mode 2-1 is performed.
FIG. 14 is a diagram illustrating an example of subtitle display timing when operation in mode 2-1 is performed.
FIG. 15 is a diagram illustrating a description example of an MPD file when operation in mode 2-2 is performed.
FIG. 16 is a diagram illustrating an example of subtitle display timing when operation in mode 2-2 is performed.
FIG. 17 is a diagram illustrating a description example of an MPD file when operation in mode 3 is performed.
FIG. 18 is a diagram illustrating an example of subtitle display timing when operation in mode 3 is performed.
FIG. 19 is a diagram illustrating a configuration example of the ATSC server.
FIG. 20 is a diagram illustrating a detailed configuration example of the ATSC server.
FIG. 21 is a diagram illustrating a detailed configuration example of the ATSC client.
FIG. 22 is a diagram illustrating a software configuration example of the ATSC client.
FIG. 23 is a flowchart explaining transmission processing.
FIG. 24 is a flowchart explaining component signaling processing.
FIG. 25 is a flowchart explaining reception processing.
FIG. 26 is a flowchart explaining component signaling processing.
FIG. 27 is a diagram illustrating a configuration example of a computer.

Embodiments of the present technology are described below with reference to the drawings. The description proceeds in the following order.

1. System configuration
2. Overview of the present technology
3. Specific operation examples
(1) Mode 1: TTML Time Only
(2) Mode 2: Sample Time Only
(2-1) Mode 2-1: Sample Time Only
(2-2) Mode 2-2: Sample Time Only But Till Next
(3) Mode 3: Asap
4. Configuration of each device
5. Flow of processing executed in each device
6. Modifications
7. Computer configuration

<1. System Configuration>

FIG. 1 is a diagram illustrating the configuration of an embodiment of a transmission system to which the present technology is applied. Here, a system refers to a logical collection of a plurality of devices.

In FIG. 1, the transmission system 1 includes an ATSC server 10 and an ATSC client 20. In the transmission system 1, data is transmitted in conformance with a digital broadcasting standard such as ATSC 3.0. ATSC 3.0 is the next-generation ATSC (Advanced Television Systems Committee) standard currently under development.

The ATSC server 10 is a transmitter compatible with a digital broadcasting standard such as ATSC 3.0 and is composed of, for example, a plurality of servers. The ATSC server 10 transmits streams of the components (video, audio, subtitles, and the like) that constitute content such as a television program, as a digital broadcast signal via a transmission path 30.

The ATSC client 20 is a receiver compatible with a digital broadcasting standard such as ATSC 3.0 and is, for example, a fixed receiver such as a television receiver or a set-top box, or a mobile receiver such as a smartphone, a mobile phone, or a tablet computer. The ATSC client 20 may also be a device mounted in an automobile, such as an in-vehicle television.

The ATSC client 20 receives the digital broadcast signal transmitted from the ATSC server 10 via the transmission path 30, acquires and processes the streams of the components (video, audio, subtitles, and the like), and outputs the video and audio of content such as a television program.

In FIG. 1, the transmission path 30 may be, for example, a terrestrial broadcast channel, a satellite link, or a cable television network (wired link).

<2. Overview of the Present Technology>

(Protocol stack)
In ATSC 3.0, it has been decided that data transmission uses IP/UDP packets, that is, IP (Internet Protocol) packets containing UDP (User Datagram Protocol) packets, rather than TS (Transport Stream) packets.

In ATSC 3.0, ROUTE (Real-Time Object Delivery over Unidirectional Transport) and MMT (MPEG Media Transport) coexist as transport protocols, and the streams of the components (video, audio, subtitles, and the like) are transmitted using one of the two.

Here, ROUTE is a protocol that extends FLUTE (File Delivery over Unidirectional Transport), a protocol suited to unidirectional multicast transfer of binary files. MMT is a transport scheme used over IP (Internet Protocol); by setting an IP address or a URL (Uniform Resource Locator) in control information, data such as video and audio can be referenced.

Furthermore, ATSC 3.0 is expected to define LLS (Link Layer Signaling) signaling information and SLS (Service Layer Signaling) signaling information as signaling. The SLS signaling information for each service is acquired according to the information described in the LLS signaling information, which is acquired first.

The LLS signaling information includes, for example, metadata such as an SLT (Service List Table). The SLT metadata includes information indicating the configuration of streams and services in the broadcast network, such as the information needed to tune to a service (tuning information).

The SLS signaling information includes, for example, metadata such as the USD (User Service Description), LSID (LCT Session Instance Description), and MPD (Media Presentation Description). The USD metadata includes information such as where to acquire the other metadata. The LSID metadata is control information for the ROUTE protocol. The MPD metadata is control information for managing playback of the component streams. Metadata such as the USD, LSID, and MPD is described in a markup language such as XML (Extensible Markup Language). The MPD metadata conforms to the MPEG-DASH (Dynamic Adaptive Streaming over HTTP) standard. Because the MPD metadata is provided as a file in XML format, it is referred to as the MPD file in the following description.

FIG. 2 is a diagram illustrating the protocol stack of the present embodiment.

In FIG. 2, the lowest layer is the physical layer. The layer above and adjacent to the physical layer is Layer 2, the layer above and adjacent to Layer 2 is the IP layer, and the layer above and adjacent to the IP layer is the UDP layer. That is, an IP packet containing a UDP packet (an IP/UDP packet) is placed in the payload of a Layer 2 Generic packet and encapsulated. A physical layer frame (ATSC Physical Frame) consists of a preamble and a data part. The data mapped to the data part is obtained by encapsulating a plurality of Generic packets into a BB frame, adding error correction parity to the BB frame, and then performing physical layer processing such as interleaving and mapping.

The layers above and adjacent to the UDP layer are ROUTE, MMT, and SLT. That is, the video, audio, and subtitle streams, the SLS signaling information stream, and the NRT content stream transmitted in a ROUTE session are stored in IP/UDP packets and transmitted. NRT content is content delivered by NRT (Non Real Time) broadcasting; it is first stored in the storage of the ATSC client 20 and then played back. Files other than NRT content (for example, application files) may also be transmitted in a ROUTE session.

Meanwhile, the video, audio, and subtitle streams and the SLS signaling information stream transmitted in an MMT session are stored in IP/UDP packets and transmitted. The SLT metadata is likewise stored in IP/UDP packets and transmitted.

Because this protocol stack is adopted, when tuning to a service (channel) provided by component streams transmitted in a ROUTE session, the ATSC client 20 acquires the SLS signaling information transmitted in the ROUTE session according to the tuning information included in the SLT metadata (S1-1, S1-2). The ATSC client 20 then connects to the component streams that provide the selected service according to metadata such as the USD, LSID, and MPD (S1-3). As a result, the ATSC client 20 outputs the video and audio of the content (for example, a television program) corresponding to the selected service.

Similarly, when tuning to a service provided by component streams transmitted in an MMT session, the ATSC client 20 acquires the SLS signaling information transmitted in the MMT session according to the tuning information included in the SLT metadata (S2-1, S2-2). The ATSC client 20 then connects to the component streams that provide the selected service according to metadata such as the USD, LSID, and MPD (S2-3). As a result, the ATSC client 20 outputs the video and audio of the content (for example, a television program) corresponding to the selected service.

(MPD file structure)
Next, the MPD file transmitted as SLS signaling information is described with reference to FIGS. 3 to 8. FIG. 3 is a diagram illustrating the structure of the MPD file. As shown in the description example of FIG. 4, the MPD file is described as a hierarchical structure in XML format.

As shown in FIG. 3, the MPD file describes a Period element, AdaptationSet elements, Representation elements, and SubRepresentation elements in a hierarchical structure. The Period element is the unit in which the configuration of content such as a television program is described. The AdaptationSet, Representation, and SubRepresentation elements are used for each of the component streams (video, audio, subtitles, and the like) and describe the attributes of each stream.

Specifically, an AdaptationSet element represents a stream encoded from one of various sources. So that the ATSC client 20 can select the stream according to a parameter such as the bit rate, Representation elements are placed within the AdaptationSet element to enumerate multiple alternative streams that differ in parameters such as the bit rate. Normally, an AdaptationSet element or a Representation element corresponds to a single stream, such as a video, audio, or subtitle stream.

When an AdaptationSet element represents a stream in which multiple streams, such as a video stream, an audio stream, and a subtitle stream, are multiplexed, Representation elements are placed within the AdaptationSet element to enumerate multiple alternative multiplexed streams that differ in parameters such as the bit rate. That is, as shown in FIG. 5, multiple AdaptationSet elements representing multiplexed streams are placed in each Period element, which represents a time interval, and the multiple Representation elements placed within those AdaptationSet elements can enumerate multiplexed streams with, for example, different bit rates.

In this case, SubRepresentation elements can additionally be placed under a Representation element to describe the attributes of each component stream that makes up the multiplexed stream. That is, as shown in FIG. 6, multiple AdaptationSet elements representing multiplexed streams are placed in each Period element, which represents a time interval; multiple Representation elements representing multiplexed streams with, for example, different bit rates are placed within those AdaptationSet elements; and the SubRepresentation elements placed within those Representation elements can describe the attributes of, for example, the video stream, the audio stream, and the subtitle stream.

Note that an AdaptationSet element can correspond not only to a single stream such as a video stream or an audio stream but also to a stream in which multiple streams are multiplexed. The MPEG-DASH standard defines the attributes and elements shown in FIG. 7 as those that can be included in an AdaptationSet element. A Representation element enumerates, within the scope of its parent AdaptationSet element, multiple alternative streams that differ in parameters such as the bit rate. The MPEG-DASH standard defines the attributes and elements shown in FIG. 8 as those that can be included in a Representation element. The attributes and elements of FIG. 8 can also be included in a SubRepresentation element.
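The Period / AdaptationSet / Representation hierarchy can be illustrated with a minimal sketch that walks a reduced MPD and picks one Representation by bit rate. The MPD text, the id values, and the function names list_representations and pick_representation are hypothetical and are not taken from the figures; only the element names and the MPEG-DASH namespace are standard.

```python
import xml.etree.ElementTree as ET

# MPEG-DASH MPD namespace; the MPD below is a hypothetical, heavily reduced example.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period id="p0">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v-low" bandwidth="1000000"/>
      <Representation id="v-high" bandwidth="5000000"/>
    </AdaptationSet>
    <AdaptationSet mimeType="application/mp4">
      <Representation id="cc-en" bandwidth="2000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_text):
    """Return (period id, mimeType, representation id, bandwidth) for every Representation."""
    root = ET.fromstring(mpd_text)
    rows = []
    for period in root.findall("mpd:Period", NS):
        for aset in period.findall("mpd:AdaptationSet", NS):
            for rep in aset.findall("mpd:Representation", NS):
                rows.append((period.get("id"), aset.get("mimeType"),
                             rep.get("id"), int(rep.get("bandwidth"))))
    return rows


def pick_representation(mpd_text, mime_type, max_bandwidth):
    """Pick the highest-bandwidth Representation of a given type within a bandwidth budget."""
    candidates = [row for row in list_representations(mpd_text)
                  if row[1] == mime_type and row[3] <= max_bandwidth]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[3])[2]


print(pick_representation(MPD_XML, "video/mp4", 3_000_000))
```

Selecting by bandwidth is only one possible client policy; the description leaves the selection criterion to the receiver.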

(MP4 file format)
When ROUTE is used as the transport protocol, the MP4 file format is expected to be adopted as the streaming file format. The MP4 file format is a derivative of the ISO base media file format defined in ISO/IEC 14496-12. The ISO base media file format is composed of a tree structure of units called boxes.

Here, a segment transmitted in a ROUTE session consists of an initialization segment and a media segment. The initialization segment contains initialization information such as the data compression scheme. The media segment stores the data of the component streams (video, audio, subtitles, and the like).

FIG. 9 is a diagram illustrating the structure of a media segment in the MP4 file format.

A media segment is composed of the following boxes: styp (segment type), sidx (segment index), ssix (subsegment index), moof (movie fragment), and mdat (media data). The styp box contains version information of the file format specification for the segment file. The sidx box contains index information within the segment. The ssix box contains index information for each subsegment (level) within the segment. In FIG. 9, the styp, sidx, and ssix boxes are not shown.
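The box structure of a media segment can be illustrated with a minimal sketch that iterates over top-level boxes. It relies on the ISO base media file format box header (a 32-bit big-endian size that includes the header, followed by a four-character type). The helper names make_box and iter_boxes are hypothetical, the payloads are placeholders rather than valid box contents, and the special size values 0 and 1 are deliberately not handled.

```python
import struct


def make_box(box_type, payload=b""):
    """Build a box: 32-bit big-endian size (header included), 4-byte type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_boxes(data):
    """Yield (type, payload) for each top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        # Sizes 0 (box extends to end of file) and 1 (64-bit size) are not supported here.
        if size < 8 or offset + size > len(data):
            raise ValueError("unsupported or truncated box")
        yield box_type.decode("ascii"), data[offset + 8:offset + size]
        offset += size


# Hypothetical segment with placeholder payloads, for illustration only.
segment = (make_box(b"styp", b"placeholder")
           + make_box(b"moof", make_box(b"mfhd", bytes(8)))
           + make_box(b"mdat", b"<tt/>"))

for box_type, payload in iter_boxes(segment):
    print(box_type, len(payload))
```

Because a box payload may itself be a sequence of boxes, calling iter_boxes on the moof payload walks one level further down the tree.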

The moof box contains control information for the data of the fragmented (component) stream. The moof box contains an mfhd (movie fragment header) box, and the mfhd box contains a tfdt (track fragment decode time) box and a trun (track fragment run) box.

The tfdt box contains the BMDT (Base Media Decode Time), which represents the decode start time of the samples. The trun box contains information indicating SampleCount, the number of samples; SampleDuration, the duration of a sample; and CompositionOffset, an offset value.
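How these fields combine into sample timing can be sketched as follows. The sketch assumes a timescale (ticks per second) supplied from outside the fragment, which this passage does not discuss, and the function name sample_times is hypothetical. Each sample is taken to start at the running decode time plus its CompositionOffset and to last for its SampleDuration.

```python
from fractions import Fraction


def sample_times(bmdt, durations, composition_offsets, timescale):
    """Return (start, end) presentation times in seconds for each sample of one fragment.

    bmdt                -- Base Media Decode Time of the first sample, in ticks
    durations           -- SampleDuration per sample, in ticks (SampleCount = len(durations))
    composition_offsets -- CompositionOffset per sample, in ticks
    timescale           -- ticks per second (assumed to be known from elsewhere)
    """
    if len(durations) != len(composition_offsets):
        raise ValueError("one duration and one offset are needed per sample")
    times = []
    decode_time = bmdt
    for duration, offset in zip(durations, composition_offsets):
        start = Fraction(decode_time + offset, timescale)
        times.append((start, start + Fraction(duration, timescale)))
        decode_time += duration  # the next sample decodes right after this one
    return times


# Two samples starting at tick 90000 with a 90 kHz timescale (illustrative values).
for start, end in sample_times(90000, [180000, 90000], [0, 0], 90000):
    print(float(start), float(end))
```

Fractions are used instead of floats so that tick arithmetic stays exact until the final conversion.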

The mdat box stores, as samples, the data (data body) of the fragmented (component) stream. A sample holds the stream data in the basic unit in which it is processed.

In the following description, a sample of a TTML-format file (TTML file) for displaying subtitles is also referred to as a TTML sample, and a media segment whose mdat box contains TTML samples is also referred to as a TTML segment. Where the description refers simply to a segment, it means a media segment.

(TTML processing mode)
When ROUTE is used as the transport protocol, TTML files are expected to be used for displaying subtitles (CC: Closed Caption). However, no technical scheme for displaying the subtitles specified in a TTML file has been established, and a proposal for displaying subtitles at a desired timing has been called for.

Therefore, the present technology defines a plurality of modes as TTML processing modes for specifying the display timing of subtitles. The ATSC server 10 includes, in the MPD file, selection information for selecting a specific mode from among the plurality of modes, so that the ATSC client 20 can display the subtitles specified in the TTML file at a display timing according to the specific mode, based on the selection information included in the MPD file.

FIG. 10 is a diagram illustrating examples of the TTML processing modes.

Mode 1, mode 2, and mode 3 are defined as the TTML processing modes. Mode 2 is further divided into two types, mode 2-1 and mode 2-2.

Mode 1 displays subtitles at the timing given by the time information specified in the TTML file. When mode 1 is set, "atsc:ttmlMode:ttmlTimeOnly" is specified in the MPD file as the selection information, as the value of the schemeIdUri attribute of an EssentialProperty element or a SupplementalProperty element of the AdaptationSet element.
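A receiver could read this selection information roughly as in the sketch below, which scans each AdaptationSet for an EssentialProperty or SupplementalProperty whose schemeIdUri begins with "atsc:ttmlMode:" and returns the remaining token. The MPD text is a hypothetical reduction (it is not the description example of FIG. 11), and the function name ttml_modes is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}
PREFIX = "atsc:ttmlMode:"

# Hypothetical, reduced MPD carrying the mode 1 selection information.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="application/mp4">
      <EssentialProperty schemeIdUri="atsc:ttmlMode:ttmlTimeOnly"/>
      <Representation id="cc-en" bandwidth="2000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def ttml_modes(mpd_text):
    """Return the TTML mode token found in each AdaptationSet that carries one."""
    root = ET.fromstring(mpd_text)
    modes = []
    for aset in root.iterfind(".//mpd:AdaptationSet", NS):
        for tag in ("EssentialProperty", "SupplementalProperty"):
            for prop in aset.findall(f"mpd:{tag}", NS):
                uri = prop.get("schemeIdUri", "")
                if uri.startswith(PREFIX):
                    modes.append(uri[len(PREFIX):])
    return modes


print(ttml_modes(MPD_XML))
```

An AdaptationSet without such a property yields nothing; how a receiver should behave in that case is not specified in this passage.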

In the TTML file, a p element in the body element specifies the character string of a subtitle together with time information such as its display start time and display end time.

Mode 2 is a mode in which the time information specified in the TTML file is ignored and subtitles are displayed at the timing according to the time information defined in the MP4 file format.

Of the two types of mode 2, in mode 2-1, display of a subtitle starts at the time according to the BMDT stored in the moof box and continues only for the time according to the SampleDuration stored in that moof box. When mode 2-1 is set, "atsc:ttmlMode:sampleTimeOnly" is specified in the MPD file as the selection information, as the value of the schemeIdUri attribute of an EssentialProperty element or a SupplementalProperty element of the AdaptationSet element.

In mode 2-2, display of a subtitle starts at the time according to the BMDT stored in the moof box corresponding to the mdat box that stores the target TTML sample, and continues until the time according to the BMDT stored in the moof box corresponding to the mdat box that stores the next TTML sample. When mode 2-2 is set, "atsc:ttmlMode:sampleTimeOnlyButTillNext" is specified in the MPD file as the selection information, as the value of the schemeIdUri attribute of an EssentialProperty element or a SupplementalProperty element of the AdaptationSet element.

Mode 3 is a mode in which subtitles are displayed while ignoring both the time information specified in the TTML file and the time information defined in the MP4 file format. In this case, when the ATSC client 20 acquires a TTML file transmitted from the ATSC server 10 via the transmission path 30, it immediately displays the subtitle specified in that TTML file. When mode 3 is set, "atsc:ttmlMode:asap" is specified in the MPD file as the selection information, as the value of the schemeIdUri attribute of an EssentialProperty element or a SupplementalProperty element of the AdaptationSet element.
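As an illustration of how a client might read the selection information described above, the following is a minimal Python sketch, not the implementation of the ATSC client 20. The four schemeIdUri values are taken from the text; the function name `ttml_mode`, the sample MPD, and the use of the standard DASH namespace `urn:mpeg:dash:schema:mpd:2011` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the MPD uses the standard MPEG-DASH XML namespace.
DASH_NS = "urn:mpeg:dash:schema:mpd:2011"

# schemeIdUri values from the text, mapped to the mode labels used there.
TTML_MODES = {
    "atsc:ttmlMode:ttmlTimeOnly": "mode 1",
    "atsc:ttmlMode:sampleTimeOnly": "mode 2-1",
    "atsc:ttmlMode:sampleTimeOnlyButTillNext": "mode 2-2",
    "atsc:ttmlMode:asap": "mode 3",
}


def ttml_mode(adaptation_set):
    """Return the TTML processing mode signalled on an AdaptationSet, or None."""
    for tag in ("EssentialProperty", "SupplementalProperty"):
        for prop in adaptation_set.findall(f"{{{DASH_NS}}}{tag}"):
            mode = TTML_MODES.get(prop.get("schemeIdUri"))
            if mode is not None:
                return mode
    return None


# Hypothetical, heavily trimmed MPD used only for illustration.
SAMPLE_MPD = """\
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="application/mp4">
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="caption"/>
      <EssentialProperty schemeIdUri="atsc:ttmlMode:sampleTimeOnly"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

root = ET.fromstring(SAMPLE_MPD)
for aset in root.iter(f"{{{DASH_NS}}}AdaptationSet"):
    print(ttml_mode(aset))
```

The sketch checks EssentialProperty before SupplementalProperty only for simplicity; the text does not state a precedence between the two.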

<3. Specific operation example>

Next, operation examples for the cases where mode 1, mode 2-1, mode 2-2, and mode 3 are set as the TTML processing mode will be described in order.

(1) Mode 1: TTML Time Only

First, an operation example for the case where mode 1 is set as the TTML processing mode will be described with reference to FIGS. 11 and 12. In mode 1, only the time information specified in the TTML file is used.

(Example of MPD file description)
FIG. 11 is a diagram illustrating a description example of the MPD file when operation is performed in mode 1.

In the MPD file of FIG. 11, a Role element is placed in the AdaptationSet element under the Period element of the MPD element, which is the root element. In the Role element, "urn:mpeg:dash:role:2011" is specified as the schemeIdUri attribute, and "caption", which indicates subtitles, is specified as the value attribute.

In the AdaptationSet element, "atsc:ttmlMode:ttmlTimeOnly" is also specified as the schemeIdUri attribute of the EssentialProperty element. That is, the value of the schemeIdUri attribute of this EssentialProperty element indicates that mode 1 is set as the TTML processing mode.

(Example of subtitle display timing)
FIG. 12 is a diagram illustrating an example of subtitle display timing when operation is performed in mode 1.

In FIG. 12, FIG. 12A schematically shows the time-related information specified in the MPD file, and FIG. 12B schematically shows the structure of a segment (TTML segment). FIG. 12C shows a description example of the TTML file obtained from the sample (TTML sample) of the segment (TTML segment) in FIG. 12B. In FIG. 12, time runs from the left side to the right side of the figure. These relationships also apply to the figures that describe the other modes later.

As shown in FIG. 12A, in the MPD file, the start time of streaming distribution in UTC (Coordinated Universal Time) is specified in the availabilityStartTime attribute of the MPD element, which is the root element. The MPD file also specifies Period(1), Period(2), ... as Period elements, and the start time of each Period is specified as the start attribute of the corresponding Period element.

That is, in the MPD file, the time at the beginning of each Period is given by the sum of the start time specified by the availabilityStartTime attribute (a time on the WallClock time axis) and the time specified in the Period element (a time on the MPD time axis). For example, the time at the beginning of Period(2) is obtained as the sum (MPD/@availabilityStartTime + MPD/Period(2)/@start) of the start time specified by the availabilityStartTime attribute (MPD/@availabilityStartTime) and the start time of Period(2) (MPD/Period(2)/@start).
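The sum MPD/@availabilityStartTime + MPD/Period(2)/@start can be sketched in Python as follows. This is only an illustrative sketch: the function names and the sample values are hypothetical, and the duration parser handles only the restricted form PT[nH][nM][nS] rather than the full xs:duration syntax.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_simple_duration(text):
    """Parse a restricted xs:duration of the form PT[nH][nM][n(.n)S]."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?", text)
    if m is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def period_start(availability_start_time, period_start_attr):
    """Wall-clock time at the beginning of a Period."""
    ast = datetime.fromisoformat(availability_start_time)
    return ast + parse_simple_duration(period_start_attr)


# Hypothetical attribute values, chosen only for illustration.
print(period_start("2019-10-30T00:00:00+00:00", "PT1M30S"))
```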

FIG. 12B schematically shows a segment in Period(2). This segment is a TTML segment whose mdat box stores a TTML sample as its sample. A TTML file is obtained from the TTML sample stored in the mdat box of this TTML segment. In mode 1, the time information stored in the moof box (BMDT, SampleDuration, and the like) is ignored.

FIG. 12C shows a description example of the TTML file. In the TTML file, elements such as the styling element and the layout element in the head element specify the color, font, display position, and the like of the characters displayed as subtitles. In addition, a p element in the body element specifies the character string of a subtitle together with time information such as its display start time and display end time.

Specifically, of the two p elements described in the TTML file of FIG. 12C, the character string "text1" is described between the start tag and the end tag of the upper p element. In this upper p element, "t1" is specified as the begin attribute, which sets the display start time, and "t2" is specified as the end attribute, which sets the display end time.

In the TTML file of FIG. 12C, the character string "text2" is described between the start tag and the end tag of the lower p element. In this lower p element, "t2" is specified as the begin attribute and "t3" is specified as the end attribute.

Here, in the MPD file acquired during this operation (FIG. 11), "atsc:ttmlMode:ttmlTimeOnly" is specified as the schemeIdUri attribute of the EssentialProperty element of the AdaptationSet element, so mode 1 is set as the TTML processing mode. When mode 1 is set, subtitles are displayed at the timing according to the time information described in the TTML file, that is, at the timing according to the values of the begin and end attributes of each p element.

Specifically, as shown in FIG. 12, taking as the reference the time at the beginning of Period(2), which is given by the sum of the start time specified by the availabilityStartTime attribute and the start time of Period(2) (MPD/@availabilityStartTime + MPD/Period(2)/@start), display of the subtitle "text1" specified in the upper p element starts when time t1 has elapsed, continues until time t2 has elapsed, and ends when time t2 has elapsed.
Likewise, taking the time at the beginning of Period(2) as the reference, display of the subtitle "text2" specified in the lower p element starts when time t2 has elapsed, continues until time t3 has elapsed, and ends when time t3 has elapsed.

As a result, when content such as a television program is being reproduced in the ATSC client 20, the subtitle "text1" is superimposed on the video after time t1 and before time t2, and the subtitle "text2" is superimposed on the video after time t2 and before time t3, with the time at the beginning of Period(2) as the reference.
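The mode 1 timing described above can be sketched in Python as a lookup of the p elements whose begin/end interval contains the time elapsed since the beginning of the Period. This is a minimal sketch under stated assumptions: the `Cue` class, the `active_cues` function, and the numeric values standing in for t1, t2, and t3 are hypothetical, and each interval is treated as including its begin time and excluding its end time.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    text: str
    begin: float  # p@begin, in seconds from the beginning of the Period
    end: float    # p@end, in seconds from the beginning of the Period


def active_cues(cues, elapsed):
    """Return the texts of cues whose [begin, end) interval contains `elapsed`."""
    return [c.text for c in cues if c.begin <= elapsed < c.end]


# Hypothetical values standing in for t1 = 1.0, t2 = 3.0, t3 = 5.0.
cues = [Cue("text1", 1.0, 3.0), Cue("text2", 3.0, 5.0)]
for t in (0.5, 1.0, 2.9, 3.0, 5.0):
    print(t, active_cues(cues, t))
```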

The operation example for the case where mode 1 is set as the TTML processing mode has been described above. Mode 1 uses the time information specified in the TTML file (the values of the begin and end attributes of the p element): display of a subtitle starts according to the time set in the begin attribute and ends according to the time set in the end attribute, so that subtitles can be displayed at desired timing.

For example, when the ATSC server 10 produces, as content, a package of video, audio, and subtitles from a studio recording or the like, and distributes the content to a plurality of ATSC clients 20 via the transmission path 30, it is considered appropriate to display the subtitles specified by the p elements of the TTML file at the timing indicated by the begin and end attributes of each p element, expressed as time relative to the beginning of the content. To realize such operation, mode 1 is set as the TTML processing mode in the MPD file, so that the ATSC client 20 displays the subtitles at the timing according to the times indicated by the begin and end attributes of the p elements of the TTML file.

(2) Mode 2: Sample Time Only

Next, operation examples for the case where mode 2 is set as the TTML processing mode will be described with reference to FIGS. 13 to 16. In mode 2, the time information specified in the TTML file is ignored, and the time information defined in the MP4 file format (the time information for each TTML sample) is used. Here, mode 2-1 and mode 2-2 will be described in order as mode 2.

(2-1) Mode 2-1: Sample Time Only

(Example of MPD file description)
FIG. 13 is a diagram illustrating a description example of the MPD file when operation is performed in mode 2-1.

In the MPD file of FIG. 13, "atsc:ttmlMode:sampleTimeOnly" is specified as the schemeIdUri attribute of the EssentialProperty element in the AdaptationSet element under the Period element of the MPD element, which is the root element. That is, the value of the schemeIdUri attribute of this EssentialProperty element indicates that mode 2-1 is set as the TTML processing mode.

(Example of subtitle display timing)
FIG. 14 is a diagram illustrating an example of subtitle display timing when operation is performed in mode 2-1.

As shown in FIG. 14A, in the MPD file, the start time of streaming distribution is specified in the availabilityStartTime attribute of the MPD element. The MPD file also specifies Period(1), Period(2), ... as Period elements, and the start time of each Period is specified as the start attribute of the corresponding Period element. Here, the time at the beginning of Period(2) is obtained as the sum of the start time specified by the availabilityStartTime attribute and the start time of Period(2) (MPD/@availabilityStartTime + MPD/Period(2)/@start).

FIG. 14B schematically shows a TTML segment in Period(2).
A TTML file (FIG. 14C) is obtained from the TTML sample stored in the mdat box of this TTML segment. However, in the MPD file acquired during this operation (FIG. 13), "atsc:ttmlMode:sampleTimeOnly" is specified in the schemeIdUri attribute of the EssentialProperty element, and mode 2-1 is set as the TTML processing mode, so the time information specified in the p element of the TTML file ("t1" in the begin attribute and "t3" in the end attribute) is ignored.

That is, in mode 2-1, the time information specified in the TTML file is ignored, and the time information stored in the moof box of the TTML segment (the time information for each TTML sample) is used. Here, however, it is assumed that, in the TTML segment, there is one TTML sample (the TTML sample stored in the mdat box) for each moof box (and the time information stored in it).

Specifically, as shown in FIG. 14, taking as the reference the time at the beginning of Period(2), which is given by the sum of the start time specified by the availabilityStartTime attribute of the MPD file and the start time of Period(2) (MPD/@availabilityStartTime + MPD/Period(2)/@start), display of the subtitle "text1" specified in the p element of the TTML file starts when the time according to the BMDT stored in the moof box (moof/mfhd/tfdt), that is, BMDT×ts, has elapsed.

Here, BMDT (Base Media Decode Time) represents the time from the beginning of Period(2) until decoding of the TTML sample starts, that is, an offset. The ts by which the BMDT is multiplied represents the timescale and is used to convert the BMDT value into a value on the MPD time axis.

Display of the subtitle "text1" then continues for the time according to the SampleDuration stored in the moof box (moof/mfhd/trun), that is, SampleDuration×ts, and ends when that time has elapsed.

Here, SampleDuration represents the duration of the TTML sample. The ts by which the SampleDuration is multiplied is used to convert the SampleDuration value into a value on the MPD time axis.

As a result, when content such as a television program is being reproduced in the ATSC client 20, the subtitle "text1" is superimposed on the video from the point at which the time according to the BMDT stored in the moof box has elapsed, with the time at the beginning of Period(2) as the reference, until the time according to the SampleDuration has elapsed.

The operation example for the case where mode 2-1 is set as the TTML processing mode has been described above. Mode 2-1 uses the time information defined in the MP4 file format (the BMDT and SampleDuration of each TTML sample): display of a subtitle starts at the time according to the BMDT of the TTML sample and continues only for the time according to the SampleDuration, so that subtitles can be displayed at desired timing.
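The mode 2-1 interval, from BMDT×ts to BMDT×ts + SampleDuration×ts measured from the beginning of the Period, can be sketched in Python as follows. The function name and the sample values are hypothetical. The sketch also assumes that ts is expressed as seconds per tick (for example 1/90000), so that the multiplication used in the text yields seconds; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def mode_2_1_interval(bmdt, sample_duration, ts):
    """Return (start, end) in seconds from the beginning of the Period.

    Assumption: `ts` is the number of seconds per tick, so that
    BMDT x ts and SampleDuration x ts are values in seconds.
    """
    start = bmdt * ts
    end = start + sample_duration * ts
    return start, end


# Hypothetical values: 90 kHz ticks, BMDT of 2 s, SampleDuration of 1 s.
ts = Fraction(1, 90000)
start, end = mode_2_1_interval(180000, 90000, ts)
print(f"display from {start} s to {end} s")
```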

(2-2) Mode 2-2: Sample Time Only But Till Next

(Example of MPD file description)
FIG. 15 is a diagram illustrating a description example of the MPD file when operation is performed in mode 2-2.

In the MPD file of FIG. 15, "atsc:ttmlMode:sampleTimeOnlyButTillNext" is specified as the schemeIdUri attribute of the EssentialProperty element in the AdaptationSet element under the Period element of the MPD element, which is the root element. That is, the value of the schemeIdUri attribute of this EssentialProperty element indicates that mode 2-2 is set as the TTML processing mode.

(Example of subtitle display timing)
FIG. 16 is a diagram illustrating an example of subtitle display timing when operation is performed in mode 2-2.

As shown in FIG. 16A, in the MPD file, the start time of streaming distribution is specified in the availabilityStartTime attribute of the MPD element. The MPD file also specifies Period(1), Period(2), ... as Period elements, and the start time of each Period is specified as the start attribute of the corresponding Period element. Here, the time at the beginning of Period(2) is obtained as the sum of the start time specified by the availabilityStartTime attribute and the start time of Period(2) (MPD/@availabilityStartTime + MPD/Period(2)/@start).

FIG. 16B schematically shows a TTML segment in Period(2).
A TTML file (FIG. 16C) is obtained from the TTML sample stored in the mdat box of this TTML segment. However, in the MPD file acquired during this operation (FIG. 15), "atsc:ttmlMode:sampleTimeOnlyButTillNext" is specified in the schemeIdUri attribute of the EssentialProperty element, and mode 2-2 is set as the TTML processing mode, so the time information specified in the p element of the TTML file ("t1" in the begin attribute and "t3" in the end attribute) is ignored.

That is, in mode 2-2, the time information specified in the TTML file is ignored, and the time information stored in the moof box of the TTML segment (the time information for each TTML sample) is used. Here, however, it is assumed that, in the TTML segment, there is one TTML sample (the TTML sample stored in the mdat box) for each moof box (and the time information stored in it).

Specifically, as shown in FIG. 16, taking as the reference the time at the beginning of Period(2), which is given by the sum of the start time specified by the availabilityStartTime attribute of the MPD file and the start time of Period(2) (MPD/@availabilityStartTime + MPD/Period(2)/@start), display of the subtitle "text1" specified in the p element of the TTML file starts when the time (BMDT×ts) according to the BMDT stored in the moof box corresponding to the mdat box that stores the target TTML sample (the TTML sample of the TTML file that specifies the target subtitle) has elapsed.

Display of the subtitle "text1" then continues until the time (BMDT×ts) according to the BMDT stored in the moof box corresponding to the mdat box that stores the next TTML sample (the TTML sample of the TTML file that specifies the next subtitle) has elapsed, and ends when that time has elapsed.

As a result, when content such as a television program is being reproduced in the ATSC client 20, the subtitle "text1" is superimposed on the video from the point at which the time according to the BMDT of (the moof box of) the TTML segment including the target TTML sample has elapsed, with the time at the beginning of Period(2) as the reference, until the time according to the BMDT of (the moof box of) the TTML segment including the next TTML sample has elapsed.

Although not shown in FIG. 16, when the next TTML sample (the TTML sample of the TTML file that specifies the next subtitle) is acquired, display of the subtitle "text1" ends and display of the subtitle specified by the next TTML sample (TTML file), for example "text2", starts.

The operation example for the case where mode 2-2 is set as the TTML processing mode has been described above. Mode 2-2 uses the time information defined in the MP4 file format (the BMDT of each TTML sample): display of a subtitle starts at the time according to the BMDT of the target TTML sample and continues until the time according to the BMDT of the next TTML sample, so that subtitles can be displayed at desired timing. Mode 2-1 and mode 2-2 are the same in that the timing at which display of a subtitle starts is specified by the time according to the BMDT, but differ in whether the timing at which the displayed subtitle ends is specified by the time according to the SampleDuration or by the time according to the BMDT of the next TTML sample.
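The difference from mode 2-1 can be sketched in Python by pairing the BMDT of each TTML sample with the BMDT of the next one. As before, this is a hedged sketch: the function name and values are hypothetical, ts is assumed to be seconds per tick so that BMDT×ts is in seconds, and an end of `None` marks a subtitle whose end is not yet known because the next TTML sample has not arrived.

```python
from fractions import Fraction


def mode_2_2_intervals(bmdts, ts):
    """Return (start, end) pairs in seconds from the beginning of the Period.

    Each subtitle starts at its own BMDT x ts and ends at the BMDT x ts of
    the next TTML sample. The last entry has end None (next sample unknown).
    """
    starts = [bmdt * ts for bmdt in bmdts]
    ends = starts[1:] + [None]
    return list(zip(starts, ends))


# Hypothetical BMDT values in 90 kHz ticks for three consecutive TTML samples.
ts = Fraction(1, 90000)
for start, end in mode_2_2_intervals([0, 180000, 450000], ts):
    print(start, end)
```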

For example, when a content provider or the like produces a package of video and audio as content, and a broadcaster or the like later adds subtitles to the content or sets the display timing of the subtitles, it may be more appropriate in operation to specify the subtitle display timing with the time information defined in the MP4 file format (the time information for each TTML sample) than to specify it in the TTML file. To realize such operation, mode 2-1 or mode 2-2 is set as the TTML processing mode in the MPD file, so that the ATSC client 20 displays the subtitles at the timing according to the time information defined in the MP4 file format (the time information for each TTML sample).

(3) Mode 3: Asap

Next, an operation example for the case where mode 3 is set as the TTML processing mode will be described with reference to FIGS. 17 and 18. In mode 3, both the time information specified in the TTML file and the time information defined in the MP4 file format (the time information for each TTML sample) are ignored, and the processing for displaying the subtitle is performed immediately after the TTML file (TTML sample) is acquired.

(Example of MPD file description)
FIG. 17 is a diagram illustrating a description example of the MPD file when operation is performed in mode 3.

In the MPD file of FIG. 17, "atsc:ttmlMode:asap" is specified as the schemeIdUri attribute of the EssentialProperty element in the AdaptationSet element under the Period element of the MPD element, which is the root element. That is, the value of the schemeIdUri attribute of this EssentialProperty element indicates that mode 3 is set as the TTML processing mode.

(Example of subtitle display timing)
FIG. 18 is a diagram illustrating an example of subtitle display timing when operation is performed in mode 3.

As shown in FIG. 18A, in the MPD file, the start time of streaming distribution is specified in the availabilityStartTime attribute of the MPD element. The MPD file also specifies Period(1), Period(2), ... as Period elements, and the start time of each Period is specified as the start attribute of the corresponding Period element.

FIG. 18B schematically shows a TTML segment in Period(2). A TTML file (FIG. 18C) is obtained from the TTML sample stored in the mdat box of this TTML segment. However, in the MPD file acquired during this operation (FIG. 17), "atsc:ttmlMode:asap" is specified in the schemeIdUri attribute of the EssentialProperty element, and mode 3 is set as the TTML processing mode, so the time information specified in the p element of the TTML file ("t1" in the begin attribute and "t3" in the end attribute) is ignored. When mode 3 is set as the TTML processing mode, the time information stored in the moof box of the TTML segment (BMDT and SampleDuration) is also ignored.

すなわち、モード３では、TTMLファイルに指定された時間情報と、TTMLサンプルごとの時間情報を共に無視して、TTMLファイル(TTMLサンプル)を取得した後、即時に処理を行い、TTMLファイルで指定される字幕が表示されるようにする。 That is, in mode 3, both the time information specified in the TTML file and the time information for each TTML sample are ignored; once the TTML file (TTML sample) is obtained, it is processed immediately so that the subtitle specified in the TTML file is displayed.

具体的には、図１８に示すように、ATSCクライアント２０では、TTMLセグメントが取得された場合に、そのTTMLセグメントのTTMLサンプルからTTMLファイルが得られた時点で、即時にそのTTMLファイルが処理され、p要素のbegin属性やend属性の値を無視して、そのp要素に指定された"text1"である字幕の表示が開始される。 Specifically, as shown in FIG. 18, when a TTML segment is obtained, the ATSC client 20 processes the TTML file as soon as it is obtained from the TTML sample of that TTML segment and, ignoring the values of the begin and end attributes of the p element, starts displaying the subtitle "text1" specified in that p element.

そして、次のTTMLファイル(TTMLサンプル)を含むTTMLセグメントが取得されるまでの間は、"text1"である字幕の表示が継続され、次のTTMLファイル(TTMLサンプル)を含むTTMLセグメントが取得されたとき、"text1"である字幕の表示が終了される。 Until the TTML segment including the next TTML file (TTML sample) is obtained, the display of the subtitle "text1" is continued, and the TTML segment including the next TTML file (TTML sample) is obtained. Then, the display of the subtitle "text1" ends.

これにより、ATSCクライアント２０においては、テレビ番組などのコンテンツが再生されている場合に、対象のTTMLファイルのTTMLサンプルを含むTTMLセグメントが取得されてから、次のTTMLファイルのTTMLサンプルを含むTTMLセグメントが取得されるまでの間は、対象のTTMLファイルのp属性に指定される"text1"である字幕が映像に重畳表示されることになる。 As a result, while content such as a television program is being reproduced, the ATSC client 20 superimposes the subtitle "text1" specified in the p element of the target TTML file on the video from the time the TTML segment including the TTML sample of the target TTML file is obtained until the TTML segment including the TTML sample of the next TTML file is obtained.

なお、図１８には図示していないが、次のTTMLファイルが取得されたとき、"text1"である字幕の表示が終了されるとともに、次のTTMLファイルにより指定される字幕(例えば、"text2")の表示が開始される。 Although not shown in FIG. 18, when the next TTML file is obtained, the display of the subtitle "text1" ends and the display of the subtitle specified by the next TTML file (for example, "text2") starts.
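The mode 3 behavior described above can be pictured as a simple timeline in which each subtitle stays on screen from the moment its TTML file arrives until the next one arrives. The following Python sketch illustrates this under stated assumptions: the function name asap_display_intervals and the representation of arrivals as (time in seconds, text) pairs are hypothetical and are not part of the description.

```python
def asap_display_intervals(arrivals):
    """Compute mode 3 display intervals.

    arrivals: list of (arrival_time_seconds, text) tuples in arrival order.
    Returns a list of (start, end, text); end is None for the most recent
    subtitle, which stays on screen until another TTML file arrives.
    Timing attributes inside the TTML file and the moof box are not consulted.
    """
    intervals = []
    for index, (arrival_time, text) in enumerate(arrivals):
        if index + 1 < len(arrivals):
            end = arrivals[index + 1][0]  # replaced when the next file arrives
        else:
            end = None  # no later file yet
        intervals.append((arrival_time, end, text))
    return intervals


for interval in asap_display_intervals([(10.0, "text1"), (14.5, "text2")]):
    print(interval)
```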

以上、TTML処理モードとして、モード３が設定された場合の運用例について説明した。
このモード３は、TTMLファイルに指定された時間情報と、mp4のファイルフォーマットで規定される時間情報(TTMLサンプルごとの時間情報)を無視して、TTMLファイル(TTMLサンプル)を取得したとき、即時に、当該TTMLファイルに指定される字幕を表示させることで、所望のタイミングで字幕を表示させることができる。 The operation example in the case where the mode 3 is set as the TTML processing mode has been described above.
In mode 3, the time information specified in the TTML file and the time information defined by the MP4 file format (the time information for each TTML sample) are ignored, and the subtitle specified in the TTML file is displayed immediately when the TTML file (TTML sample) is obtained, so that the subtitle can be displayed at the desired timing.

例えば、ATSCサーバ１０において、スポーツ中継などのライブ放送のコンテンツを、伝送路３０を介して、複数のATSCクライアント２０に配信する場合には、ライブ映像よりも字幕が遅れることが前提ではあるが、少しでもそのずれを少なくすることが要求されるため、字幕の表示時間を指定するのではなく、ベストエフォートで、ATSCクライアント２０が、TTMLファイルを受信したら、即時に字幕を表示させる運用が適当であると考えられる。このような運用を実現するためには、MPDファイルにおいて、TTML処理モードとして、モード３を設定することで、ATSCクライアント２０では、TTMLファイルが受信された後、即時に、字幕が表示されることになる。 For example, when the ATSC server 10 distributes live broadcast content such as a sports broadcast to a plurality of ATSC clients 20 via the transmission path 30, the subtitles are assumed to lag behind the live video, but that lag must be kept as small as possible. It is therefore considered appropriate not to specify the subtitle display time but to have the ATSC client 20 display the subtitle on a best-effort basis immediately after it receives the TTML file. To realize such an operation, mode 3 is set as the TTML processing mode in the MPD file, so that the ATSC client 20 displays the subtitle immediately after the TTML file is received.

＜４．各装置の構成＞ <4. Configuration of each device>

次に、図１９乃至図２２を参照して、図１の伝送システム１を構成する各装置の詳細な構成について説明する。 Next, a detailed configuration of each device included in the transmission system 1 of FIG. 1 will be described with reference to FIGS. 19 to 22.

（ATSCサーバの構成）
図１９は、図１のATSCサーバ１０の構成例を示す図である。 (Configuration of ATSC server)
FIG. 19 is a diagram showing a configuration example of the ATSC server 10 of FIG. 1.

図１９において、ATSCサーバ１０は、AVサーバ１０１、TTMLサーバ１０２、DASHサーバ１０３、及び、放送サーバ１０４から構成される。例えば、AVサーバ１０１、TTMLサーバ１０２、及び、DASHサーバ１０３は、コンテンツ事業者や放送事業者等の事業者により提供される。また、例えば、放送サーバ１０４は、放送事業者等の事業者により提供される。 In FIG. 19, the ATSC server 10 includes an AV server 101, a TTML server 102, a DASH server 103, and a broadcast server 104. For example, the AV server 101, the TTML server 102, and the DASH server 103 are provided by a provider such as a content provider or a broadcaster. Further, for example, the broadcast server 104 is provided by a business operator such as a broadcast business operator.

AVサーバ１０１は、コンテンツを構成するビデオとオーディオ(のストリーム)のデータを取得して処理し、DASHサーバ１０３に提供する。 The AV server 101 acquires and processes the video and audio (stream) data constituting the content, and provides the data to the DASH server 103.

TTMLサーバ１０２は、コンテンツの映像に重畳される字幕情報としてのTTMLファイルを生成して処理し、DASHサーバ１０３に提供する。 The TTML server 102 generates and processes a TTML file as caption information to be superimposed on the video of the content, and provides it to the DASH server 103.

DASHサーバ１０３は、MPDファイルを生成する。また、DASHサーバ１０３は、AVサーバ１０１から提供されるビデオとオーディオのデータと、TTMLサーバ１０２から供給されるTTMLファイルを処理して、セグメント(セグメントデータ)を生成する。DASHサーバ１０３は、MPDファイルとセグメントデータを、放送サーバ１０４に提供する。 The DASH server 103 generates an MPD file. The DASH server 103 processes video and audio data provided from the AV server 101 and a TTML file supplied from the TTML server 102 to generate a segment (segment data). The DASH server 103 provides the MPD file and the segment data to the broadcast server 104.

放送サーバ１０４は、LLSシグナリング情報やSLSシグナリング情報等のシグナリング情報を生成する。また、放送サーバ１０４は、セグメントデータ(TTMLファイルを含む)やシグナリング情報(MPDファイルを含む)を、デジタル放送信号として、アンテナ１０５を介して送信する。 The broadcast server 104 generates signaling information such as LLS signaling information and SLS signaling information. Further, the broadcast server 104 transmits segment data (including TTML files) and signaling information (including MPD files) as digital broadcast signals via the antenna 105.

（ATSCサーバの詳細な構成）
図２０は、図１９のATSCサーバ１０の詳細な構成例を示す図である。 (Detailed configuration of ATSC server)
FIG. 20 is a diagram showing a detailed configuration example of the ATSC server 10 of FIG. 19.

図２０において、ATSCサーバ１０は、ビデオデータ取得部１１１、ビデオエンコーダ１１２、オーディオデータ取得部１１３、オーディオエンコーダ１１４、字幕生成部１１５、字幕エンコーダ１１６、シグナリング生成部１１７、シグナリング処理部１１８、セグメント処理部１１９、マルチプレクサ１２０、及び、送信部１２１から構成される。 In FIG. 20, the ATSC server 10 includes a video data acquisition unit 111, a video encoder 112, an audio data acquisition unit 113, an audio encoder 114, a subtitle generation unit 115, a subtitle encoder 116, a signaling generation unit 117, a signaling processing unit 118, a segment processing unit 119, a multiplexer 120, and a transmission unit 121.

ここで、ATSCサーバ１０が有する機能を実現するための各ブロックは、図１９のAVサーバ１０１乃至放送サーバ１０４のいずれかのサーバの構成に含まれるが、例えば、次のような構成を採用することができる。すなわち、図２０において、ビデオデータ取得部１１１、ビデオエンコーダ１１２、オーディオデータ取得部１１３、及び、オーディオエンコーダ１１４は、AVサーバ１０１の構成に含まれる。 Here, each block for realizing the functions of the ATSC server 10 is included in the configuration of one of the AV server 101 to the broadcast server 104 in FIG. 19; for example, the following configuration can be adopted. That is, in FIG. 20, the video data acquisition unit 111, the video encoder 112, the audio data acquisition unit 113, and the audio encoder 114 are included in the configuration of the AV server 101.

また、図２０において、字幕生成部１１５、及び、字幕エンコーダ１１６は、TTMLサーバ１０２の構成に含まれる。図２０において、シグナリング生成部１１７、シグナリング処理部１１８、及び、セグメント処理部１１９は、DASHサーバ１０３の構成に含まれる。
さらに、図２０において、シグナリング生成部１１７、シグナリング処理部１１８、マルチプレクサ１２０、及び、送信部１２１は、放送サーバ１０４の構成に含まれる。 Also, in FIG. 20, the subtitle generation unit 115 and the subtitle encoder 116 are included in the configuration of the TTML server 102. In FIG. 20, the signaling generation unit 117, the signaling processing unit 118, and the segment processing unit 119 are included in the configuration of the DASH server 103.
Further, in FIG. 20, the signaling generation unit 117, the signaling processing unit 118, the multiplexer 120, and the transmission unit 121 are included in the configuration of the broadcast server 104.

ただし、図２０に示した構成は一例であって、例えば、AVサーバ１０１とTTMLサーバ１０２の両方の機能を有する１台のサーバを構成したり、あるいは、DASHサーバ１０３と放送サーバ１０４の両方の機能を有する１台のサーバを構成したりするなど、各サーバを構成するブロックの組み合わせは任意であって、図２０の構成以外の他の構成を採用することができる。 However, the configuration shown in FIG. 20 is an example. The combination of blocks constituting each server is arbitrary: for example, a single server may have the functions of both the AV server 101 and the TTML server 102, or a single server may have the functions of both the DASH server 103 and the broadcast server 104, and configurations other than that of FIG. 20 can be adopted.

ビデオデータ取得部１１１は、外部のサーバ、カメラ、又は記録媒体等から、コンテンツのビデオデータを取得し、ビデオエンコーダ１１２に供給する。ビデオエンコーダ１１２は、ビデオデータ取得部１１１から供給されるビデオデータを、所定の符号化方式に準拠して符号化し、セグメント処理部１１９に供給する。 The video data acquisition unit 111 acquires content video data from an external server, camera, recording medium, or the like, and supplies the video data to the video encoder 112. The video encoder 112 encodes the video data supplied from the video data acquisition unit 111 according to a predetermined encoding method, and supplies the encoded video data to the segment processing unit 119.

オーディオデータ取得部１１３は、外部のサーバ、マイクロフォン、又は記録媒体等から、コンテンツのオーディオデータを取得し、オーディオエンコーダ１１４に供給する。
オーディオエンコーダ１１４は、オーディオデータ取得部１１３から供給されるオーディオデータを、所定の符号化方式に準拠して符号化し、セグメント処理部１１９に供給する。 The audio data acquisition unit 113 acquires audio data of the content from an external server, microphone, recording medium, or the like, and supplies the audio data to the audio encoder 114.
The audio encoder 114 encodes the audio data supplied from the audio data acquisition unit 113 in accordance with a predetermined encoding scheme, and supplies the encoded audio data to the segment processing unit 119.

字幕生成部１１５は、字幕データとして、TTML形式のTTMLファイルを生成し、字幕エンコーダ１１６に供給する。字幕エンコーダ１１６は、字幕生成部１１５から供給される字幕データを、所定の符号化方式に準拠して符号化し、セグメント処理部１１９に供給する。 The caption generation unit 115 generates a TTML file in the TTML format as caption data and supplies the TTML file to the caption encoder 116. The subtitle encoder 116 encodes the subtitle data supplied from the subtitle generation unit 115 according to a predetermined encoding method, and supplies the encoded data to the segment processing unit 119.

シグナリング生成部１１７は、シグナリング情報を生成し、シグナリング処理部１１８に供給する。シグナリング処理部１１８は、シグナリング生成部１１７から供給されるシグナリング情報を処理し、マルチプレクサ１２０に供給する。ここでは、例えば、SLTメタデータ等のLLSシグナリング情報や、USDメタデータやLSIDメタデータ、MPDメタデータ(MPDファイル)等のSLSシグナリング情報が生成され、処理される。ただし、MPDファイルには、TTML処理モードを選択するための選択情報が含まれる。 The signaling generation unit 117 generates signaling information and supplies it to the signaling processing unit 118. The signaling processing unit 118 processes the signaling information supplied from the signaling generation unit 117 and supplies the signaling information to the multiplexer 120. Here, for example, LLS signaling information such as SLT metadata, and SLS signaling information such as USD metadata, LSID metadata, and MPD metadata (MPD file) are generated and processed. However, the MPD file includes selection information for selecting the TTML processing mode.

セグメント処理部１１９は、ビデオエンコーダ１１２から供給されるビデオデータ、オーディオエンコーダ１１４から供給されるオーディオデータ、及び、字幕エンコーダ１１６から供給される字幕データに基づいて、MP4のファイルフォーマットに準拠したセグメント(セグメントデータ)を生成し、マルチプレクサ１２０に供給する。 The segment processing unit 119 generates segments (segment data) conforming to the MP4 file format based on the video data supplied from the video encoder 112, the audio data supplied from the audio encoder 114, and the subtitle data supplied from the subtitle encoder 116, and supplies the segments to the multiplexer 120.

マルチプレクサ１２０は、セグメント処理部１１９から供給されるセグメントデータと、シグナリング処理部１１８から供給されるシグナリング情報を多重化して、その結果得られる多重化ストリームを、送信部１２１に供給する。 The multiplexer 120 multiplexes the segment data supplied from the segment processing unit 119 and the signaling information supplied from the signaling processing unit 118, and supplies the resulting multiplexed stream to the transmission unit 121.

送信部１２１は、マルチプレクサ１２０から供給される多重化ストリームを、アンテナ１０５を介して、デジタル放送の放送波(デジタル放送信号)として送信する。 The transmitting unit 121 transmits the multiplexed stream supplied from the multiplexer 120 as a broadcast wave of digital broadcasting (digital broadcasting signal) via the antenna 105.

ATSCサーバ１０は、以上のように構成される。 The ATSC server 10 is configured as described above.

（ATSCクライアントの詳細な構成）
図２１は、図１のATSCクライアント２０の詳細な構成例を示す図である。 (Detailed configuration of ATSC client)
FIG. 21 is a diagram showing a detailed configuration example of the ATSC client 20 of FIG. 1.

図２１において、ATSCクライアント２０は、受信部２１２、デマルチプレクサ２１３、制御部２１４、メモリ２１５、入力部２１６、ビデオデコーダ２１７、ビデオ出力部２１８、オーディオデコーダ２１９、オーディオ出力部２２０、字幕デコーダ２２１、表示部２２２、及び、スピーカ２２３から構成される。なお、図２１の構成では、表示部２２２とスピーカ２２３を含む構成を示しているが、表示部２２２とスピーカ２２３を含めない構成としてもよい。 In FIG. 21, the ATSC client 20 includes a receiving unit 212, a demultiplexer 213, a control unit 214, a memory 215, an input unit 216, a video decoder 217, a video output unit 218, an audio decoder 219, an audio output unit 220, a subtitle decoder 221, a display unit 222, and a speaker 223. Although the configuration shown in FIG. 21 includes the display unit 222 and the speaker 223, a configuration without the display unit 222 and the speaker 223 may also be adopted.

受信部２１２は、アンテナ２１１を介して受信されたデジタル放送の放送波(デジタル放送信号)から、ユーザの選局操作に応じた信号を抽出して復調し、その結果得られる多重化ストリームを、デマルチプレクサ２１３に供給する。 The receiving unit 212 extracts and demodulates a signal corresponding to the user's channel selection operation from the broadcast wave of digital broadcasting (digital broadcast signal) received via the antenna 211, and supplies the resulting multiplexed stream to the demultiplexer 213.

デマルチプレクサ２１３は、受信部２１２から供給される多重化ストリームを、オーディオやビデオ、字幕のストリームと、シグナリング情報に分離する。デマルチプレクサ２１３は、ビデオデータをビデオデコーダ２１７に、オーディオデータをオーディオデコーダ２１９に、字幕データを字幕デコーダ２２１に、シグナリング情報を制御部２１４にそれぞれ供給する。なお、ビデオやオーディオ、字幕のデータは、MP4のファイルフォーマットに準拠したセグメント(セグメントデータ)とされる。 The demultiplexer 213 separates the multiplexed stream supplied from the receiving unit 212 into audio, video, subtitle streams, and signaling information. The demultiplexer 213 supplies the video data to the video decoder 217, the audio data to the audio decoder 219, the subtitle data to the subtitle decoder 221, and the signaling information to the control unit 214. Note that the video, audio, and subtitle data are segments (segment data) conforming to the MP4 file format.

制御部２１４は、ATSCクライアント２０の各部の動作を制御する。また、制御部２１４は、デマルチプレクサ２１３から供給されるシグナリング情報に基づいて、コンテンツを再生するために、各部の動作を制御する。 The control unit 214 controls the operation of each unit of the ATSC client 20. Further, the control unit 214 controls the operation of each unit to reproduce the content based on the signaling information supplied from the demultiplexer 213.

メモリ２１５は、NVRAM(Non Volatile RAM)等の不揮発性メモリであって、制御部２１４からの制御に従い、各種のデータを記録する。入力部２１６は、ユーザの操作に応じて、操作信号を制御部２１４に供給する。 The memory 215 is a non-volatile memory such as an NVRAM (Non Volatile RAM), and records various data under the control of the control unit 214. The input unit 216 supplies an operation signal to the control unit 214 according to a user operation.

ビデオデコーダ２１７は、デマルチプレクサ２１３から供給されるビデオデータを、所定の復号方式に準拠して復号し、ビデオ出力部２１８に供給する。ビデオ出力部２１８は、ビデオデコーダ２１７から供給されるビデオデータを、表示部２２２に出力する。これにより、表示部２２２には、ユーザの選局操作に応じたコンテンツの映像が表示される。 The video decoder 217 decodes the video data supplied from the demultiplexer 213 according to a predetermined decoding method, and supplies the video data to the video output unit 218. The video output unit 218 outputs the video data supplied from the video decoder 217 to the display unit 222. As a result, the display unit 222 displays the video of the content according to the user's channel selection operation.

オーディオデコーダ２１９は、デマルチプレクサ２１３から供給されるオーディオデータを所定の復号方式に準拠して復号し、オーディオ出力部２２０に供給する。オーディオ出力部２２０は、オーディオデコーダ２１９から供給されるオーディオデータを、スピーカ２２３に出力する。これにより、スピーカ２２３からは、ユーザの選局操作に応じたコンテンツの音声が出力される。 The audio decoder 219 decodes the audio data supplied from the demultiplexer 213 according to a predetermined decoding method, and supplies the audio data to the audio output unit 220. The audio output unit 220 outputs the audio data supplied from the audio decoder 219 to the speaker 223. As a result, the audio of the content according to the user's channel selection operation is output from the speaker 223.

字幕デコーダ２２１は、デマルチプレクサ２１３から供給される字幕データを所定の復号方式に準拠して復号し、ビデオ出力部２１８に供給する。ビデオ出力部２１８は、字幕デコーダ２２１から供給される字幕データに対応する字幕が、ビデオデコーダ２１７から供給されるビデオデータに対応する映像に重畳して表示されるようにする。これにより、表示部２２２には、ユーザの選局操作に応じたコンテンツの映像に重畳された字幕が表示される。 The subtitle decoder 221 decodes the subtitle data supplied from the demultiplexer 213 according to a predetermined decoding method, and supplies the decoded data to the video output unit 218. The video output unit 218 causes the caption corresponding to the caption data supplied from the caption decoder 221 to be displayed while being superimposed on the video corresponding to the video data supplied from the video decoder 217. As a result, the display unit 222 displays subtitles superimposed on the video of the content according to the user's channel selection operation.

字幕デコーダ２２１は、MP4パーサ２４１及びTTMLパーサ２４２から構成される。MP4パーサ２４１は、デマルチプレクサ２１３からのセグメントデータ(TTMLセグメント)をパースし、その結果得られるTTMLファイルをTTMLパーサ２４２に供給する。TTMLパーサ２４２は、MP4パーサ２４１から供給されるTTMLファイルをパースし、その結果得られる字幕を表示するための情報を、ビデオ出力部２１８に供給する。 The subtitle decoder 221 includes an MP4 parser 241 and a TTML parser 242. The MP4 parser 241 parses the segment data (TTML segment) from the demultiplexer 213, and supplies the resulting TTML file to the TTML parser 242. The TTML parser 242 parses the TTML file supplied from the MP4 parser 241, and supplies information for displaying the resulting subtitle to the video output unit 218.
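The two-stage structure of the subtitle decoder 221 can be sketched as follows. This is a minimal Python illustration, not the parsers of the embodiment: it assumes top-level boxes with 32-bit sizes, a single TTML document in the mdat box, and the W3C TTML namespace. The helper names iter_boxes, extract_ttml, parse_subtitles, and make_box are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET

TT_NS = "{http://www.w3.org/ns/ttml}"


def iter_boxes(data):
    """Yield (type, payload) for each top-level ISO BMFF box with a 32-bit size."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:  # 64-bit and open-ended sizes are not handled in this sketch
            break
        yield box_type.decode("ascii"), data[offset + 8 : offset + size]
        offset += size


def extract_ttml(segment):
    """MP4 parser role: return the TTML document carried in the first mdat box."""
    for box_type, payload in iter_boxes(segment):
        if box_type == "mdat":
            return payload.decode("utf-8")
    return None


def parse_subtitles(ttml_text):
    """TTML parser role: return (begin, end, text) for every p element."""
    root = ET.fromstring(ttml_text)
    return [
        (p.get("begin"), p.get("end"), "".join(p.itertext()).strip())
        for p in root.iter(TT_NS + "p")
    ]


def make_box(box_type, payload):
    """Build a box for the demonstration below (hypothetical test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


ttml = (
    '<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
    '<p begin="00:00:01.000" end="00:00:03.000">text1</p>'
    "</div></body></tt>"
)
segment = make_box(b"moof", b"") + make_box(b"mdat", ttml.encode("utf-8"))
print(parse_subtitles(extract_ttml(segment)))
```

Whether the begin and end values returned by the TTML parser are used or ignored depends on the TTML processing mode signaled in the MPD file.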

（ATSCクライアントのソフトウェア構成）
図２２は、図２１のATSCクライアント２０のソフトウェア構成例を示す図である。 (Software configuration of ATSC client)
FIG. 22 is a diagram illustrating a software configuration example of the ATSC client 20 of FIG. 21.

図２２は、図２１に示したATSCクライアント２０の構成を、ソフトウェアの構成として表している。図２２において、ATSCクライアント２０は、放送クライアントミドルウェア２５１及びDASHクライアント２５２から構成される。 FIG. 22 shows the configuration of the ATSC client 20 shown in FIG. 21 as a software configuration. In FIG. 22, the ATSC client 20 includes a broadcast client middleware 251 and a DASH client 252.

放送クライアントミドルウェア２５１は、ATSCサーバ１０から送信されてくる、セグメントデータ(TTMLファイルを含む)やシグナリング情報(MPDファイルを含む)等の各種のデータを取得し、DASHクライアント２５２に提供するための処理を行う。 The broadcast client middleware 251 performs processing for acquiring various data transmitted from the ATSC server 10, such as segment data (including TTML files) and signaling information (including the MPD file), and providing the data to the DASH client 252.

DASHクライアント２５２は、放送クライアントミドルウェア２５１から提供されるセグメントデータ(TTMLファイルを含む)やシグナリング情報(MPDファイルを含む)等の各種のデータを処理して、コンテンツを再生するための処理を行う。例えば、DASHクライアント２５２は、MPDファイルに基づいて、TTML処理モードに応じた表示のタイミングで、TTMLファイルに指定される字幕の表示を制御する。 The DASH client 252 processes various data such as segment data (including a TTML file) and signaling information (including an MPD file) provided from the broadcast client middleware 251 to perform processing for reproducing content. For example, the DASH client 252 controls the display of subtitles specified in the TTML file at a display timing according to the TTML processing mode based on the MPD file.

ATSCクライアント２０は、以上のように構成される。 The ATSC client 20 is configured as described above.

＜５．各装置で実行される処理の流れ＞ <5. Process flow executed in each device>

次に、図２３乃至図２６のフローチャートを参照して、図１の伝送システム１を構成する各装置で実行される処理の流れを説明する。 Next, with reference to flowcharts of FIGS. 23 to 26, a flow of processing executed by each device included in the transmission system 1 of FIG. 1 will be described.

（送信処理）
まず、図２３のフローチャートを参照して、図１のATSCサーバ１０により実行される送信処理について説明する。 (Transmission processing)
First, the transmission processing executed by the ATSC server 10 in FIG. 1 will be described with reference to the flowchart in FIG. 23.

ステップＳ１０１においては、コンポーネント・シグナリング処理が行われる。このコンポーネント・シグナリング処理では、AVサーバ１０１で処理されるビデオとオーディオ(のストリーム)のデータや、TTMLサーバ１０２で処理される字幕データ(TTMLファイル)、DASHサーバ１０３で処理されるシグナリング情報(MPDファイル)、放送サーバ１０４で処理されるシグナリング情報(SLTメタデータ、USDメタデータやLSIDメタデータ等)に対する各種の処理が行われ、コンポーネントのデータやシグナリング情報が送信可能とされる。 In step S101, component signaling processing is performed. In this processing, various kinds of processing are performed on the video and audio (stream) data processed by the AV server 101, the subtitle data (TTML file) processed by the TTML server 102, the signaling information (MPD file) processed by the DASH server 103, and the signaling information (SLT metadata, USD metadata, LSID metadata, and the like) processed by the broadcast server 104, so that the component data and the signaling information become ready for transmission.

なお、ステップＳ１０１のコンポーネント・シグナリング処理の詳細な内容は、図２４のフローチャートを参照して後述する。 The details of the component signaling processing in step S101 will be described later with reference to the flowchart in FIG. 24.

ステップＳ１０２においては、放送サーバ１０４(の送信部１２１等)により送信処理が行われ、ステップＳ１０１の処理で処理されたビデオやオーディオ、字幕のコンポーネントのデータと、シグナリング情報が、デジタル放送信号として、アンテナ１０５を介して送信される。ステップＳ１０２の処理が終了すると、図２３の送信処理は終了される。 In step S102, transmission processing is performed by the broadcast server 104 (the transmission unit 121 and the like), and the video, audio, and subtitle component data processed in step S101 and the signaling information are transmitted as a digital broadcast signal via the antenna 105. When the processing in step S102 ends, the transmission processing in FIG. 23 ends.

以上、ATSCサーバ１０により実行される送信処理について説明した。 The transmission processing executed by the ATSC server 10 has been described above.

（コンポーネント・シグナリング処理）
ここで、図２４のフローチャートを参照して、図２３のステップＳ１０１の処理に対応するコンポーネント・シグナリング処理の詳細な内容について説明する。なお、図２４においては、説明の簡略化のため、AVサーバ１０１で行われる処理を省略して、TTMLサーバ１０２、DASHサーバ１０３、及び、放送サーバ１０４で行われる処理を中心に説明する。 (Component signaling processing)
Here, the details of the component signaling processing corresponding to step S101 in FIG. 23 will be described with reference to the flowchart in FIG. 24. In FIG. 24, to simplify the description, the processing performed by the AV server 101 is omitted, and the description focuses on the processing performed by the TTML server 102, the DASH server 103, and the broadcast server 104.

ステップＳ１１１において、TTMLサーバ１０２(の字幕生成部１１５)は、TTMLファイルを生成する。 In step S111, (the caption generation unit 115 of) the TTML server 102 generates a TTML file.

ステップＳ１１２において、TTMLサーバ１０２(の字幕エンコーダ１１６)は、ステップＳ１１１の処理で生成されたTTMLファイルを、MP4のファイルフォーマットに格納する。 In step S112, the TTML server 102 (the subtitle encoder 116 thereof) stores the TTML file generated in the process of step S111 in the MP4 file format.

ステップＳ１１３において、TTMLサーバ１０２は、MP4のファイルフォーマットに格納されたTTMLファイルのセグメント(TTMLセグメント)の生成を、DASHサーバ１０３に要求する。 In step S113, the TTML server 102 requests the DASH server 103 to generate a segment (TTML segment) of the TTML file stored in the MP4 file format.

なお、ここでは、AVサーバ１０１で行われる処理を省略しているが、AVサーバ１０１においても、ビデオとオーディオのデータが、MP4ファイルフォーマットに格納され、そのセグメントの生成要求が、DASHサーバ１０３になされることになる。 Although the processing performed by the AV server 101 is omitted here, the AV server 101 also stores video and audio data in the MP4 file format, and sends a segment generation request to the DASH server 103. Will be done.

ステップＳ１２１において、DASHサーバ１０３は、TTMLサーバ１０２(とAVサーバ１０１)からのセグメントの生成要求を取得する。 In step S121, the DASH server 103 acquires a segment generation request from the TTML server 102 (and the AV server 101).

ステップＳ１２２において、DASHサーバ１０３(のシグナリング生成部１１７)は、MPDファイルを生成する。ここで、MPDファイルには、TTML処理モードを選択するための選択情報として、AdaptationSet要素のEssentialProperty要素又はSupplementalProperty要素のschemeIdUri属性の値に、モード１、モード２−１、モード２−２、又はモード３を識別するための文字列が指定されることになる。 In step S122, the DASH server 103 (the signaling generation unit 117 thereof) generates an MPD file. Here, as selection information for selecting the TTML processing mode, a character string identifying mode 1, mode 2-1, mode 2-2, or mode 3 is specified in the MPD file as the value of the schemeIdUri attribute of the EssentialProperty element or the SupplementalProperty element of the AdaptationSet element.
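A minimal Python sketch of how step S122 might attach the selection information to an MPD file is shown below. The function build_mpd and the mimeType and start values are assumptions made for illustration; only the MPD, Period, AdaptationSet, EssentialProperty, and SupplementalProperty elements and the schemeIdUri attribute are taken from the description.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"


def build_mpd(mode_uri, essential=True):
    """Return a skeletal MPD document carrying the TTML processing mode.

    mode_uri: the identifying string, for example "atsc:ttmlMode:asap" for mode 3.
    essential: use EssentialProperty when True, SupplementalProperty otherwise.
    """
    ET.register_namespace("", MPD_NS)  # serialize with a default namespace
    mpd = ET.Element(f"{{{MPD_NS}}}MPD")
    period = ET.SubElement(mpd, f"{{{MPD_NS}}}Period", {"start": "PT0S"})
    adaptation_set = ET.SubElement(
        period, f"{{{MPD_NS}}}AdaptationSet", {"mimeType": "application/mp4"}
    )
    tag = "EssentialProperty" if essential else "SupplementalProperty"
    ET.SubElement(adaptation_set, f"{{{MPD_NS}}}{tag}", {"schemeIdUri": mode_uri})
    return ET.tostring(mpd, encoding="unicode")


print(build_mpd("atsc:ttmlMode:asap"))
```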

ステップＳ１２３において、DASHサーバ１０３(のセグメント処理部１１９)は、AVサーバ１０１からのビデオとオーディオ(のストリーム)のデータと、TTMLサーバ１０２からのTTMLファイルを用い、MP4のファイルフォーマットに準拠したセグメント(セグメントデータ)を生成する。 In step S123, the DASH server 103 (segment processing unit 119 thereof) uses the video and audio (stream) data from the AV server 101 and the TTML file from the TTML server 102 to generate a segment conforming to the MP4 file format. Generate (segment data).

ステップＳ１２４において、DASHサーバ１０３は、ステップＳ１２２の処理で生成されたMPDファイルと、ステップＳ１２３の処理で生成されたセグメントデータを、放送サーバ１０４に転送する。 In step S124, the DASH server 103 transfers the MPD file generated in step S122 and the segment data generated in step S123 to the broadcast server 104.

ステップＳ１３１において、放送サーバ１０４は、DASHサーバ１０３からのMPDファイルとセグメントデータを取得する。 In step S131, the broadcast server 104 acquires the MPD file and the segment data from the DASH server 103.

ステップＳ１３２において、放送サーバ１０４(のシグナリング生成部１１７)は、LLSシグナリング情報やSLSシグナリング情報などのシグナリング情報を生成する。 In step S132, (the signaling generation unit 117 of) the broadcast server 104 generates signaling information such as LLS signaling information and SLS signaling information.

そして、ステップＳ１３２の処理が終了すると、処理は、図２３のステップＳ１０１の処理に戻り、それ以降の処理が実行される。すなわち、放送サーバ１０４(の送信部１２１等)では、上述したステップＳ１０２(図２３)の処理が行われ、セグメントデータ(TTMLファイルを含む)やシグナリング情報(MPDファイルを含む)が、デジタル放送信号として送信される。 When the processing in step S132 ends, the processing returns to step S101 in FIG. 23, and the subsequent processing is executed. That is, the broadcast server 104 (the transmission unit 121 and the like) performs the processing of step S102 (FIG. 23) described above, and the segment data (including TTML files) and the signaling information (including the MPD file) are transmitted as a digital broadcast signal.

以上、ATSCサーバ１０により実行されるコンポーネント・シグナリング処理について説明した。このコンポーネント・シグナリング処理では、ATSCクライアント２０が、コンポーネントのデータやシグナリング情報を用いてコンテンツの再生を行うことができるように、各種の処理が行われる。また、ここでは、コンテンツの映像に、字幕を重畳表示させる場合には、TTML処理モードを選択するための選択情報を含むMPDファイルが生成され、TTMLファイルとともに送信されることになる。 The component signaling processing executed by the ATSC server 10 has been described above. In this component signaling process, various processes are performed so that the ATSC client 20 can reproduce the content using the component data and the signaling information. Also, here, when a caption is superimposed on a video of a content, an MPD file including selection information for selecting a TTML processing mode is generated and transmitted together with the TTML file.

（受信処理）
次に、図２５のフローチャートを参照して、図１のATSCクライアント２０により実行される受信処理について説明する。なお、図２５の受信処理は、例えば、ユーザにより所望のサービスの選局操作が行われた場合に実行される。 (Reception processing)
Next, the reception processing executed by the ATSC client 20 in FIG. 1 will be described with reference to the flowchart in FIG. 25. The reception processing in FIG. 25 is executed, for example, when the user performs a channel selection operation for a desired service.

ステップＳ２０１においては、受信部２１２等により受信処理が行われ、アンテナ２１１を介して、ATSCサーバ１０から伝送路３０を介して送信されてくるデジタル放送信号が受信される。 In step S201, reception processing is performed by the reception unit 212 and the like, and a digital broadcast signal transmitted from the ATSC server 10 via the transmission path 30 is received via the antenna 211.

ステップＳ２０２においては、コンポーネント・シグナリング処理が行われる。このコンポーネント・シグナリング処理では、ステップＳ２０１の処理で受信されたデジタル放送信号から得られる、ビデオやオーディオ、字幕のコンポーネントのデータと、シグナリング情報に対する処理が行われ、ユーザの選局操作に応じたコンテンツが再生される。 In step S202, component signaling processing is performed. In this processing, the video, audio, and subtitle component data and the signaling information obtained from the digital broadcast signal received in step S201 are processed, and the content corresponding to the user's channel selection operation is reproduced.

なお、ステップＳ２０２のコンポーネント・シグナリング処理の詳細な内容は、図２６のフローチャートを参照して後述する。ステップＳ２０２の処理が終了すると、図２５の受信処理は終了される。 The details of the component signaling processing in step S202 will be described later with reference to the flowchart in FIG. 26. When the processing in step S202 ends, the reception processing in FIG. 25 ends.

以上、ATSCクライアント２０により実行される受信処理について説明した。 The reception processing executed by the ATSC client 20 has been described above.

（コンポーネント・シグナリング処理）
ここで、図２６のフローチャートを参照して、図２５のステップＳ２０２の処理に対応するコンポーネント・シグナリング処理の詳細な内容について説明する。なお、図２６においては、図２２の放送クライアントミドルウェア２５１とDASHクライアント２５２で行われる処理を示している。 (Component signaling processing)
Here, the details of the component signaling processing corresponding to step S202 in FIG. 25 will be described with reference to the flowchart in FIG. 26. FIG. 26 shows the processing performed by the broadcast client middleware 251 and the DASH client 252 of FIG. 22.

ステップＳ２１１において、放送クライアントミドルウェア２５１は、MPDファイルを取得する。また、ステップＳ２１２において、放送クライアントミドルウェア２５１は、セグメントデータを取得する。 In step S211, the broadcast client middleware 251 acquires an MPD file. In step S212, the broadcast client middleware 251 acquires segment data.

ステップＳ２１３において、放送クライアントミドルウェア２５１は、ステップＳ２１１の処理で取得されたMPDファイルと、ステップＳ２１２の処理で取得されたセグメントデータを、DASHクライアント２５２に転送する。 In step S213, the broadcast client middleware 251 transfers the MPD file obtained in step S211 and the segment data obtained in step S212 to the DASH client 252.

ステップＳ２２１において、DASHクライアント２５２は、放送クライアントミドルウェア２５１から転送されてくるMPDファイルとセグメントデータを取得する。 In step S221, the DASH client 252 acquires the MPD file and the segment data transferred from the broadcast client middleware 251.

ステップＳ２２２において、DASHクライアント２５２は、ステップＳ２２１の処理で取得されたMPDファイルをパースする。 In step S222, the DASH client 252 parses the MPD file obtained in the processing in step S221.

ステップＳ２２３において、DASHクライアント２５２は、ステップＳ２２２の処理でのMPDファイルのパース結果に基づいて、ステップＳ２２１の処理で取得されるビデオとオーディオのセグメントデータのレンダリングを行う。これにより、コンテンツの映像が表示部２２２に表示され、その音声がスピーカ２２３から出力される。 In step S223, the DASH client 252 renders the video and audio segment data obtained in step S221 based on the result of parsing the MPD file in step S222. As a result, the video of the content is displayed on the display unit 222, and the sound is output from the speaker 223.

ステップＳ２２４において、DASHクライアント２５２は、ステップＳ２２２の処理でのMPDファイルのパース結果に基づいて、字幕に対応するAdaptationSet要素のEssentialProperty要素のschemeIdUri属性の値(属性値)をチェックする。 In step S224, the DASH client 252 checks the value (attribute value) of the schemeIdUri attribute of the EssentialProperty element of the AdaptationSet element corresponding to the subtitle based on the result of parsing the MPD file in the process of step S222.

ステップＳ２２４において、EssentialProperty要素のschemeIdUri属性の値として、"ttmlTimeOnly"が指定されていると判定された場合、TTML処理モードとして、モード１が設定されているので、処理は、ステップＳ２２５に進められ、ステップＳ２２５乃至Ｓ２２７の処理が実行される。 In step S224, when it is determined that “ttmlTimeOnly” is specified as the value of the schemeIdUri attribute of the EssentialProperty element, mode 1 is set as the TTML processing mode, and the process proceeds to step S225. Steps S225 to S227 are executed.

ステップＳ２２５において、DASHクライアント２５２は、ステップＳ２２１の処理で取得されるセグメントデータ(TTMLセグメント)をパースする。ただし、モード１では、MP4のファイルフォーマットで規定される時間情報、すなわち、moofボックスに格納されるBMDTやSampleDuration等の時間情報は無視される。 In step S225, the DASH client 252 parses the segment data (TTML segment) acquired in the processing in step S221. However, in mode 1, time information specified in the MP4 file format, that is, time information such as BMDT and SampleDuration stored in the moof box is ignored.

ステップＳ２２６において、DASHクライアント２５２は、ステップＳ２２５の処理でTTMLセグメントをパースすることで、mdatボックスに格納されるTTMLサンプルから得られるTTMLファイルをパースする。ここで、モード１では、TTMLファイルで指定される時間情報、すなわち、body要素内のp要素のbegin属性やend属性により指定される時間情報を尊重して、begin属性により指定される時間に字幕の表示を開始して、end属性により指定される時間にその字幕の表示を終了することになる。 In step S226, the DASH client 252 parses the TTML file obtained from the TTML sample stored in the mdat box as a result of parsing the TTML segment in step S225. Here, in mode 1, the time information specified in the TTML file, that is, the time information specified by the begin and end attributes of the p element in the body element, is respected: display of the subtitle starts at the time specified by the begin attribute and ends at the time specified by the end attribute.

ステップＳ２２７において、DASHクライアント２５２は、ステップＳ２２６の処理のTTMLファイルのパース結果に基づいて、レンダリング処理を行い、begin属性の表示開始時刻から、end属性の表示終了時刻までの間に、body要素内のp要素により指定される文字列としての字幕が表示されるようにする。 In step S227, the DASH client 252 performs rendering processing based on the result of parsing the TTML file in step S226, so that the subtitle, that is, the character string specified by the p element in the body element, is displayed from the display start time given by the begin attribute until the display end time given by the end attribute.

このように、TTML処理モードとして、モード１が設定されている場合、MP4のファイルフォーマットで規定される時間情報を無視して、TTMLファイルで指定される時間情報を尊重することで、所望のタイミングで字幕を表示させることができる。 As described above, when the mode 1 is set as the TTML processing mode, the time information specified in the MP4 file format is ignored, and the time information specified in the TTML file is respected. To display subtitles.
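As an illustration only, the mode 1 handling of steps S226 and S227 might be sketched as follows. The sketch reads the begin and end attributes of each p element and returns the resulting display intervals; no MP4 timing is consulted. The function names `parse_ttml_time` and `mode1_cues` are hypothetical, only a small subset of TTML time expressions is handled, and every p element is assumed to carry both attributes.

```python
import re
import xml.etree.ElementTree as ET

TTML_NS = "{http://www.w3.org/ns/ttml}"  # W3C TTML namespace


def parse_ttml_time(value: str) -> float:
    """Convert a subset of TTML time expressions to seconds (illustrative only)."""
    # Clock time such as "00:00:06.500"
    clock = re.fullmatch(r"(\d+):(\d{2}):(\d{2}(?:\.\d+)?)", value)
    if clock:
        hours, minutes, seconds = clock.groups()
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    # Offset time such as "3s" or "250ms"
    offset = re.fullmatch(r"(\d+(?:\.\d+)?)(h|ms|m|s)", value)
    if offset:
        number, unit = offset.groups()
        scale = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}[unit]
        return float(number) * scale
    raise ValueError(f"unsupported time expression: {value!r}")


def mode1_cues(ttml_text: str):
    """Return (begin, end, text) per p element; MP4 timing is not consulted."""
    root = ET.fromstring(ttml_text)
    cues = []
    for p in root.iter(TTML_NS + "p"):
        begin = parse_ttml_time(p.get("begin"))  # assumed present
        end = parse_ttml_time(p.get("end"))      # assumed present
        text = "".join(p.itertext()).strip()
        cues.append((begin, end, text))
    return cues
```

A renderer would then show each text from its begin time until its end time on the presentation timeline.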

If it is determined in step S224 that "sampleTimeOnly" is specified as the value of the schemeIdUri attribute of the EssentialProperty element, mode 2-1 is set as the TTML processing mode, so the process proceeds to step S228, and steps S228 to S230 are executed.

In step S228, the DASH client 252 parses the segment data (TTML segment) acquired in step S221. In mode 2-1, the time information defined by the MP4 file format, that is, time information such as the BMDT and SampleDuration stored in the moof box, is respected: display of a subtitle starts at the time corresponding to the BMDT and continues only for the time corresponding to the SampleDuration stored in that moof box.

In step S229, the DASH client 252 parses the TTML file obtained from the TTML sample stored in the mdat box, which was extracted by parsing the TTML segment in step S228. In mode 2-1, however, the time information specified in the TTML file, that is, the time information given by the begin and end attributes of the p elements, is ignored.

In step S230, the DASH client 252 performs rendering based on the parse results of steps S228 and S229, so that the subtitle, that is, the character string specified by the p element of the TTML file, is displayed from the time corresponding to the BMDT for the duration corresponding to the SampleDuration.
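The mode 2-1 timing can be illustrated with a minimal sketch. It assumes that the BMDT and SampleDuration have already been read from the moof box as integer tick counts, and that a `timescale` value (ticks per second) is available from elsewhere in the MP4 data; the text above does not mention the timescale, so that field, like the names `SampleTiming` and `mode2_1_interval`, is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SampleTiming:
    bmdt: int             # BaseMediaDecodeTime from the moof box, in ticks
    sample_duration: int  # SampleDuration from the moof box, in ticks
    timescale: int        # ticks per second (assumed to be known to the parser)


def mode2_1_interval(timing: SampleTiming) -> Tuple[float, float]:
    """Return (start, end) in seconds; begin/end attributes of p are ignored."""
    start = timing.bmdt / timing.timescale
    end = (timing.bmdt + timing.sample_duration) / timing.timescale
    return start, end
```

For example, a sample with a BMDT of 90000 ticks and a SampleDuration of 180000 ticks at 90000 ticks per second would be displayed from 1.0 s to 3.0 s.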

Further, if it is determined in step S224 that "sampleTimeOnlyButTillNext" is specified as the value of the schemeIdUri attribute of the EssentialProperty element, mode 2-2 is set as the TTML processing mode, so the process proceeds to step S231, and steps S231 to S233 are executed.

In step S231, the DASH client 252 parses the segment data (TTML segment) acquired in step S221. In mode 2-2, the time information defined by the MP4 file format, that is, the BMDT stored in the moof box, is respected: display of a subtitle starts at the time corresponding to the BMDT of the target TTML sample and continues until the time corresponding to the BMDT of the next TTML sample.

In step S232, the DASH client 252 parses the TTML file obtained from the TTML sample stored in the mdat box, which was extracted by parsing the TTML segment in step S231. In mode 2-2, however, the time information specified in the TTML file, that is, the time information given by the begin and end attributes of the p elements, is ignored.

In step S233, the DASH client 252 performs rendering based on the parse results of steps S231 and S232, so that the subtitle, that is, the character string specified by the p element of the TTML file obtained from the target TTML sample, is displayed from the time corresponding to the BMDT of the target TTML sample until the time corresponding to the BMDT of the next TTML sample.

Note that, to recognize the time corresponding to the BMDT of the next TTML sample, the process must return to step S231 and parse the next TTML segment. That is, by repeating steps S231 to S233, the time corresponding to the BMDT of the next TTML sample is recognized, and the subtitle specified in the TTML file obtained from that next TTML sample is displayed.
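The dependence of mode 2-2 on the next sample can be made concrete with a short sketch. It assumes the samples are given in order as (BMDT in ticks, subtitle text) pairs and that a `timescale` in ticks per second is known; both the input shape and the function name `mode2_2_intervals` are hypothetical. The end time of the last sample is left as `None`, reflecting that it cannot be determined until the following TTML segment has been parsed.

```python
from typing import List, Optional, Tuple


def mode2_2_intervals(
    samples: List[Tuple[int, str]], timescale: int
) -> List[Tuple[float, Optional[float], str]]:
    """Return (start, end, text) per sample; end is the next sample's BMDT."""
    intervals = []
    for index, (bmdt, text) in enumerate(samples):
        start = bmdt / timescale
        if index + 1 < len(samples):
            # Display continues until the time given by the next sample's BMDT.
            end = samples[index + 1][0] / timescale
        else:
            # Not yet known: the next TTML segment has not been parsed.
            end = None
        intervals.append((start, end, text))
    return intervals
```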

In this way, when mode 2 (mode 2-1 or mode 2-2) is set as the TTML processing mode, subtitles can be displayed at the desired timing by respecting the time information defined by the MP4 file format (the time information of each TTML sample) and ignoring the time information specified in the TTML file.

If it is determined in step S224 that "asap" is specified as the value of the schemeIdUri attribute of the EssentialProperty element, mode 3 is set as the TTML processing mode, so the process proceeds to step S234, and steps S234 to S236 are executed.

In step S234, the DASH client 252 parses the segment data (TTML segment) acquired in step S221. In mode 3, however, the time information defined by the MP4 file format, that is, time information such as the BMDT and SampleDuration stored in the moof box, is ignored.

In step S235, the DASH client 252 parses the TTML file obtained from the TTML sample stored in the mdat box, which was extracted by parsing the TTML segment in step S234. In mode 3, the time information specified in the TTML file, that is, the time information given by the begin and end attributes of the p elements, is also ignored.

In step S236, the DASH client 252 immediately renders the TTML file based on the result of parsing the TTML file in step S235, so that the subtitle, that is, the character string specified by the p element of the TTML file, is displayed. A subtitle displayed in this way remains on screen until the next TTML file (TTML sample) is acquired, at which point its display ends.

In this way, when mode 3 is set as the TTML processing mode, subtitles can be displayed at the desired timing by ignoring both the time information defined by the MP4 file format (the time information of each TTML sample) and the time information specified in the TTML file, and displaying the subtitle immediately.
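The mode 3 behavior, in which a subtitle appears as soon as its TTML file is acquired and disappears when the next one arrives, might be modeled as in the following sketch. The class `AsapCaptionDisplay` and its `now` argument (the arrival time on some receiver clock) are hypothetical and serve only to show that no timing from the MP4 data or the TTML file is involved.

```python
class AsapCaptionDisplay:
    """Toy model of mode 3: show each subtitle on arrival, replace on the next."""

    def __init__(self):
        self.current = None  # (text, shown_at) of the subtitle on screen
        self.history = []    # (text, shown_at, removed_at) of finished subtitles

    def on_ttml_file(self, text, now):
        """Called when a TTML file has been acquired and parsed."""
        if self.current is not None:
            # Acquiring the next TTML file ends display of the previous subtitle.
            shown_text, shown_at = self.current
            self.history.append((shown_text, shown_at, now))
        # The new subtitle is displayed immediately.
        self.current = (text, now)
```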

This concludes the description of the component signaling processing executed by the ATSC client 20. In this processing, content is reproduced using the component data and signaling information transmitted from the ATSC server 10. In addition, when subtitles are superimposed on the video of the content, an MPD file including selection information for selecting the TTML processing mode is acquired, so the subtitles specified in the TTML file are displayed at the display timing corresponding to the TTML processing mode.

<6. Modifications>

In the above description, ATSC (for example, ATSC 3.0), the system adopted in the United States and elsewhere, was described as the digital broadcasting standard. However, the present technology may also be applied to ISDB (Integrated Services Digital Broadcasting), the system adopted in Japan and elsewhere, or to DVB (Digital Video Broadcasting), the system adopted in European countries and elsewhere.

The names of the signaling information described above, such as SLT, are merely examples, and other names may be used. Even if another name is used for a piece of signaling information, the change is purely nominal and the substantive content of that signaling information does not differ. For example, the SLT may be referred to as an FIT (Fast Information Table).

In the above description, the TTML file and the MPD file are transmitted by the ATSC server 10 in a digital broadcast signal, but these files may instead be distributed from a server on the Internet. For example, subtitles from a TTML file distributed via communication may be superimposed on the video of content distributed via broadcasting. Video and audio (stream) data may likewise be adaptively streamed from a server on the Internet. In that case, the streaming distribution conforms to the MPEG-DASH standard.

Furthermore, in the above description, the TTML processing mode is specified in the MPD file using the value of the schemeIdUri attribute of the EssentialProperty element or the SupplementalProperty element of the AdaptationSet element, but other elements or attributes may be used to specify the TTML processing mode. The TTML processing mode may also be specified in the Representation element or the SubRepresentation element, using the value of the schemeIdUri attribute of the EssentialProperty element or the SupplementalProperty element. Moreover, as long as the TTML processing mode is known when the TTML file is processed, it may be specified by signaling information other than the MPD file.
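How a receiver might read the selection information from the MPD file can be sketched as follows. The sketch searches each AdaptationSet element, including any Representation or SubRepresentation descendants, for an EssentialProperty or SupplementalProperty element whose schemeIdUri attribute names one of the four modes. Matching on the last colon-separated token of the attribute value is an assumption made here so that both a bare value such as "asap" and a prefixed URI would be accepted; the function name `ttml_mode_from_mpd` and the returned labels are hypothetical.

```python
import xml.etree.ElementTree as ET

MPD_NS = "{urn:mpeg:dash:schema:mpd:2011}"  # MPEG-DASH MPD namespace

# schemeIdUri value (last token) -> TTML processing mode described above
MODES = {
    "ttmlTimeOnly": "mode 1",
    "sampleTimeOnly": "mode 2-1",
    "sampleTimeOnlyButTillNext": "mode 2-2",
    "asap": "mode 3",
}


def ttml_mode_from_mpd(mpd_text: str):
    """Return the TTML processing mode signaled in the MPD, or None if absent."""
    root = ET.fromstring(mpd_text)
    for adaptation_set in root.iter(MPD_NS + "AdaptationSet"):
        for tag in ("EssentialProperty", "SupplementalProperty"):
            # iter() also visits descendants such as Representation children.
            for prop in adaptation_set.iter(MPD_NS + tag):
                key = prop.get("schemeIdUri", "").rsplit(":", 1)[-1]
                if key in MODES:
                    return MODES[key]
    return None
```

The returned mode would then select which of the processing paths (steps S225 to S227, S228 to S230, S231 to S233, or S234 to S236) is taken.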

<7. Computer Configuration>

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, a program constituting the software is installed on a computer. FIG. 27 is a diagram illustrating a hardware configuration example of a computer that executes the series of processes described above by means of a program.

In the computer 900, a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, and a RAM (Random Access Memory) 903 are connected to one another by a bus 904. An input/output interface 905 is also connected to the bus 904. An input unit 906, an output unit 907, a recording unit 908, a communication unit 909, and a drive 910 are connected to the input/output interface 905.

The input unit 906 includes a keyboard, a mouse, a microphone, and the like. The output unit 907 includes a display, a speaker, and the like. The recording unit 908 includes a hard disk, a non-volatile memory, and the like. The communication unit 909 includes a network interface and the like. The drive 910 drives a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer 900 configured as described above, the CPU 901 loads a program recorded in the ROM 902 or the recording unit 908 into the RAM 903 via the input/output interface 905 and the bus 904 and executes it, whereby the series of processes described above is performed.

The program executed by the computer 900 (CPU 901) can be provided by being recorded on the removable medium 911 as, for example, a package medium. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

In the computer 900, the program can be installed in the recording unit 908 via the input/output interface 905 by loading the removable medium 911 into the drive 910. The program can also be received by the communication unit 909 via a wired or wireless transmission medium and installed in the recording unit 908. Alternatively, the program can be installed in advance in the ROM 902 or the recording unit 908.

In this specification, the processes that the computer performs according to the program do not necessarily have to be performed in time series in the order described in the flowcharts. That is, the processes that the computer performs according to the program also include processes executed in parallel or individually (for example, parallel processing or object-based processing). The program may be processed by a single computer (processor) or may be processed in a distributed manner by a plurality of computers.

Note that embodiments of the present technology are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present technology.

The present technology can also have the following configurations.

(1)
A receiving device including:
a receiving unit that receives a broadcast wave of digital broadcasting;
an acquisition unit that acquires subtitle information on subtitles and control information, both transmitted by the broadcast wave, the control information including selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles; and
a control unit that controls display of the subtitles according to the subtitle information at a display timing according to the specific mode, based on the selection information included in the control information.
(2)
The receiving device according to (1), wherein
the subtitle information is a TTML file in TTML (Timed Text Markup Language) format whose data conforms to the MP4 file format,
the control information is an MPD (Media Presentation Description) file in XML (Extensible Markup Language) format, and
the TTML file and the MPD file are transmitted in a ROUTE (Real-Time Object Delivery over Unidirectional Transport) session.
(3)
The receiving device according to (2), wherein
the plurality of modes include a first mode in which the subtitles are displayed at a timing according to time information specified in the TTML file, and
when the specific mode is the first mode, the control unit displays the subtitles specified in the TTML file at a timing according to the time information specified in the TTML file.
(4)
The receiving device according to (2) or (3), wherein
the plurality of modes include a second mode in which the subtitles are displayed at a timing according to time information defined by the MP4 file format, and
when the specific mode is the second mode, the control unit displays the subtitles specified in the TTML file at a timing according to the time information defined by the MP4 file format.
(5)
The receiving device according to (4), wherein the control unit starts display of the subtitles at a time corresponding to a BMDT (BaseMediaDecodeTime) stored in a moof box defined by the MP4 file format, and continues the display only for a time corresponding to a SampleDuration stored in the moof box.
(6)
The receiving device according to (4), wherein the control unit starts display of the subtitles at a time corresponding to the BMDT stored in the moof box, defined by the MP4 file format, that corresponds to the mdat box storing the data of the target subtitle, and continues the display until a time corresponding to the BMDT stored in the moof box that corresponds to the mdat box storing the data of the next subtitle.
(7)
The receiving device according to any one of (2) to (4), wherein
the plurality of modes include a third mode in which the subtitles are displayed while ignoring the time information specified in the TTML file and the time information defined by the MP4 file format, and
when the specific mode is the third mode, the control unit displays the subtitles specified in the TTML file immediately upon acquisition of the TTML file.
(8)
The receiving device according to any one of (2) to (7), wherein the selection information is specified as extended information of the MPD file.
(9)
The receiving device according to (8), wherein the selection information is specified by the schemeIdUri attribute of an EssentialProperty element or a SupplementalProperty element in an AdaptationSet element placed in a Period element of an MPD element.
(10)
A data processing method including the steps of:
acquiring subtitle information on subtitles and control information, both transmitted by a broadcast wave of digital broadcasting, the control information including selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles; and
controlling display of the subtitles according to the subtitle information at a display timing according to the specific mode, based on the selection information included in the control information.
(11)
A transmission device including:
a generation unit that generates control information including selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of subtitles; and
a transmission unit that transmits the control information, together with subtitle information on the subtitles, by a broadcast wave of digital broadcasting.
(12)
The transmission device according to (11), wherein
the subtitle information is a TTML file in TTML format whose data conforms to the MP4 file format,
the control information is an MPD file in XML format, and
the TTML file and the MPD file are transmitted in a ROUTE session.
(13)
The transmission device according to (12), wherein the plurality of modes include a first mode in which the subtitles are displayed at a timing according to time information specified in the TTML file.
(14)
The transmission device according to (12) or (13), wherein the plurality of modes include a second mode in which the subtitles are displayed at a timing according to time information defined by the MP4 file format.
(15)
The transmission device according to (14), wherein the second mode is a mode in which display of the subtitles starts at a time corresponding to the BMDT stored in a moof box defined by the MP4 file format and continues only for a time corresponding to the SampleDuration stored in the moof box.
(16)
The transmission device according to (14), wherein the second mode is a mode in which display of the subtitles starts at a time corresponding to the BMDT stored in the moof box, defined by the MP4 file format, that corresponds to the mdat box storing the data of the target subtitle, and continues until a time corresponding to the BMDT stored in the moof box that corresponds to the mdat box storing the data of the next subtitle.
(17)
The transmission device according to any one of (12) to (14), wherein the plurality of modes include a third mode in which the subtitles are displayed while ignoring the time information specified in the TTML file and the time information defined by the MP4 file format.
(18)
The transmission device according to any one of (12) to (17), wherein the selection information is specified as extended information of the MPD file.
(19)
The transmission device according to (18), wherein the selection information is specified by the schemeIdUri attribute of an EssentialProperty element or a SupplementalProperty element in an AdaptationSet element placed in a Period element of an MPD element.
(20)
A data processing method including the step of generating control information that is transmitted, together with subtitle information on subtitles, by a broadcast wave of digital broadcasting, and that includes selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles.

Reference Signs List: 1 transmission system, 10 ATSC server, 20 ATSC client, 30 transmission path, 101 AV server, 102 TTML server, 103 DASH server, 104 broadcast server, 111 video data acquisition unit, 112 video encoder, 113 audio data acquisition unit, 114 audio encoder, 115 subtitle generation unit, 116 subtitle encoder, 117 signaling generation unit, 118 signaling processing unit, 119 segment processing unit, 120 multiplexer, 121 transmission unit, 212 receiving unit, 213 demultiplexer, 214 control unit, 217 video decoder, 218 video output unit, 219 audio decoder, 220 audio output unit, 221 subtitle decoder, 241 MP4 parser, 242 TTML parser, 251 broadcast client middleware, 252 DASH client, 900 computer, 901 CPU

A receiving device according to a first aspect of the present technology includes: a receiving unit that receives a broadcast wave of digital broadcasting; an acquisition unit that acquires subtitle information on subtitles and control information for displaying the subtitles, both transmitted by the broadcast wave; and a control unit that controls display of the subtitles according to the subtitle information based on the control information. The control information includes selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles, and role information indicating that the control target is subtitles. The control unit controls display of the subtitles according to the subtitle information at a display timing according to the specific mode, based on the selection information and the role information included in the control information.

In the receiving device and the data processing method according to the first aspect of the present technology, subtitle information on subtitles and control information for displaying the subtitles, both transmitted by a broadcast wave of digital broadcasting, are acquired, and display of the subtitles according to the subtitle information is controlled based on the control information. The control information includes selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles, and role information indicating that the control target is subtitles. Display of the subtitles according to the subtitle information is controlled at a display timing according to the specific mode, based on the selection information and the role information included in the control information.

A transmission device according to a second aspect of the present technology includes: a generation unit that generates control information for displaying subtitles; and a transmission unit that transmits the control information, together with subtitle information on the subtitles, by a broadcast wave of digital broadcasting. The control information includes selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles, and role information indicating that the control target is subtitles.

In the transmission device and the data processing method according to the second aspect of the present technology, control information for displaying subtitles is generated, and the control information is transmitted, together with subtitle information on the subtitles, by a broadcast wave of digital broadcasting. The control information includes selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles, and role information indicating that the control target is subtitles.

Claims

A receiving device comprising:
a receiving unit that receives a broadcast wave of digital broadcasting;
an acquisition unit that acquires subtitle information on subtitles and control information, both transmitted by the broadcast wave, the control information including selection information for selecting a specific mode from among a plurality of modes for specifying the display timing of the subtitles; and
a control unit that controls display of the subtitles according to the subtitle information at a display timing according to the specific mode, based on the selection information included in the control information.

The receiving device according to claim 1, wherein
the subtitle information is a TTML file in TTML (Timed Text Markup Language) format whose data conforms to the MP4 file format,
the control information is an MPD (Media Presentation Description) file in XML (Extensible Markup Language) format, and
the TTML file and the MPD file are transmitted in a ROUTE (Real-Time Object Delivery over Unidirectional Transport) session.

The receiving device according to claim 2, wherein
the plurality of modes include a first mode in which the subtitle is displayed at a timing according to time information specified in the TTML file, and
the control unit, when the specific mode is the first mode, displays the subtitle specified in the TTML file at a timing according to the time information specified in the TTML file.

The receiving device according to claim 2 or 3, wherein
the plurality of modes include a second mode in which the subtitle is displayed at a timing according to time information defined by the MP4 file format, and
the control unit, when the specific mode is the second mode, displays the subtitle specified in the TTML file at a timing according to the time information defined by the MP4 file format.

The receiving device according to claim 4, wherein the control unit starts displaying the subtitle at a time corresponding to a BMDT (BaseMediaDecodeTime) stored in a moof box defined by the MP4 file format, and continues the display only for a period corresponding to a SampleDuration stored in the moof box.

The receiving device according to claim 4, wherein the control unit starts displaying the subtitle at a time corresponding to the BMDT stored in the moof box, defined by the MP4 file format, that corresponds to the mdat box storing the data of the target subtitle, and continues the display until a time corresponding to the BMDT stored in the moof box that corresponds to the mdat box storing the data of the next subtitle.
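
The two second-mode variants described in the preceding claims can be illustrated with a minimal sketch. It assumes the BMDT and SampleDuration values have already been read from each moof box and are expressed in ticks of the track timescale; the `Fragment` class and both function names are hypothetical and do not come from this publication.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional, Tuple


@dataclass
class Fragment:
    """Timing fields taken from one moof box (hypothetical container)."""
    bmdt: int             # BaseMediaDecodeTime, in timescale ticks
    sample_duration: int  # SampleDuration, in timescale ticks


def window_by_duration(frag: Fragment, timescale: int) -> Tuple[Fraction, Fraction]:
    """Variant A: show from BMDT for exactly SampleDuration."""
    start = Fraction(frag.bmdt, timescale)
    return start, start + Fraction(frag.sample_duration, timescale)


def window_until_next(
    frags: List[Fragment], index: int, timescale: int
) -> Tuple[Fraction, Optional[Fraction]]:
    """Variant B: show from BMDT until the BMDT of the next subtitle.

    The end is None for the last known fragment, because the next
    subtitle has not arrived yet.
    """
    start = Fraction(frags[index].bmdt, timescale)
    if index + 1 < len(frags):
        return start, Fraction(frags[index + 1].bmdt, timescale)
    return start, None


frags = [Fragment(bmdt=90000, sample_duration=180000),
         Fragment(bmdt=450000, sample_duration=90000)]
print(window_by_duration(frags[0], 90000))    # start and end in seconds
print(window_until_next(frags, 0, 90000))
```

Exact fractions are used here only to avoid rounding when converting ticks to seconds; a receiver would map these values onto its own presentation clock.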

The receiving device according to claim 2, wherein
the plurality of modes include a third mode in which the subtitle is displayed while ignoring both the time information specified in the TTML file and the time information defined by the MP4 file format, and
the control unit, when the specific mode is the third mode, displays the subtitle specified in the TTML file immediately upon acquiring the TTML file.
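
As an illustration of how a control unit might choose the display start time under the three modes recited above, the following is a rough sketch. The enum, its member names, and the function signature are assumptions made for this example; the publication does not define them.

```python
from enum import Enum


class TimingMode(Enum):
    """Hypothetical labels for the three display-timing modes."""
    TTML_TIME = 1   # first mode: time information in the TTML file
    MP4_TIME = 2    # second mode: time information of the MP4 file format
    IMMEDIATE = 3   # third mode: ignore both, display on acquisition


def display_start(mode: TimingMode, ttml_begin: float,
                  mp4_start: float, acquired_at: float) -> float:
    """Return the time (in seconds) at which the subtitle should appear."""
    if mode is TimingMode.TTML_TIME:
        return ttml_begin
    if mode is TimingMode.MP4_TIME:
        return mp4_start
    if mode is TimingMode.IMMEDIATE:
        return acquired_at
    raise ValueError(f"unknown timing mode: {mode!r}")


# The same subtitle yields a different start time depending on the mode.
for mode in TimingMode:
    print(mode.name, display_start(mode, ttml_begin=12.0,
                                   mp4_start=10.0, acquired_at=9.5))
```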

The receiving device according to claim 2, wherein the selection information is specified as extended information of the MPD file.

The receiving device according to claim 8, wherein the selection information is specified by a schemeIdUri attribute of an EssentialProperty element or a SupplementalProperty element in an AdaptationSet element arranged in a Period element of an MPD element.
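
A receiver could locate such selection information roughly as follows. This is only a sketch: the MPD namespace is the standard DASH one, but the `urn:example:...` scheme URIs and the mode labels are placeholders invented for the example, since the actual schemeIdUri values are not given in this passage.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Placeholder scheme URIs (assumptions, not the values used in the patent).
HYPOTHETICAL_SCHEMES = {
    "urn:example:subtitle:timing:ttml": "ttml-time",
    "urn:example:subtitle:timing:mp4": "mp4-time",
    "urn:example:subtitle:timing:immediate": "immediate",
}

SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="application/mp4">
      <EssentialProperty schemeIdUri="urn:example:subtitle:timing:mp4"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def find_timing_modes(mpd_xml: str) -> list:
    """Collect timing modes signalled in MPD/Period/AdaptationSet properties."""
    root = ET.fromstring(mpd_xml)  # root is the MPD element
    modes = []
    for aset in root.findall("mpd:Period/mpd:AdaptationSet", MPD_NS):
        for tag in ("EssentialProperty", "SupplementalProperty"):
            for prop in aset.findall(f"mpd:{tag}", MPD_NS):
                mode = HYPOTHETICAL_SCHEMES.get(prop.get("schemeIdUri"))
                if mode is not None:
                    modes.append(mode)
    return modes


print(find_timing_modes(SAMPLE_MPD))
```

Properties with an unrecognised schemeIdUri are skipped here; a real receiver would additionally have to honour the DASH rule that an AdaptationSet carrying an unknown EssentialProperty must not be presented.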

A data processing method comprising:
acquiring subtitle information related to subtitles and control information, both transmitted on a broadcast wave of digital broadcasting, the control information including selection information for selecting a specific mode from among a plurality of modes for specifying a display timing of the subtitles; and
controlling display of the subtitles according to the subtitle information at a display timing according to the specific mode, based on the selection information included in the control information.

A transmitting device comprising:
a generation unit that generates control information including selection information for selecting a specific mode from among a plurality of modes for specifying a display timing of subtitles; and
a transmission unit that transmits the control information, together with subtitle information related to the subtitles, on a broadcast wave of digital broadcasting.

The transmission device according to claim 11, wherein
the subtitle information is a TTML file in the TTML format whose data conforms to the MP4 file format,
the control information is an MPD file in XML format, and
the TTML file and the MPD file are transmitted in a ROUTE session.

The transmission device according to claim 12, wherein the plurality of modes include a first mode in which the subtitle is displayed at a timing according to time information specified in the TTML file.

The transmitting device according to claim 12, wherein the plurality of modes include a second mode in which the subtitle is displayed at a timing according to time information defined by the MP4 file format.

The transmission device according to claim 14, wherein the second mode is a mode in which display of the subtitle is started at a time corresponding to the BMDT stored in the moof box defined by the MP4 file format, and is continued only for a period corresponding to the SampleDuration stored in the moof box.

The transmitting device according to claim 14, wherein the second mode is a mode in which display of the subtitle is started at a time corresponding to the BMDT stored in the moof box, defined by the MP4 file format, that corresponds to the mdat box storing the data of the target subtitle, and is continued until a time corresponding to the BMDT stored in the moof box that corresponds to the mdat box storing the data of the next subtitle.

The transmission device according to claim 12, wherein the plurality of modes include a third mode in which the subtitle is displayed while ignoring both the time information specified in the TTML file and the time information defined by the MP4 file format.

The transmission device according to claim 12, wherein the selection information is specified as extended information of the MPD file.

The transmitting device according to claim 18, wherein the selection information is specified by a schemeIdUri attribute of an EssentialProperty element or a SupplementalProperty element in an AdaptationSet element arranged in a Period element of an MPD element.

A data processing method including: generating control information that includes selection information for selecting a specific mode from among a plurality of modes for specifying a display timing of subtitles, the control information being transmitted, together with subtitle information related to the subtitles, on a broadcast wave of digital broadcasting.