JP6971059B2 - Redelivery system, redelivery method, and program

Info

Publication number: JP6971059B2
Application number: JP2017110376A
Authority: JP
Inventors: 成暁加藤; 宗遠藤; 秋継馬場; 清彦石川; 裕紀藤井; 英樹丸山
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2017-06-02
Filing date: 2017-06-02
Publication date: 2021-11-24
Anticipated expiration: 2037-06-02
Also published as: JP2018207288A

Description

The present invention relates to a redistribution system, a redistribution method, and a program.

Technologies for delivering (streaming) various kinds of content (video, audio, and so on) in real time over communication lines such as the Internet have become widespread. Because video and audio can be delivered over the Internet with relatively inexpensive equipment and at relatively low transmission cost, such delivery is expected to be used more and more.

When content produced by a first operator is received and redistributed by a second operator, the second operator may add content to it. Typically, when content is redistributed for a specific region or language area, it is desirable to add content specific to that region or language. In the conventional approach, the content produced by the first operator (for example, video and audio) is first transmitted to the second operator. The second operator then adds the additional content (for example, commentary audio in a specific language or for a specific region) and encodes the result into a format for Internet delivery.

In this conventional approach, high-quality content must be transmitted over a dedicated line or the like so that the second operator can easily process and deliver it, and the second operator needs a large amount of equipment and many processing steps to add content (the audio mentioned above).

Specifically, with the conventional technology, the video and audio transmitted from the first operator are decoded by a decoder and separated by a DeMUX (demultiplexer), and the additional audio is then added to the original audio. The original video and the resulting audio are then encoded by an encoder and recombined by a MUX (multiplexer), which yields a format for Internet delivery.

For example, Non-Patent Document 1 describes the system configuration that a broadcaster used to live-stream video of all competitions and events of a large-scale sporting event over the Internet. According to this document, SD-quality IPVandA video resources were transmitted over an international line from the center in the host city (in Brazil) to a broadcasting center in Tokyo. The SD-quality video had a bit rate of about 2.5 Mbps. At the broadcasting center, the IPVandA video was re-encoded at a lower bit rate and delivered over the Internet. For some competitions, a simple audio booth was built in the broadcasting center, and commentary and play-by-play audio produced specifically for online delivery was added before delivery.

Non-Patent Document 1: 島西顕司，遠藤宗，小久保幸紀，折下伸也，坂井駿一，前田彩 (Kenji Shimanishi, Mune Endo, Yuki Kokubo, Shinya Orishita, Shunichi Sakai, Aya Maeda), "Rio de Janeiro Olympic Digital Content Production", 放送技術 (Broadcasting Technology), November 2016, pp. 104-106.

In the prior art, at least part of the original content must be played back when content is added for redistribution. The original content and the added content must also be synchronized before redistribution.
To do so, the original content first had to be converted to a baseband (uncompressed) signal and played back before content could be added. Consequently, transmitting the original content required either a baseband (uncompressed) signal or a signal containing high-resolution video encoded at a high bit rate, which in turn required a wideband, stable, and therefore expensive communication line. Building such a system also required multiple processing stages, and hence a large amount of expensive equipment. In particular, enabling simultaneous delivery of content from multiple sites with the prior art was difficult in terms of cost.

The present invention was made in recognition of the above problems. It aims to provide a redistribution system, a redistribution method, and a program that can reduce transmission and equipment costs when original content is received, new content is added, and the two are redistributed together.

[1] To solve the above problems, a redistribution system according to one aspect of the present invention comprises: a receiving unit that receives a first package containing at least one type of content encoded in an HTTP streaming format; an editing unit that generates and outputs new content based on at least some types of the content contained in the first package received by the receiving unit; an integration unit that integrates at least part of the content contained in the first package received by the receiving unit and the new content generated by the editing unit into a single second package and outputs it; and a distribution unit that redistributes the second package output from the integration unit.

[2] In one aspect of the present invention, in the above redistribution system, the receiving unit receives the first package containing at least one type of video content and at least one type of audio content; the editing unit plays back first audio, which is at least one type of audio content contained in the first package, generates second audio by superimposing the first audio and other audio input in correspondence with the first audio, and outputs the second audio as the new content; and the integration unit integrates and outputs the video content and audio content contained in the first package and the new content such that their playback timing is consistent.

[3] In one aspect of the present invention, in the above redistribution system, the editing unit assigns to the new content timing information that is consistent with the timing information held by the content contained in the first package; and the integration unit makes the playback timing consistent based on the timing information held by the content contained in the first package and the timing information assigned to the new content.

[4] In one aspect of the present invention, in the above redistribution system, the integration unit, based on the similarity between the waveform of the first audio and the waveform of the second audio, shifts either the content of the first package containing the first audio content or the second audio, which is the new content, along the time axis, and thereby integrates and outputs them such that their playback timing is consistent.

[5] In one aspect of the present invention, in the above redistribution system, the receiving unit receives the first package containing at least one type of audio content; the editing unit performs speech recognition on at least one type of audio content contained in the first package and thereby generates, as the new content, subtitle text content corresponding to the audio content; and the integration unit, based on the temporal correspondence between the audio signal contained in the audio content and the generated subtitle text, integrates and outputs them such that the playback timing of the audio content and the presentation timing of the subtitle text are consistent.

[6] One aspect of the present invention is a redistribution method comprising: a receiving step of receiving a first package containing at least one type of content encoded in an HTTP streaming format; an editing step of generating and outputting new content based on at least some types of the content contained in the first package received in the receiving step; an integration step of integrating at least part of the content contained in the first package received in the receiving step and the new content generated in the editing step into a single second package and outputting it; and a distribution step of redistributing the second package output in the integration step.

[7] One aspect of the present invention is a program for causing a computer to function as a redistribution system comprising: a receiving unit that receives a first package containing at least one type of content encoded in an HTTP streaming format; an editing unit that generates and outputs new content based on at least some types of the content contained in the first package received by the receiving unit; an integration unit that integrates at least part of the content contained in the first package received by the receiving unit and the new content generated by the editing unit into a single second package and outputs it; and a distribution unit that redistributes the second package output from the integration unit.

According to the present invention, new content can be added to streaming-format content and the result redistributed at low transmission cost and low equipment cost.

A schematic diagram showing the functional configuration of the redistribution system (redistribution device) according to the first embodiment of the present invention and the flow of content data in the system.
A block diagram showing a configuration example of the entire system including the redistribution system according to the first embodiment.
A schematic diagram showing the functional configuration of the redistribution system (redistribution device) according to the second embodiment and the flow of content data in the system.
A schematic diagram showing, for the second embodiment, the flow of content when the redistribution system redistributes content delivered from the distribution server device.
A schematic diagram showing, for the second embodiment, a configuration example of the data streamed from the distribution server device.
A schematic diagram showing, for the second embodiment, a configuration example of the data streamed from the redistribution system.
A schematic diagram showing a configuration example of the top-level index file used in the second embodiment.
A schematic diagram showing a configuration example of a relatively lower-level index file used in the second embodiment.
A schematic diagram showing the functional configuration of the redistribution system (redistribution device) according to the third embodiment and the flow of content data in the system.

[First Embodiment]
FIG. 1 is a schematic diagram showing the functional configuration of the redistribution system (redistribution device) according to this embodiment and the flow of content data in the system. As shown, the redistribution system 1 comprises a receiving unit 120, an editing unit 140, an integration unit 160, and a distribution unit 180.

The redistribution system 1 receives content delivered, for example, by HTTP streaming from an external distribution server. HTTP stands for HyperText Transfer Protocol. The content received by the redistribution system 1 includes multiple types of content, such as video, audio, and text. It may also include multiple video contents, multiple audio contents, and so on. The redistribution system 1 generates new content based on at least part of the received content, and then redistributes the original received content and the newly generated content together as a single content package.

The receiving unit 120 receives a first package containing at least one type of content encoded in, for example, an HTTP streaming format. The receiving unit 120 may receive multiple types of content. In the illustrated example, it receives a package containing (m+n) types of content, C(1) through C(m+n), where m is an integer of 0 or more and n is an integer of 1 or more. The receiving unit 120 receives these contents by, for example, HLS. HLS stands for HTTP Live Streaming and is known as a method (protocol) for streaming video and the like over the Internet.
The receiving unit 120 passes the received contents C(1) through C(m+n) to the integration unit 160. It also passes C(m+1) through C(m+n) of the received contents to the editing unit 140.

The editing unit 140 generates and outputs new content based on at least some types of the content contained in the first package received by the receiving unit 120. More specifically, the editing unit 140 receives from the receiving unit 120 the n types of content C(m+1) through C(m+n) among the received contents, and generates new content related to them. The new content generated by the editing unit 140 consists of k types of content, C(m+n+1) through C(m+n+k), where k is an integer of 1 or more. The relationship between the contents the editing unit 140 receives, C(m+1) through C(m+n), and the contents it generates, C(m+n+1) through C(m+n+k), can take various forms, but the two are related as content. Because they are related, their timing should be matched when they are played back (more generally, presented). The editing unit 140 passes the generated content to the integration unit 160.
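As a non-limiting illustration, the index bookkeeping for C(1) through C(m+n+k) can be sketched as follows. The function name `content_indices` and the role labels are hypothetical and do not appear in the specification; only the constraints on m, n, and k are taken from the text.

```python
def content_indices(m: int, n: int, k: int) -> dict:
    """Return the 1-based content indices C(1)..C(m+n+k), grouped by role."""
    if m < 0 or n < 1 or k < 1:
        raise ValueError("requires m >= 0, n >= 1, k >= 1")
    return {
        # C(1)..C(m): received and passed only to unit 160
        "pass_through": list(range(1, m + 1)),
        # C(m+1)..C(m+n): received and also handed to unit 140
        "edit_inputs": list(range(m + 1, m + n + 1)),
        # C(m+n+1)..C(m+n+k): newly generated by unit 140
        "generated": list(range(m + n + 1, m + n + k + 1)),
    }
```

For example, with one video content that is only passed through (m = 1), one audio content used as editing input (n = 1), and one generated audio content (k = 1), the three groups are C(1), C(2), and C(3), respectively.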

The integration unit 160 integrates the content contained in the first package received by the receiving unit 120 and the new content generated by the editing unit 140 into a single second package and outputs it. That is, it integrates the contents C(1) through C(m+n) passed from the receiving unit 120 with the contents C(m+n+1) through C(m+n+k) passed from the editing unit 140. The integration unit 160 receives C(1) through C(m+n) still encoded and integrates them as they are with C(m+n+1) through C(m+n+k). It then passes all of these contents to the distribution unit 180 as a single package. In doing so, the integration unit 160 integrates the content passed from the receiving unit 120 and the content passed from the editing unit 140 such that their playback timing is consistent.
The integration unit 160 may integrate only some, rather than all, of the contents C(1) through C(m+n) passed from the receiving unit 120 with C(m+n+1) through C(m+n+k). In that case, which of C(1) through C(m+n) are integrated with C(m+n+1) through C(m+n+k) is determined as appropriate.
In short, the integration unit 160 integrates at least part of the content contained in the first package received by the receiving unit 120 and the new content generated by the editing unit 140 into a single second package and outputs it.
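The following minimal sketch illustrates one way the second package could be assembled from still-encoded received contents and newly generated contents. The `Track` type, the `integrate` function, and the `keep` parameter are hypothetical names introduced for illustration; timing alignment is deliberately left out of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    name: str
    kind: str       # e.g. "video", "audio", "text"
    payload: bytes  # encoded data; this sketch never decodes it


def integrate(first_package, new_tracks, keep=None):
    """Build the second package from received and newly generated tracks.

    `keep` optionally names the received tracks to carry over; when it is
    None, every received track is carried over unchanged.
    """
    if keep is None:
        selected = list(first_package)
    else:
        selected = [t for t in first_package if t.name in keep]
    if not selected:
        raise ValueError("at least one received track must be kept")
    if not new_tracks:
        raise ValueError("at least one new track is required")
    # Received tracks are passed through as-is, followed by the new ones.
    return selected + list(new_tracks)
```

Because the received payloads are only copied, never decoded or re-encoded, the sketch reflects the point that the received contents are integrated in their encoded state.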

The distribution unit 180 redistributes the content (the second package) passed from the integration unit 160.

FIG. 2 is a block diagram showing a configuration example of the entire system including the redistribution system 1. As shown, the system comprises a distribution server device 2, the redistribution system 1, and a client device 3. The redistribution system 1 is connected to the distribution server device 2 and the client device 3 via a communication line such as the Internet. Although only one client device 3 is shown in the figure, many client devices 3 may in practice be connected to the redistribution system 1. As described with reference to FIG. 1, the redistribution system 1 comprises the receiving unit 120, the editing unit 140, the integration unit 160, and the distribution unit 180.

The distribution server device 2 is a server computer that delivers the original content. The content it delivers consists of, for example, video and audio. The distribution server device 2 delivers the content using, for example, the HLS described above.
The client device 3 receives the content sent out (redistributed) by the redistribution system 1. The client device 3 is implemented using, for example, a personal computer (PC), a smartphone, a wristwatch-type information terminal, a glasses-type information terminal, or another information device. The client device 3 has, for example, a web browser, which functions as an HTTP client. This allows the content redistributed by HLS from the redistribution system 1 to be viewed.

With the configuration of this embodiment, content can be redistributed after new content related to content received in a streaming format has been added, without receiving the content as a baseband (uncompressed) signal. Because the processing steps and equipment can be reduced substantially, the redistribution system can be realized at low cost.

[Second Embodiment]
Next, a second embodiment is described. Matters already described in the preceding embodiment may be omitted below; the description here focuses on matters specific to this embodiment.

FIG. 3 is a schematic diagram showing the functional configuration of the redistribution system (redistribution device) according to this embodiment and the flow of content data in the system. As shown, the redistribution system 11 comprises a receiving unit 220, an editing unit 240, an integration unit 260, and a distribution unit 280.

The redistribution system 11 receives video and audio content (the audio is referred to as "audio A"), generates other audio content (referred to as "audio B") based on the received audio content, and redistributes content that integrates the original received content (including audio A) with the generated audio content (audio B).

The receiving unit 220 receives a first package containing at least one type of video content and at least one type of audio content. Specifically, the receiving unit 220 receives streaming video and audio (referred to as "audio A") content delivered via a communication line such as the Internet. The video and audio A received by the receiving unit 220 have been transmitted in encoded form from, for example, an external distribution server. As one example, the receiving unit 220 can be implemented using a dedicated receiving computer.

The editing unit 240 plays back first audio, which is at least one type of audio content contained in the first package, generates second audio by superimposing the first audio and other audio input in correspondence with it, and outputs the second audio as new content. That is, of the content the receiving unit 220 has received from outside, the editing unit 240 receives and plays back at least the audio A content. The editing unit 240 may also receive and play back the video content. The editing unit 240 then mixes the audio A content with audio picked up by a microphone or the like connected to the editing unit 240, in the audio band, encodes the result with a predetermined coding scheme, and outputs it as new audio content (referred to as "audio B").
As one example, announcers and commentators provide play-by-play or commentary while watching and listening to the content (video and audio A) played back by the editing unit 240. That is, they speak into a microphone or the like, and the editing unit 240 generates audio B content that includes their voices. When the decoded first audio is played back and the announcers and commentators speak while listening to it in real time in this way, the audio processing introduces no delay, or a delay small enough to be ignored. The first audio and the new audio are therefore mixed with appropriate timing to generate audio B.
Audio B may also be created by other methods. In that case, appropriate timing-alignment processing may be performed as needed so that the first audio and the other audio (the voices of the announcers and commentators) are aligned.
The editing unit 240 can also be implemented using a computer, for example a personal information device such as a personal computer or a smartphone.
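The mixing performed by unit 240 can be illustrated with the following minimal sketch, which assumes that both inputs are already decoded to 16-bit PCM samples at the same sampling rate. The function name `mix_pcm16` and the gain parameters are hypothetical; the specification does not prescribe a particular mixing formula, and the subsequent encoding step is not shown.

```python
def mix_pcm16(audio_a, mic, gain_a=0.5, gain_mic=1.0):
    """Mix two 16-bit PCM sample sequences into one (same sampling rate assumed)."""
    length = max(len(audio_a), len(mic))
    mixed = []
    for i in range(length):
        # Treat the shorter input as silence once it runs out.
        a = audio_a[i] if i < len(audio_a) else 0
        b = mic[i] if i < len(mic) else 0
        sample = int(round(gain_a * a + gain_mic * b))
        # Clip to the signed 16-bit range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```

Lowering `gain_a` relative to `gain_mic` keeps the commentary intelligible over the original sound; the default values here are arbitrary.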

The integration unit 260 integrates and outputs the video content and audio content contained in the first package and the new content such that their playback timing is consistent. The integration unit 260 integrates the content originally received by the receiving unit 220 (video and audio A) with the content passed from the editing unit 240 (audio B) and passes the result to the distribution unit 280. When integrating these contents, the integration unit 260 adjusts the content received by the receiving unit 220 (video and audio A) and the content passed from the editing unit 240 (audio B) so that their timing is mutually consistent. The integration unit 260 also adjusts the levels of audio A passed from the receiving unit 220 and audio B passed from the editing unit 240. The integration unit 260 receives the video and audio A from the receiving unit 220 still encoded, and integrates them with audio B in that state.
The integration unit 260 may integrate only part, rather than all, of the content passed from the receiving unit 220 (video and audio A) with the content passed from the editing unit 240 (audio B). For example, the integration unit 260 may integrate only the video passed from the receiving unit 220 and audio B passed from the editing unit 240, or only audio A passed from the receiving unit 220 and audio B passed from the editing unit 240. In either case, when integrating the contents, the integration unit 260 adjusts the content received by the receiving unit 220 and the content passed from the editing unit 240 so that their timing is mutually consistent.
In short, the integration unit 260 integrates at least part of the content contained in the first package received by the receiving unit 220 and the new content generated by the editing unit 240 into a single second package and outputs it.
The timing adjustment and level adjustment by the integration unit 260 are performed automatically. The method by which the integration unit 260 adjusts the timing is described in detail later.

The distribution unit 280 distributes the content output from the integration unit 260. It distributes the content via the Internet or the like.

FIG. 4 is a schematic diagram showing the flow of content when the redistribution system of this embodiment redistributes content distributed from the distribution server device. In the figure, the receiving unit 220, the editing unit 240, the integration unit 260, and the distribution unit 280 are, as also shown in FIG. 3, the devices (or functions of parts of devices) that constitute the redistribution system 11. These functions of the redistribution system 11, the distribution server device 2, and the client device 3 are each connected to the Internet and can communicate with one another. Means other than the Internet may also be used for communication. Although FIG. 4 shows only one client device 3, in practice many client devices 3 may receive distribution from the distribution unit 280.

As shown in the figure, the distribution server device 2 distributes content including video and audio via the Internet, using, for example, the HLS mentioned above. The receiving unit 220 receives this content. The receiving unit 220 passes the received video and audio (audio A) to the integration unit 260 via the Internet or another line. The receiving unit 220 also passes at least audio A of the received content (and the video as well, if necessary) to the editing unit 240 via the Internet or another line. Based on the content received from the receiving unit 220, the editing unit 240 generates audio B, which is audio content different from audio A. Audio A may be mixed into audio B. In a typical application, audio A is live audio relayed from the site of an event, and audio B is that audio mixed with speech that an announcer, commentator, or the like using the editing unit 240 utters to match audio A.
The editing unit 240 refers to the timing information contained in the file carrying audio A, attaches that timing information to audio B, encodes audio B, and outputs it as a file. The timing information is, for example, a PTS (presentation time stamp). The editing unit 240 also generates metadata for matching the reproduction timing between audio A and audio B (data such as file names that associate audio A with audio B). The editing unit 240 then passes audio B to the integration unit 260 as new audio content, either via the Internet or via another line.

The integration unit 260 integrates the content received from the receiving unit 220 and the content received from the editing unit 240 on the basis of the metadata for matching reproduction timing (data such as the correspondence between file names). One of the important processes performed by the integration unit 260 is to align the timing of the content from the receiving unit 220 and the content from the editing unit 240 on the basis of this metadata. Because the integration unit 260 synchronizes the content in this way, audio B generated by the editing unit 240 can be distributed at a timing consistent with both the video and the audio (audio A) from the receiving unit 220. The integration unit 260 has a buffer area for accumulating content, to allow for cases where the content from the receiving unit 220 and the content from the editing unit 240 that are to be aligned arrive at different times. The integration unit 260 passes the integrated content to the distribution unit 280, which then distributes the whole of that content via the Internet, using, for example, the HLS mentioned above.
The client device 3 receives the content redistributed from the distribution unit 280, decodes it, and reproduces it. The client device 3 may reproduce the video content together with either audio A or audio B, as appropriate.
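The buffering role of the integration unit 260 can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the implementation of the embodiment: the class name `SegmentBuffer`, the method `push`, and the use of a shared key (for example, a sequence number taken from the file name) as the pairing metadata are all hypothetical.

```python
from typing import Dict, Optional, Tuple


class SegmentBuffer:
    """Holds a segment from one side until its counterpart arrives from the other.

    Hypothetical sketch: 'key' stands in for the metadata (e.g. a file-name
    correspondence) that links a received segment to an audio B segment.
    """

    def __init__(self) -> None:
        self._received: Dict[str, bytes] = {}  # video + audio A, receiving side
        self._edited: Dict[str, bytes] = {}    # audio B, editing side

    def push(self, source: str, key: str, payload: bytes) -> Optional[Tuple[bytes, bytes]]:
        """Store a segment; return (received, edited) once both are present."""
        if source not in ("received", "edited"):
            raise ValueError(f"unknown source: {source}")
        if source == "received":
            own, other = self._received, self._edited
        else:
            own, other = self._edited, self._received
        if key in other:
            partner = other.pop(key)
            return (payload, partner) if source == "received" else (partner, payload)
        own[key] = payload  # counterpart has not arrived yet, so keep waiting
        return None


buf = SegmentBuffer()
first = buf.push("received", "417554", b"video+audioA")  # audio B not yet available
second = buf.push("edited", "417554", b"audioB")         # the pair is now complete
```

In this sketch a segment is released for integration only when both sides have delivered the same key, which is one simple way to absorb differing arrival times.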

Next, the format of the data distributed in this embodiment is described.
FIG. 5 is a schematic diagram showing a configuration example of the streaming distribution data received by the receiving unit 220 in this embodiment. As shown in the figure, the data distributed from the distribution server device 2 has a hierarchical structure. In the figure, the leftmost column represents the top layer, the center the middle layer, and the rightmost column the bottom layer. The top layer contains a single index file named "IndexFile.m3u8". This index file holds the location information (file names, path names, and so on) of three further index files in the layer below (the middle layer): "Alternate-LowIndex", "Alternate-MidIndex", and "Alternate-HiIndex". These three index files can be used selectively according to the communication bandwidth that can be secured. For example, the user of the client device receiving the distribution may be allowed to specify one of three options: low, medium, or high bandwidth. Each of "Alternate-LowIndex", "Alternate-MidIndex", and "Alternate-HiIndex" holds a list of the locations of video files, one for each predetermined time length (for example, 6 seconds). As an example, the index file "Alternate-LowIndex" holds the locations of four video files, "Low_01.ts", "Low_02.ts", "Low_03.ts", and "Low_04.ts", which are to be played in sequence. The index file "Alternate-LowIndex" is not limited to four entries and can hold the locations of any number of video files. Although "Alternate-LowIndex" has been used as the example here, "Alternate-MidIndex" and "Alternate-HiIndex" likewise each hold the locations of video files suited to their bandwidth.
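As a rough illustration of the three-layer structure of FIG. 5, the hierarchy can be modeled as nested mappings. The following is a sketch under assumptions: the variable `hierarchy`, the mapping `CHOICES`, and the function `segments_for` are hypothetical names, and the Mid and Hi lists contain only the single file name that the description mentions.

```python
from typing import Dict, List

# Hypothetical in-memory model of the hierarchy in FIG. 5:
# top-level index -> middle-layer index -> video files in playback order.
hierarchy: Dict[str, Dict[str, List[str]]] = {
    "IndexFile.m3u8": {
        "Alternate-LowIndex": ["Low_01.ts", "Low_02.ts", "Low_03.ts", "Low_04.ts"],
        "Alternate-MidIndex": ["Mid_01.ts"],  # further entries not listed in the text
        "Alternate-HiIndex": ["Hi_01.ts"],    # further entries not listed in the text
    }
}

# Assumed labels for the three bandwidth options offered to the user.
CHOICES = {
    "low": "Alternate-LowIndex",
    "mid": "Alternate-MidIndex",
    "high": "Alternate-HiIndex",
}


def segments_for(choice: str) -> List[str]:
    """Return the video files, in playback order, for the chosen bandwidth."""
    index_name = CHOICES[choice]
    return list(hierarchy["IndexFile.m3u8"][index_name])


low_segments = segments_for("low")
```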

In the data structure shown in FIG. 5, the audio (audio A) is contained within each video file (Low_01.ts, Mid_01.ts, Hi_01.ts, and so on).
Alternatively, the audio (audio A) may be distributed from the distribution server device 2 as independent files, and the receiving unit 220 may receive those audio files as well. In this case, the audio is divided into segments of appropriate length and distributed as a series of files over time. These audio files are indexed by the same index file that indexes the video files.

FIG. 6 is a schematic diagram showing a configuration example of the streaming distribution data output by the integration unit 260 and distributed by the distribution unit 280 in this embodiment. As shown in the figure, the data distributed by the distribution unit 280 also has a hierarchical structure. As in the data structure described with reference to FIG. 5, the leftmost column represents the top layer, the center the middle layer, and the rightmost column the bottom layer. The top layer contains a single index file named "IndexFile.m3u8". This index file holds the location information (file names, path names, and so on) of five further index files in the layer below (the middle layer): "Alternate-LowIndex", "Alternate-MidIndex", "Alternate-HiIndex", "mixed", and "original".

Of these, "Alternate-LowIndex", "Alternate-MidIndex", and "Alternate-HiIndex" are indexes of video files, as in the data structure described with reference to FIG. 5. The video files below these three index files are also the same as in the data structure of FIG. 5.

Of the five index files in the middle layer, "mixed" and "original" each index audio files. Each holds a list of the locations of audio files, one for each predetermined time length (for example, 6 seconds). As an example, the index file "mixed" holds the locations of four audio files, "mixed_01.ts", "mixed_02.ts", "mixed_03.ts", and "mixed_04.ts", which are to be reproduced in sequence. The index file "mixed" is not limited to four entries and can hold the locations of any number of audio files. In exactly the same way, "original" holds a list of the locations of a separate set of audio files, one for each predetermined time length (for example, 6 seconds). That is, "original" holds the locations of four audio files to be reproduced in sequence: "original_01.ts", "original_02.ts", "original_03.ts", and "original_04.ts".

The audio files indexed by the index file "mixed" (mixed_01.ts and so on) contain the audio output by the editing unit 240 (audio B). The audio files indexed by the index file "original" (original_01.ts and so on) contain the original audio (audio A) that the receiving unit 220 received from the distribution server device 2.

When the original distribution server device 2 distributes audio A as independent files, the integration unit 260 may output those files unchanged as the audio files indexed by "original".
When audio A distributed from the original distribution server device 2 exists only inside the distributed video files, the integration unit 260 extracts the audio from those video files to generate audio files, and may then output the generated audio files as the audio files indexed by "original".

FIG. 7 is a schematic diagram showing a configuration example of an index file used in this embodiment. The file illustrated here is the top-level index file in the hierarchical structure, and its file name is "playlist.m3u8". As shown in the figure, the index file "playlist.m3u8" is a file in extended M3U format and contains text representing the index information. In FIG. 7, line numbers are attached to each line of the text for convenience. The contents of the index file "playlist.m3u8" are described below.

Line 1 is a header indicating that this file is in extended M3U format.
Lines 2 and 3 hold information about the audio content. Both lines contain the description "TYPE=AUDIO", which indicates that each of them is an index of audio content. Both lines also contain the description GROUP-ID="audio", which indicates that both belong to the group identified by the identifier "audio".
Of these, line 2 contains the description NAME="mixed", which indicates mixed audio, that is, audio to which additional audio has been added in the editing unit 240 (audio B). Line 2 also contains the description "DEFAULT=YES", which indicates that this is the default audio. In addition, line 2 holds the location of the lower-level index file for this audio, given by the description URI="mixed/playlist.m3u8".
Line 3, on the other hand, contains the description NAME="original", which indicates the original audio before mixing, that is, the audio received by the receiving unit 220 with no additional audio added (audio A). Line 3 also contains the description "DEFAULT=NO", which indicates that this is not the default audio. In addition, line 3 holds the location of the lower-level index file for this audio, given by the description URI="original/playlist.m3u8".
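The first three lines described above can be reproduced with a small generator. This is a sketch only: FIG. 7 itself is not reproduced here, so the attribute order, the absence of any further attributes, and the helper name `media_tag` are assumptions. The `#EXT-X-MEDIA` tag and its attributes are those defined for HLS playlists.

```python
from typing import List


def media_tag(name: str, default: bool, uri: str, group_id: str = "audio") -> str:
    """Build one #EXT-X-MEDIA line for an audio rendition (hypothetical helper)."""
    attrs = [
        "TYPE=AUDIO",
        f'GROUP-ID="{group_id}"',
        f'NAME="{name}"',
        f"DEFAULT={'YES' if default else 'NO'}",
        f'URI="{uri}"',
    ]
    return "#EXT-X-MEDIA:" + ",".join(attrs)


# Lines 1 to 3 of the top-level index file, as described in the text.
lines: List[str] = [
    "#EXTM3U",
    media_tag("mixed", True, "mixed/playlist.m3u8"),         # audio B, default
    media_tag("original", False, "original/playlist.m3u8"),  # audio A
]
```

Because both renditions share the same group identifier, a player can treat them as alternatives and switch between audio B and audio A without changing the video.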

Line 4 is information indicating that the segments of the content are independent segments. That is, it indicates that decoding the content of a segment requires no information from other segments.

Lines 5 through 16 contain the index information for six video files.
Lines 5 and 6 hold the index information for the first video. Its bandwidth (BANDWIDTH) and average bandwidth (AVERAGE-BANDWIDTH) are both "545600" (in bits per second). The codecs for decoding this stream are "avc1.66.30" and "mp4a.40.2". The resolution is "480x270", and the index file for this video is "stream1/playlist.m3u8".
Lines 7 and 8 hold the index information for the second video. Its bandwidth (BANDWIDTH) and average bandwidth (AVERAGE-BANDWIDTH) are both "765600" (bits per second). The codecs for decoding this stream are "avc1.66.30" and "mp4a.40.2". The resolution is "640x360", and the index file for this video is "stream2/playlist.m3u8".
Lines 9 and 10 hold the index information for the third video. Its bandwidth (BANDWIDTH) and average bandwidth (AVERAGE-BANDWIDTH) are both "1425600" (bits per second). The codecs for decoding this stream are "avc1.42c01f" and "mp4a.40.2". The resolution is "640x360", and the index file for this video is "stream3/playlist.m3u8".

Lines 11 and 12 hold the index information for the fourth video. Its bandwidth (BANDWIDTH) and average bandwidth (AVERAGE-BANDWIDTH) are both "3955600" (bits per second). The codecs for decoding this stream are "avc1.4d401f" and "mp4a.40.2". The resolution is "960x540", and the index file for this video is "stream4/playlist.m3u8".
Lines 13 and 14 hold the index information for the fifth video. Its bandwidth (BANDWIDTH) and average bandwidth (AVERAGE-BANDWIDTH) are both "5640800" (bits per second). The codecs for decoding this stream are "avc1.4d401f" and "mp4a.40.2". The resolution is "1280x720", and the index file for this video is "stream5/playlist.m3u8".
Lines 15 and 16 hold the index information for the sixth video. Its bandwidth (BANDWIDTH) and average bandwidth (AVERAGE-BANDWIDTH) are both "7290800" (bits per second). The codecs for decoding this stream are "avc1.4d401f" and "mp4a.40.2". The resolution is "1280x720", and the index file for this video is "stream6/playlist.m3u8".

For all of the first to sixth videos, the frame rate (FRAME-RATE) is defined as "30.000". In addition, the entries for all of the first to sixth videos contain the description AUDIO="audio". This indicates that the audio content associated with each video is the content identified by the group ID "audio", that is, the audio content defined in line 2 or line 3.
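The six video entries can be summarized as data, together with a sketch of how each two-line entry might be written out. The numeric values, codec strings, resolutions, and URIs are those given in the description. The attribute order within `#EXT-X-STREAM-INF`, the names `Variant`, `stream_inf`, and `pick_variant`, and the selection rule (the highest bandwidth that fits the available throughput) are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    bandwidth: int   # bits per second (BANDWIDTH and AVERAGE-BANDWIDTH are equal)
    codecs: str
    resolution: str
    uri: str


VARIANTS: List[Variant] = [
    Variant(545600, "avc1.66.30,mp4a.40.2", "480x270", "stream1/playlist.m3u8"),
    Variant(765600, "avc1.66.30,mp4a.40.2", "640x360", "stream2/playlist.m3u8"),
    Variant(1425600, "avc1.42c01f,mp4a.40.2", "640x360", "stream3/playlist.m3u8"),
    Variant(3955600, "avc1.4d401f,mp4a.40.2", "960x540", "stream4/playlist.m3u8"),
    Variant(5640800, "avc1.4d401f,mp4a.40.2", "1280x720", "stream5/playlist.m3u8"),
    Variant(7290800, "avc1.4d401f,mp4a.40.2", "1280x720", "stream6/playlist.m3u8"),
]


def stream_inf(v: Variant) -> List[str]:
    """Return the two playlist lines (tag line, then URI line) for one video."""
    tag = (
        f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth},"
        f"AVERAGE-BANDWIDTH={v.bandwidth},"
        f'CODECS="{v.codecs}",RESOLUTION={v.resolution},'
        'FRAME-RATE=30.000,AUDIO="audio"'
    )
    return [tag, v.uri]


def pick_variant(available_bps: int) -> Variant:
    """Assumed client rule: highest bandwidth that fits, else the lowest one."""
    fitting = [v for v in VARIANTS if v.bandwidth <= available_bps]
    return max(fitting, key=lambda v: v.bandwidth) if fitting else VARIANTS[0]


entry_lines = [line for v in VARIANTS for line in stream_inf(v)]
```

Six entries of two lines each give twelve lines, which matches lines 5 through 16 of the index file.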

FIG. 8 is a schematic diagram showing an example of an index file used in this embodiment. The file shown here is a lower-level index file referenced from the top-level index file shown in FIG. 7, and its file name is "mixed/playlist.m3u8". The description URI="mixed/playlist.m3u8" in line 2 of the top-level index file shown in FIG. 7 indicates the location of the file in FIG. 8. Here, "mixed" is a directory name, and this directory stores the files for the mixed audio (audio B). That is, the index file "mixed/playlist.m3u8" holds the index information for the mixed audio. This index file is also in extended M3U format. In FIG. 8 as well, line numbers are attached to each line of the text. The contents of the index file "mixed/playlist.m3u8" are described below.

Line 1 is a header indicating that this file is in extended M3U format.
Line 2 is the version information of the file format. Specifically, it indicates that the version of this file format is "3".
"#EXT-X-TARGETDURATION" in line 3 indicates the expected duration of the media file to be added next. In this example, the expected duration is 6 seconds.
"#EXT-X-MEDIA-SEQUENCE" in line 4 represents the sequence number of the first media file contained in this index file. In this example, the first sequence number is "417554" (this number appears in the file name specified in line 8).
The description of "#EXT-X-DISCONTINUITY-SEQUENCE" in line 5 is omitted.

Lines 6 through 35 consist of a group of three lines repeated ten times (30 lines in total). In each group, the first line associates the media file with a date and time, the second line gives the length of the media segment in seconds, and the third line is the information for referencing the media file itself.

Here, the group consisting of lines 6 through 8 is described as an example.
"#EXT-X-PROGRAM-DATE-TIME" in line 6 associates the referenced media file with a date and time. In this example, the first media file is associated with the date and time given by "2017-05-11T16:19:02.866+09:00" (year, month, day, hours, minutes, seconds, and thousandths of a second, in a time zone 9 hours ahead of Coordinated Universal Time).
"#EXTINF" in line 7 represents the length of the media segment corresponding to this group. Specifically, the length is specified as 6.000 seconds. A title can be specified after the comma that follows "6.000", but the title is omitted in this data.
Line 8 gives the file name of the media file for this group (here, an audio file of the mixed audio (audio B)). In this data, the file name is "test2_270_417554.ts".

The nine groups that follow likewise give the date and time, the length of the media segment, and the file name of the media file for that segment. The specific dates and times, segment lengths, and file names are as shown in the drawing, so their description is omitted here.

As described above, the index file illustrated here holds information for ten segments of the mixed audio files. Each segment is 6 seconds long, so the ten segments total 60 seconds.
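The three-line groups of the media index file can be read with a few lines of code. The following is a minimal sketch, not the parser of the embodiment: the function name `parse_media_playlist` is hypothetical, and the sample text is abbreviated to the header lines and the first group only, because the remaining nine groups and the value in line 5 are given only in the drawing.

```python
from typing import Dict, List

# Abbreviated sample: header lines plus the first three-line group only.
SAMPLE = """\
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:417554
#EXT-X-PROGRAM-DATE-TIME:2017-05-11T16:19:02.866+09:00
#EXTINF:6.000,
test2_270_417554.ts
"""


def parse_media_playlist(text: str) -> List[Dict[str, object]]:
    """Collect date/time, duration, and file name for each media segment."""
    segments: List[Dict[str, object]] = []
    pending: Dict[str, object] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXT-X-PROGRAM-DATE-TIME:"):
            pending["date_time"] = line.split(":", 1)[1]
        elif line.startswith("#EXTINF:"):
            # The duration precedes the comma; an optional title follows it.
            pending["duration"] = float(line.split(":", 1)[1].split(",", 1)[0])
        elif not line.startswith("#"):
            pending["uri"] = line  # a non-tag line names the media file
            segments.append(pending)
            pending = {}
    return segments


segments = parse_media_playlist(SAMPLE)
total_seconds = sum(s["duration"] for s in segments)
```

Applied to the full ten-group file of FIG. 8, the same summation would give the 60 seconds stated above.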

The integration unit 260 generates and outputs the index file illustrated in FIG. 7 above. That is, the integration unit 260 passes content containing both audio A (NAME="original") and audio B (NAME="mixed") to the distribution unit 280. The distribution unit 280 distributes the content thus integrated by the integration unit 260 to the client device 3.

Next, the method by which the integration unit 260 aligns the timing of audio B generated by the editing unit 240 with the timing of the video and audio (audio A) passed from the receiving unit 220 is described in detail. The method of this embodiment uses the presentation time information contained in the files. The video and audio (audio A) files received by the receiving unit 220 carry reproduction timing information (PTS, presentation time stamp) and information on the length of the reproduction time. When HLS is used, the receiving unit 220 can obtain the timing information (PTS) from the TS (Transport Stream) files containing the video and audio data, and can obtain the length of the reproduction time from the "#EXTINF" entries in the index file (M3U8 file) distributed from the distribution server device 2. The editing unit 240 generates audio B (mixed audio) while playing back the original audio A, and in doing so embeds the timing information and reproduction time length contained in the audio A files into audio B unchanged.
For example, the editing unit 240 acquires the timing information at the moment audio input starts (PTS-1) and, when generating audio B, outputs it so that the timing information at the head of the output stream is PTS-1. Further, if the reproduction time length obtained from the M3U8 file is 5 seconds, it divides the output stream into files of 5 seconds each. In other words, the editing unit 240 generates and outputs audio B whose files have the same timing information and reproduction time length as the individual files that make up audio A. The integration unit 260 then confirms that the timing information and reproduction time length in the audio A and audio B files are identical, and generates a new M3U8 file containing the information for the video, audio A, and audio B so that the reproduction timing of the generated audio B is consistent with that of the video and audio (audio A) passed from the receiving unit 220. The M3U8 file, the video TS files, the audio A TS files, and the audio B TS files are then distributed as HLS content.
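The timing check can be sketched as follows. This is an illustration under assumptions rather than the processing of the embodiment: the names `SegmentTiming`, `segment_timings`, `stamp_like`, and `timings_match` are hypothetical, the starting PTS value is arbitrary, and PTS wrap-around is ignored. The 90 kHz tick rate is the standard clock for PTS values in MPEG-2 transport streams.

```python
from typing import List, NamedTuple

PTS_CLOCK_HZ = 90_000  # PTS values in MPEG-2 TS count ticks of a 90 kHz clock


class SegmentTiming(NamedTuple):
    pts: int         # PTS at the head of the segment file
    duration: float  # reproduction time length in seconds (from #EXTINF)


def segment_timings(first_pts: int, seg_seconds: float, count: int) -> List[SegmentTiming]:
    """Timing of consecutive fixed-length segments starting at first_pts."""
    step = round(seg_seconds * PTS_CLOCK_HZ)
    return [SegmentTiming(first_pts + i * step, seg_seconds) for i in range(count)]


def stamp_like(source: List[SegmentTiming]) -> List[SegmentTiming]:
    """Editing side: give each audio B file the timing of the matching audio A file."""
    return [SegmentTiming(s.pts, s.duration) for s in source]


def timings_match(a: List[SegmentTiming], b: List[SegmentTiming]) -> bool:
    """Integration side: confirm that PTS and duration agree file by file."""
    return len(a) == len(b) and all(
        x.pts == y.pts and abs(x.duration - y.duration) < 1e-3
        for x, y in zip(a, b)
    )


# Arbitrary example: three 5-second audio A files starting at PTS 900000.
audio_a = segment_timings(first_pts=900_000, seg_seconds=5.0, count=3)
audio_b = stamp_like(audio_a)
ok = timings_match(audio_a, audio_b)
```

Only when the check succeeds would the new M3U8 file list the audio B files alongside the video and audio A files in this sketch.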

つまり、編集部２４０は、第１パッケージに含まれるコンテンツが保持するタイミング情報に基づいて、整合するタイミング情報を、生成する新たなコンテンツに付与するものである。また、統合部２６０は、第１パッケージに含まれるコンテンツが保持するタイミング情報と、上記の新たなコンテンツに付与されたタイミング情報とに基づいて、再生のタイミングが整合するようにする。 That is, the editorial unit 240 adds matching timing information to the new content to be generated based on the timing information held by the content included in the first package. Further, the integration unit 260 makes the timing of reproduction consistent based on the timing information held by the content included in the first package and the timing information given to the new content.

これにより、再配信システム１１が受信したオリジナルのコンテンツと、再配信システム１１が付加したコンテンツとの間でタイミングが合った状態で、コンテンツの再配信を行うことが可能となる。 This makes it possible to redistribute the content in a state where the original content received by the redistribution system 11 and the content added by the redistribution system 11 are in time with each other.
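
One way the new index file could list both audio renditions is an HLS master playlist with alternate audio, sketched below. The description does not specify the playlist layout, so the `EXT-X-MEDIA` / `EXT-X-STREAM-INF` arrangement, the file names, the group ID, and the bandwidth figure are all assumptions made for illustration.

```python
def build_master_playlist(video_uri, audio_tracks, bandwidth=2_000_000, group="aud"):
    """Build a master playlist offering one video stream and several audio renditions.

    audio_tracks: list of (name, uri) pairs; the first entry is marked as the default.
    """
    lines = ["#EXTM3U"]
    for i, (name, uri) in enumerate(audio_tracks):
        default = "YES" if i == 0 else "NO"
        lines.append(
            f'#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="{group}",NAME="{name}",'
            f'DEFAULT={default},AUTOSELECT=YES,URI="{uri}"'
        )
    lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},AUDIO="{group}"')
    lines.append(video_uri)
    return "\n".join(lines) + "\n"

master = build_master_playlist(
    "video.m3u8",
    [("Original (audio A)", "audioA.m3u8"), ("Commentary (audio B)", "audioB.m3u8")],
)
print(master)
```

A client that understands alternate audio can then switch between audio A and audio B while playing the same video segments.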

［第２実施形態：変形例］ [Second Embodiment: Modification]
次に、第２実施形態の変形例について説明する。この変形例の基本的な構成は、第２実施形態におけるそれと同一であるが、統合部２６０が音声Ａと音声Ｂとの間のタイミングを合わせる方法の部分が第２実施形態とは異なる。 Next, a modification of the second embodiment is described. Its basic configuration is the same as that of the second embodiment, but the way in which the integration unit 260 aligns the timing of audio A and audio B differs.

この変形例において、統合部２６０は、次の通り、音声Ａと音声Ｂとのタイミングを合わせる。編集部２４０は、オリジナルの音声（音声Ａ）にアナウンサー等の発話などを混合した混合音声（音声Ｂ）を生成する。つまり、編集部２４０が生成する音声Ｂのデータには、音声Ａの情報も含まれている。統合部２６０は、音声Ａ（「比較用音声」とも呼ぶ）と音声Ｂ（発話によるコメントが付加されているため「コメント音声」とも呼ぶ）とを取得する。なお、統合部２６０は、音声Ａを、受信部２２０から直接取得してもよいし、編集部２４０から取得してもよい。統合部２６０は、音声Ｂの中に音声Ａの信号が含まれていることを利用して、音声Ａと音声Ｂのタイミングを合わせるための処理を実行する。 In this modification, the integration unit 260 adjusts the timing of the voice A and the voice B as follows. The editorial unit 240 generates a mixed voice (voice B) in which the original voice (voice A) is mixed with the utterance of an announcer or the like. That is, the data of the voice B generated by the editorial unit 240 also includes the information of the voice A. The integration unit 260 acquires voice A (also referred to as "comparison voice") and voice B (also referred to as "comment voice" because a comment by utterance is added). The integrated unit 260 may acquire the voice A directly from the receiving unit 220 or may acquire the voice A from the editing unit 240. The integration unit 260 executes a process for matching the timing of the voice A and the voice B by utilizing the fact that the signal of the voice A is included in the voice B.

その一例として、統合部２６０は、次の計算を行う。音声Ａおよび音声Ｂを、それぞれ、Ｓ_Ａ（ｔ）およびＳ_Ｂ（ｔ）で表す。Ｓ_Ａ（ｔ）およびＳ_Ｂ（ｔ）は、それぞれ、時刻ｔにおける信号値（例えば、音声信号の振幅）である。統合部２６０に音声Ａと音声Ｂとが届くとき、その時点までのプロセスの経路の違いにより、両者のタイミングがずれている可能性がある。そのずれ量をΔｔ（デルタ・ｔ）とする。図３等に示す処理を装置として構成した場合の音声Ａと音声Ｂとの間のタイミングのずれ量は、通常は最大でも１秒未満、特殊なケースでもせいぜい数秒以内と想定することは妥当である。そして、統合部２６０は、時刻ｔを含む所定の時間区間において、信号Ｓ_Ａ（ｔ）と信号Ｓ_Ｂ（ｔ＋Δｔ）との相互相関値を算出する。その相互相関値はｃ＝ｃｏｒｒ（Ｓ_Ａ（ｔ），Ｓ_Ｂ（ｔ＋Δｔ））と表される。ここで、ｃｏｒｒ（）は、２つの信号の相互相関値を求める関数である。そして、統合部２６０は、上記の相互相関値ｃを最大化するようなずれ量Δｔを求める。そして、統合部２６０は、求められたずれ量Δｔに基づいてタイミング情報（ＰＴＳ）の値を変更し、音声Ａと音声Ｂのコンテンツのタイミングを合わせて、出力する。 As one example, the integration unit 260 performs the following calculation. Audio A and audio B are denoted S_A(t) and S_B(t), respectively. S_A(t) and S_B(t) are the signal values (for example, the amplitude of the audio signal) at time t. When audio A and audio B arrive at the integration unit 260, their timing may be offset because of differences in the processing paths up to that point. Let this offset be Δt (delta t). When the processing shown in FIG. 3 and elsewhere is implemented as a device, it is reasonable to assume that the timing offset between audio A and audio B is normally less than 1 second and, even in special cases, at most a few seconds. The integration unit 260 calculates the cross-correlation value between the signal S_A(t) and the signal S_B(t+Δt) over a predetermined time interval that includes time t. This cross-correlation value is expressed as c = corr(S_A(t), S_B(t+Δt)), where corr() is a function that returns the cross-correlation value of two signals. The integration unit 260 then finds the offset Δt that maximizes the cross-correlation value c, changes the timing information (PTS) values based on that offset, aligns the timing of the audio A and audio B content, and outputs the result.
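
The search for the offset Δt that maximizes c = corr(S_A(t), S_B(t+Δt)) can be sketched as a brute-force scan over candidate lags. This is an illustrative sketch only: it works on sample indices rather than seconds, uses an unnormalized sum of products as corr(), and the function names and synthetic test signals are assumptions rather than part of the description.

```python
import math
import random

def corr(a, b, lag):
    """Cross-correlation of a[t] with b[t + lag], summed over the overlapping samples."""
    total = 0.0
    for t in range(len(a)):
        u = t + lag
        if 0 <= u < len(b):
            total += a[t] * b[u]
    return total

def estimate_offset(a, b, max_lag):
    """Return the lag (in samples) within +/- max_lag that maximizes corr(a, b, lag)."""
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr(a, b, lag))

# Synthetic check: audio B is audio A delayed by 37 samples plus an added "comment".
random.seed(0)
n, true_delay = 2000, 37
audio_a = [random.uniform(-1.0, 1.0) for _ in range(n)]
comment = [0.3 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
audio_b = [(audio_a[t - true_delay] if t >= true_delay else 0.0) + comment[t]
           for t in range(n)]

offset = estimate_offset(audio_a, audio_b, max_lag=100)
```

Dividing the resulting lag by the sample rate gives Δt in seconds. Limiting the scan to `max_lag` reflects the expectation stated above that the offset is at most a few seconds.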

なお、上記の関数ｃｏｒｒ（）により相互相関値を算出する際、音声Ａの信号レベルと音声Ｂの信号レベルとを、適宜、調整するようにしてもよい。また、ここでの信号レベルの調整量を、例えば機械学習等に基づいて、自動的に求めるようにしてもよい。 When the cross-correlation value is calculated with the function corr() above, the signal levels of audio A and audio B may be adjusted as appropriate. The amount of this level adjustment may also be determined automatically, for example by machine learning.
また、ここでは相互相関値を用いて音声Ａと音声Ｂのタイミングを合わせる方法を例として挙げたが、統合部２６０が他の方法によって両者のタイミングを合わせるようにしてもよい。例えば、音声Ａと音声Ｂの信号波形を、画像処理によって比較し、両者の波形の一致度が最も高くなるずれ量Δｔを求めてもよい。 Although a method that aligns the timing of audio A and audio B using the cross-correlation value has been given as an example here, the integration unit 260 may align the two by other methods. For example, the signal waveforms of audio A and audio B may be compared by image processing to find the offset Δt at which the two waveforms agree most closely.
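
A simple form of the level adjustment mentioned above is to scale both signals to the same RMS level before correlating them, so that the correlation peak does not depend on the mixing gain. RMS normalization is an assumption chosen for illustration; the description leaves the adjustment method open, and the function names are hypothetical.

```python
import math

def rms(x):
    """Root-mean-square level of a sample sequence (0.0 for an empty sequence)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0

def normalize_level(x, target_rms=1.0):
    """Scale x so that its RMS level equals target_rms (silence is returned unchanged)."""
    level = rms(x)
    if level == 0.0:
        return list(x)
    gain = target_rms / level
    return [v * gain for v in x]

# The same waveform at two very different levels ends up identical after normalization.
quiet = [0.01, -0.02, 0.015, -0.005]
loud = [v * 50 for v in quiet]
quiet_n = normalize_level(quiet)
loud_n = normalize_level(loud)
```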

整理すると、統合部２６０は、第１音声の波形と第２音声の波形との類似性に基づいて、第１音声のコンテンツを含む第１パッケージのコンテンツと、編集部２４０によって生成された新たなコンテンツである第２音声との、いずれか一方を時間方向に移動させることによって、再生のタイミングが整合するように第１音声と第２音声とを統合して出力する。 In summary, based on the similarity between the waveform of the first audio and the waveform of the second audio, the integration unit 260 shifts, in the time direction, either the content of the first package that includes the first audio content or the second audio, which is the new content generated by the editorial unit 240, and thereby integrates and outputs the first audio and the second audio so that their playback timing is consistent.
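
Moving one stream in the time direction amounts to adding a signed offset to its PTS values. The sketch below assumes the 90 kHz, 33-bit MPEG-TS PTS clock and a lag measured in audio samples; the function names and the example numbers are hypothetical.

```python
PTS_CLOCK_HZ = 90_000   # MPEG-TS PTS clock
PTS_WRAP = 1 << 33      # PTS values are 33 bits wide

def shift_pts(pts, offset_seconds):
    """Shift a PTS by offset_seconds (may be negative), wrapping at 33 bits."""
    ticks = round(offset_seconds * PTS_CLOCK_HZ)
    return (pts + ticks) % PTS_WRAP

def offset_from_samples(lag_samples, sample_rate_hz):
    """Convert a lag measured in audio samples to seconds."""
    return lag_samples / sample_rate_hz

# Suppose audio B was found to lag audio A by 4800 samples at 48 kHz (0.1 s).
# Its time stamps are then moved 0.1 s earlier.
delta_t = offset_from_samples(4800, 48_000)
aligned = shift_pts(900_000, -delta_t)
```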

［第３実施形態］ [Third Embodiment]
次に、第３実施形態について説明する。なお、前実施形態以前において既に説明した事項については以下において説明を省略する場合がある。ここでは、本実施形態に特有の事項を中心に説明する。 Next, the third embodiment is described. Matters already explained in the preceding embodiments may be omitted below; the description here focuses on matters specific to the present embodiment.

図９は、本実施形態による再配信システム（再配信装置）の概略機能構成と、同システムにおけるコンテンツデータの流れとを示す概略図である。図示するように、再配信システム１２は、受信部３２０と、編集部３４０と、統合部３６０と、配信部３８０とを含んで構成される。 FIG. 9 is a schematic diagram showing a schematic functional configuration of a redistribution system (redistribution device) according to the present embodiment and a flow of content data in the system. As shown in the figure, the redistribution system 12 includes a reception unit 320, an editorial unit 340, an integration unit 360, and a distribution unit 380.

再配信システム１２は、映像および音声のコンテンツを受信する。そして、再配信システム１２は、受信した音声のコンテンツに基づいて、字幕テキストのコンテンツを生成する。そして、再配信システム１２は、受信したオリジナルのコンテンツと、生成した字幕テキストのコンテンツとを、再生・提示するタイミングがあった状態で、再配信するものである。 The redistribution system 12 receives video and audio content. It then generates subtitle text content based on the received audio content, and redistributes the received original content together with the generated subtitle text content with their playback and presentation timing aligned.

受信部３２０は、少なくとも１種類の音声のコンテンツを含む第１パッケージを受信する。具体的には、例えば、受信部３２０は、外部の配信サーバー装置から、映像および音声で構成されるコンテンツを、ストリーミングの形式で受信する。受信部３２０は、受信した映像のファイルおよび音声のファイルを、統合部３６０に送信する。また、受信部３２０は、受信した音声のファイルを、編集部３４０に送信する。 The receiving unit 320 receives the first package containing at least one type of audio content. Specifically, for example, the receiving unit 320 receives content composed of video and audio from an external distribution server device in a streaming format. The reception unit 320 transmits the received video file and audio file to the integration unit 360. Further, the receiving unit 320 transmits the received audio file to the editing unit 340.

編集部３４０は、第１パッケージに含まれる少なくとも１種類の音声のコンテンツの音声認識処理を行うことによってその音声のコンテンツに対応する字幕テキストのコンテンツを、新たなコンテンツとして生成する。編集部３４０は、音声認識エンジンを内部に備えており、入力された音声を文字列に変換する機能を有する。また、編集部３４０は、音声から変換された文字列を、さらに字幕テキストデータの形式に整形し、ライブストリーミングにおける映像の一部として表示可能な形態のファイルとして出力する。このとき、編集部３４０は、元の音声のファイルに含まれているタイミング情報（ＰＴＳ，プレゼンテーションタイムスタンプ）と、ファイル内での時刻の相対位置等に基づいて、字幕テキストデータの断片ごとにタイミング情報を付与する。なお、編集部３４０は、例えば、タイムド・テキスト・マークアップ言語（ＴＴＭＬ，Timed Text Markup Language）等の、タイミング情報を付加することのできるデータ形式で、字幕テキストを出力することができる。編集部３４０は、音声に基づいて生成された字幕テキストデータのファイルを、統合部３６０に送信する。 The editorial unit 340 performs speech recognition on at least one type of audio content included in the first package and thereby generates, as new content, subtitle text content corresponding to that audio content. The editorial unit 340 contains a speech recognition engine and has a function of converting input audio into a character string. The editorial unit 340 further formats the character string converted from the audio as subtitle text data and outputs it as a file in a form that can be displayed as part of the video in live streaming. In doing so, the editorial unit 340 assigns timing information to each fragment of the subtitle text data based on the timing information (PTS, presentation time stamp) contained in the original audio file, the relative time position within the file, and the like. The editorial unit 340 can output the subtitle text in a data format that can carry timing information, such as Timed Text Markup Language (TTML). The editorial unit 340 sends the subtitle text data file generated from the audio to the integration unit 360.
なお、音声認識エンジン自体には、既存の技術を適用することができる。音声認識エンジンは、基本的な処理として、入力される音声の音響的特徴を抽出し、必要に応じて言語としての特徴を考慮に入れながら、統計的に確からしい文字列を音声認識結果のテキストとして出力するものである。 Existing technology can be applied to the speech recognition engine itself. As its basic processing, the speech recognition engine extracts the acoustic features of the input audio and, taking linguistic features into account as needed, outputs a statistically likely character string as the text of the speech recognition result.
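
Assigning timing to each subtitle fragment from the file's PTS and the fragment's relative position can be sketched as below, with the result written as a small TTML document. This is an illustration under assumptions: the PTS is taken to be in 90 kHz ticks, TTML `begin`/`end` values are taken to be the PTS converted directly to seconds, and the function names and sample fragments are hypothetical. The recognition step itself is not shown.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
PTS_CLOCK_HZ = 90_000

def clock_time(seconds):
    """Format seconds as a TTML clock-time expression hh:mm:ss.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def build_ttml(segment_pts, fragments):
    """Build a TTML document from recognized text fragments.

    segment_pts: PTS (90 kHz ticks) at the start of the audio file.
    fragments:   list of (start_offset_s, end_offset_s, text), relative to the file start.
    """
    ET.register_namespace("", TTML_NS)
    tt = ET.Element(f"{{{TTML_NS}}}tt")
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    base = segment_pts / PTS_CLOCK_HZ
    for start, end, text in fragments:
        p = ET.SubElement(div, f"{{{TTML_NS}}}p",
                          begin=clock_time(base + start), end=clock_time(base + end))
        p.text = text
    return ET.tostring(tt, encoding="unicode")

ttml = build_ttml(900_000, [(0.5, 2.0, "Hello"), (2.5, 4.0, "and welcome")])
```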

統合部３６０は、音声のコンテンツに含まれる音声信号と生成された字幕テキストとの間の時間方向の対応関係に基づいて、音声のコンテンツの再生のタイミングと字幕テキストの提示のタイミングが整合するように統合して出力する。つまり、統合部３６０は、受信部３２０から受け取った映像および音声のコンテンツのファイルと、編集部３４０から受け取った字幕テキストのファイルとを、パッケージとして統合して、配信部３８０に渡す。より具体的には、統合部３６０は、音声のコンテンツと字幕テキストのコンテンツとの間でのタイミング情報が整合している状態で、コンテンツのデータを出力する。なお、統合部３６０は、映像および音声のコンテンツを、エンコードされたままの状態で受信部３２０から受け取る。そして、そのままの状態で、字幕テキストのコンテンツとの統合を行う。 The integration unit 360 integrates and outputs the content so that the playback timing of the audio content and the presentation timing of the subtitle text are consistent, based on the temporal correspondence between the audio signal contained in the audio content and the generated subtitle text. That is, the integration unit 360 integrates the video and audio content files received from the receiving unit 320 and the subtitle text file received from the editorial unit 340 into a package and passes it to the distribution unit 380. More specifically, the integration unit 360 outputs the content data with the timing information of the audio content and the subtitle text content consistent with each other. The integration unit 360 receives the video and audio content from the receiving unit 320 still in its encoded state and integrates it, in that same state, with the subtitle text content.

配信部３８０は、インターネット等を経由して、統合部３６０から渡されたコンテンツのファイルを配信する。具体的には、配信部３８０は、映像と音声と字幕テキストのコンテンツを配信する。 The distribution unit 380 distributes a file of the content passed from the integration unit 360 via the Internet or the like. Specifically, the distribution unit 380 distributes video, audio, and subtitle text content.

［第３実施形態：変形例１］ [Third Embodiment: Modification 1]
次に、第３実施形態の変形例１について説明する。この変形例の基本的な構成は、第２実施形態におけるそれと同一であるが、統合部３６０が、さらに言語翻訳を行う点が、特徴的な構成である。 Next, Modification 1 of the third embodiment is described. The basic configuration of this modification is the same as that of the second embodiment, but it is characterized in that the integration unit 360 additionally performs language translation.

第３実施形態の変形例１において、編集部３４０は、言語翻訳エンジンを備える。言語翻訳エンジンは、自然言語によるテキストの他国語への翻訳を行う。例えば、統合部３６０は、音声認識処理の結果として得られた日本語のテキストを、英語に翻訳し、英語の字幕テキストデータを出力する。あるいは、編集部３４０は、音声認識処理の結果として得られたフランス語のテキストを、日本語に翻訳し、日本語の字幕テキストデータを出力する。なお、翻訳元と翻訳先の言語は、ここに例示したもの以外であってもよい。なお、元の音声に付加されていたタイミング情報に基づいて、翻訳後の字幕テキストにもタイミング情報が付与される。編集部３４０は、翻訳後の字幕テキストを、統合部３６０に送信する。その後の処理は、既に述べた形態における処理と同様である。 In the first modification of the third embodiment, the editorial unit 340 includes a language translation engine. The language translation engine translates texts in natural language into other languages. For example, the integration unit 360 translates the Japanese text obtained as a result of the voice recognition process into English and outputs English subtitle text data. Alternatively, the editorial unit 340 translates the French text obtained as a result of the voice recognition process into Japanese and outputs the Japanese subtitle text data. The languages of the translation source and the translation destination may be other than those exemplified here. The timing information is also added to the translated subtitle text based on the timing information added to the original voice. The editorial unit 340 transmits the translated subtitle text to the integration unit 360. Subsequent processing is the same as the processing in the above-described form.
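
The point that translated subtitles inherit the timing of the original audio can be sketched as follows. The `Cue` type, the `translate_cues` helper, and the lookup-table "translator" are hypothetical stand-ins; a real system would call an actual translation engine in place of `toy_translate`.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Cue:
    begin_s: float   # presentation start, seconds
    end_s: float     # presentation end, seconds
    text: str

def translate_cues(cues, translate):
    """Translate each cue's text while carrying its timing over unchanged."""
    return [replace(cue, text=translate(cue.text)) for cue in cues]

# Stand-in for a real translation engine: a tiny lookup table (hypothetical).
GLOSSARY = {"こんにちは": "Hello", "ありがとう": "Thank you"}

def toy_translate(text):
    return GLOSSARY.get(text, text)

source = [Cue(10.5, 12.0, "こんにちは"), Cue(12.5, 14.0, "ありがとう")]
translated = translate_cues(source, toy_translate)
```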

［第３実施形態：変形例２］ [Third Embodiment: Modification 2]
次に、第３実施形態の変形例２について説明する。この変形例の基本的な構成は、第２実施形態におけるそれと同一であるが、統合部３６０が、さらに手話への翻訳を行う点が、特徴的な構成である。 Next, Modification 2 of the third embodiment is described. The basic configuration of this modification is the same as that of the second embodiment, but it is characterized in that the integration unit 360 additionally performs translation into sign language.
なお、言語翻訳の機能自体には、既存の技術を適用すれば良い。 Existing technology may be applied to the language translation function itself.

第３実施形態の変形例２において、編集部３４０は、手話への翻訳機能を備える。言語翻訳エンジンは、音声認識処理の結果得られたテキストデータを、手話表現に翻訳する。そして、編集部３４０は、翻訳後の手話表現に対応する映像のコンテンツを生成し、出力する。手話は、例えば、コンピューターグラフィクス（ＣＧ）を用いて映像として表される。なお、元の音声に付加されていたタイミング情報に基づいて、出力される手話の映像にもタイミング情報が付与される。編集部３４０は、生成された手話の映像のデータを、統合部３６０に送信する。統合部３６０は、第３実施形態で説明した字幕テキストデータの代わりに、手話の映像のデータを、配信部３８０に渡す。配信部３８０は、元の映像および音声のコンテンツと、編集部３４０によって生成された手話の映像とを、配信する。 In the second modification of the third embodiment, the editorial unit 340 has a function of translating into sign language. The language translation engine translates the text data obtained as a result of the speech recognition process into a sign language expression. Then, the editorial unit 340 generates and outputs video content corresponding to the translated sign language expression. Sign language is represented as an image using, for example, computer graphics (CG). The timing information is also added to the output sign language video based on the timing information added to the original audio. The editorial unit 340 transmits the generated sign language video data to the integrated unit 360. The integration unit 360 passes the sign language video data to the distribution unit 380 instead of the subtitle text data described in the third embodiment. The distribution unit 380 distributes the original video and audio content and the sign language video generated by the editorial unit 340.

なお、上述した実施形態およびその変形例における再配信システムの機能や、再配信システムを構成する一部の装置の機能をコンピューターで実現するようにしても良い。その場合、この機能を実現するためのプログラムをコンピューター読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピューターシステムに読み込ませ、実行することによって実現しても良い。なお、ここでいう「コンピューターシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピューター読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ−ＲＯＭ、ＤＶＤ−ＲＯＭ、ＵＳＢメモリー等の可搬媒体、コンピューターシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピューター読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、短時間の間、動的にプログラムを保持するもの、その場合のサーバーやクライアントとなるコンピューターシステム内部の揮発性メモリーのように、一定時間プログラムを保持しているものも含んでも良い。また上記プログラムは、前述した機能の一部を実現するためのものであっても良く、さらに前述した機能をコンピューターシステムにすでに記録されているプログラムとの組み合わせで実現できるものであっても良い。 The functions of the redistribution system in the embodiments and modifications described above, or the functions of some of the devices constituting the redistribution system, may be implemented by a computer. In that case, a program for implementing these functions may be recorded on a computer-readable recording medium, and the functions may be implemented by having a computer system read and execute the program recorded on that medium. The term "computer system" here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" refers to a portable medium such as a flexible disk, magneto-optical disk, ROM, CD-ROM, DVD-ROM, or USB memory, or to a storage device such as a hard disk built into a computer system. A "computer-readable recording medium" may further include something that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and something that holds the program for a certain time, such as volatile memory inside a computer system serving as the server or client in that case. The program may implement only part of the functions described above, or may implement those functions in combination with a program already recorded in the computer system.

以上、説明した各実施形態またはその変形例のいずれかによれば、再配信システムは、インターネット等を介して、例えばＨＬＳ等の手段を用いて配信されるコンテンツを受信する。言い換えれば、再配信システムは、ベースバンド信号（非圧縮信号）で構成されるコンテンツを受信しない。そして、再配信システムは、受信したコンテンツの少なくとも一部に基づいて、別の新たなコンテンツを生成する。そして、再配信システムは、受信したオリジナルのコンテンツと、生成した新たなコンテンツとを統合したうえで、再配信する。再配信もまた、例えば、ＨＬＳ等を用いる。これにより、クライアント装置は、新たなコンテンツが付加された状態でコンテンツのストリーミング配信を受けることが可能となる。 According to any of the above-described embodiments or variations thereof, the redistribution system receives the content distributed by means such as HLS via the Internet or the like. In other words, the redelivery system does not receive content composed of baseband signals (uncompressed signals). The redelivery system then generates another new content based on at least a portion of the received content. Then, the redistribution system integrates the received original content and the generated new content, and then redistributes the content. Redelivery also uses, for example, HLS and the like. As a result, the client device can receive the streaming distribution of the content with the new content added.

そして、各実施形態またはその変形例によれば、最小限の工程および機材により、再配信システムを実現することが可能となり、システムを構築したり運用したりするコストを抑えられる。また、例えば、インターネットに接続できる環境さえあれば基本的にどこにおいても、配信形式のストリーミング映像に対して、音声等の新たなコンテンツを付加して再配信するサービスを実現することができる。 Then, according to each embodiment or a modification thereof, it is possible to realize a redistribution system with a minimum number of processes and equipment, and it is possible to suppress the cost of constructing and operating the system. Further, for example, it is possible to realize a service of adding new contents such as audio to a distribution format streaming video and redistributing it basically anywhere as long as there is an environment that can connect to the Internet.

コストに関して言えば、ベースバンド信号（非圧縮信号）のプロセッシングを行う高価な特殊機器が不要であり、インターネットにより映像の伝送が可能となるため、伝送コストの大幅な削減が期待できる。さらに、汎用的なコンピューターと、その上で稼働するソフトウェアのみでの処理が可能となるため、インターネット接続可能な場所であればどこからも、コンテンツを付加するサービスを実現することができる。また、元のコンテンツ（映像や音声等）と、付加するコンテンツ（たとえば、音声等）のタイミングを再配信システム内で自動的に同期させることができる。これにより、既存のストリーム映像音声にリアルタイムで新たなコンテンツ（音声等）を付加するという流れを１つにし、サービスの容易な実現が可能となる。コンテンツ配信等のサービスにおいて上の実施形態等で説明した構成を適用することにより、多様で、機動力に富んだサービスを提供することができるようになる。 In terms of cost, there is no need for expensive special equipment that processes baseband signals (uncompressed signals), and video can be transmitted via the Internet, so a significant reduction in transmission costs can be expected. Furthermore, since processing can be performed only with a general-purpose computer and software running on it, it is possible to realize a service for adding content from any place where an Internet connection is possible. Further, the timing of the original content (video, audio, etc.) and the content to be added (for example, audio, etc.) can be automatically synchronized in the redistribution system. This makes it possible to easily realize the service by unifying the flow of adding new contents (audio, etc.) in real time to the existing stream video / audio. By applying the configuration described in the above embodiment to services such as content distribution, it becomes possible to provide various and agile services.

なお、再配信システムが新たに付加するコンテンツは、音声のコンテンツに限られない。既に説明した例では、テキスト（いわゆる字幕テキストを含む）や、映像（一例として手話の映像）を生成して付加することができる。また、ここに例示したもの以外のコンテンツを、生成して付加することも可能となる。 The content newly added by the redistribution system is not limited to audio content. In the example already described, text (including so-called subtitle text) and video (sign language video as an example) can be generated and added. It is also possible to generate and add content other than those illustrated here.

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではない。さらなる変形例で実施するようにしてもよい。また、この発明の要旨を逸脱しない範囲の設計等を行ってもよい。 Although the embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to this embodiment. It may be carried out by a further modification. Further, the design may be performed within a range that does not deviate from the gist of the present invention.

例えば、上記の実施形態では、映像や音声のコンテンツを配信するための形式としてＨＬＳを用いたが、他の形式によって配信するようにしてもよい。例えば、ＭＰＥＧ−ＤＡＳＨや、ＨＤＳや、ＭＳＳｍｏｏｔｈＳｔｒｅａｍｉｎｇなどといった形式も、使用することができる。 For example, in the embodiments above, HLS is used as the format for distributing video and audio content, but other formats may be used. For example, formats such as MPEG-DASH, HDS, and MS Smooth Streaming can also be used.

本発明は、例えばコンテンツを配信する事業等に利用することができる。ただし、産業上の利用可能性は、ここに例示した分野には限定されない。 The present invention can be used, for example, in a business of distributing contents. However, industrial applicability is not limited to the fields exemplified here.

１再配信システム（再配信装置） 1 Redistribution system (redistribution device)
２配信サーバー装置 2 Distribution server device
３クライアント装置 3 Client device
１１，１２再配信システム（再配信装置） 11, 12 Redistribution system (redistribution device)
１２０受信部 120 Receiving unit
１４０編集部 140 Editorial unit
１６０統合部 160 Integration unit
１８０配信部 180 Distribution unit
２２０受信部 220 Receiving unit
２４０編集部 240 Editorial unit
２６０統合部 260 Integration unit
２８０配信部 280 Distribution unit
３２０受信部 320 Receiving unit
３４０編集部 340 Editorial unit
３６０統合部 360 Integration unit
３８０配信部 380 Distribution unit

Claims

A redistribution system comprising:
a receiving unit that receives a first package including at least one content item encoded in the HTTP Live Streaming (HLS) format;
an editorial unit that generates and outputs new content based on at least part of the content included in the first package received by the receiving unit;
an integration unit that integrates at least part of the content included in the first package received by the receiving unit with the new content generated by the editorial unit and outputs the result as a single second package; and
a distribution unit that redistributes the second package output from the integration unit,
wherein the receiving unit receives the first package including at least one type of video content, at least one type of audio content, and an index file,
the editorial unit plays back a first audio, which is at least one type of audio content included in the first package, and generates and outputs, as the new content, a second audio obtained by superimposing on the first audio another audio that is picked up by a microphone and input in correspondence with the first audio,
the integration unit integrates and outputs the video content and the audio content included in the first package and the new content so that their playback timing is consistent,
the editorial unit, based on the presentation time stamp, which is timing information held by the content included in the first package, and on the playback duration information described in the index file, assigns the presentation time stamp held by the content included in the first package to the new content as matching timing information, and generates the new content divided into lengths corresponding to the playback duration described in the index file, and
the integration unit makes the playback timing consistent by confirming that the presentation time stamp and playback duration held by the content included in the first package are identical to the timing information assigned to the new content and its playback duration, and by outputting the content included in the first package, the new content, and a new index file that includes information on the new content.

A redistribution method comprising:
a receiving process of receiving a first package including at least one content item encoded in the HTTP Live Streaming (HLS) format;
an editing process of generating and outputting new content based on at least part of the content included in the first package received in the receiving process;
an integration process of integrating at least part of the content included in the first package received in the receiving process with the new content generated in the editing process and outputting the result as a single second package; and
a distribution process of redistributing the second package output in the integration process,
wherein, in the receiving process, the first package including at least one type of video content, at least one type of audio content, and an index file is received,
in the editing process, a first audio, which is at least one type of audio content included in the first package, is played back, and a second audio obtained by superimposing on the first audio another audio that is picked up by a microphone and input in correspondence with the first audio is generated and output as the new content,
in the integration process, the video content and the audio content included in the first package and the new content are integrated and output so that their playback timing is consistent,
in the editing process, based on the presentation time stamp, which is timing information held by the content included in the first package, and on the playback duration information described in the index file, the presentation time stamp held by the content included in the first package is assigned to the new content as matching timing information, and the new content is generated divided into lengths corresponding to the playback duration described in the index file, and
in the integration process, the playback timing is made consistent by confirming that the presentation time stamp and playback duration held by the content included in the first package are identical to the timing information assigned to the new content and its playback duration, and by outputting the content included in the first package, the new content, and a new index file that includes information on the new content.

A program for causing a computer to function as a redistribution system comprising:
a receiving unit that receives a first package including at least one content item encoded in the HTTP Live Streaming (HLS) format;
an editorial unit that generates and outputs new content based on at least part of the content included in the first package received by the receiving unit;
an integration unit that integrates at least part of the content included in the first package received by the receiving unit with the new content generated by the editorial unit and outputs the result as a single second package; and
a distribution unit that redistributes the second package output from the integration unit,
wherein the receiving unit receives the first package including at least one type of video content, at least one type of audio content, and an index file,
the editorial unit plays back a first audio, which is at least one type of audio content included in the first package, and generates and outputs, as the new content, a second audio obtained by superimposing on the first audio another audio that is picked up by a microphone and input in correspondence with the first audio,
the integration unit integrates and outputs the video content and the audio content included in the first package and the new content so that their playback timing is consistent,
the editorial unit, based on the presentation time stamp, which is timing information held by the content included in the first package, and on the playback duration information described in the index file, assigns the presentation time stamp held by the content included in the first package to the new content as matching timing information, and generates the new content divided into lengths corresponding to the playback duration described in the index file, and
the integration unit makes the playback timing consistent by confirming that the presentation time stamp and playback duration held by the content included in the first package are identical to the timing information assigned to the new content and its playback duration, and by outputting the content included in the first package, the new content, and a new index file that includes information on the new content.