JP7096732B2

JP7096732B2 - Content distribution device and program

Info

Publication number: JP7096732B2
Application number: JP2018150817A
Authority: JP
Inventors: 壮田中; 均伊藤; 克幸杉森
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2018-08-09
Filing date: 2018-08-09
Publication date: 2022-07-06
Anticipated expiration: 2038-08-09
Also published as: JP2020027984A

Description

The present invention relates to a content distribution device and a program for live streaming, via the Internet, video that includes subtitle data.

Conventionally, television broadcasting has offered subtitled broadcasting, in which the audio of a broadcast program is displayed as text on the screen, as a service for hearing-impaired viewers. Subtitles transmitted in a live broadcast program (hereinafter referred to as "live subtitles") are produced by manual transcription from the audio of the program. The live subtitles are therefore delayed by the transcription time and appear on the screen later than the corresponding audio.

To reduce this delay, efforts such as using speech recognition technology or high-speed input keyboards are made when live subtitles are produced by manual transcription. In general, subtitle production methods include producing subtitles directly from the audio of the broadcast program, and having a speaker re-speak the program audio in a low-noise room to improve speech recognition accuracy. At present, the production delay and the fidelity of the subtitles to the program audio differ depending on the method used.

Meanwhile, with the recent spread of smartphones and video distribution technology, there is growing demand to provide broadcast programs over the Internet at the same time as over the air.

Some broadcasters outside Japan already provide programs over the Internet simultaneously with their broadcast, and such services are expected to be rolled out in Japan as well. Providing the same service in Japan requires achieving on the Internet a service level equivalent to that of broadcasting, and this applies to the subtitle service as well.

Adaptive streaming is a technique widely used in video distribution in recent years. It delivers content encoded at multiple bit rates and changes the delivered video quality according to the communication speed of the terminal, thereby realizing video distribution that is less prone to interruption.

Specifically, the distribution side encodes the content at a plurality of bit rates and generates files divided into units of several seconds. The terminal receiving the stream sequentially acquires, from the distribution side, files at a bit rate matched to its own communication speed, and joins the files for playback. As a result, even a terminal whose communication speed fluctuates can continue playing the content, and video distribution that is less prone to interruption can be realized (see, for example, Non-Patent Document 1).
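As a non-limiting illustration, the terminal-side choice of bit rate described above can be sketched as follows. The function name `pick_bitrate`, the bit-rate ladder, and the safety margin of 0.8 are assumptions introduced for this sketch; they are not taken from Non-Patent Document 1 or from any particular player.

```python
from typing import Sequence


def pick_bitrate(available_kbps: Sequence[int],
                 measured_kbps: float,
                 safety: float = 0.8) -> int:
    """Pick the highest encoded bit rate that fits the measured throughput.

    `safety` (an assumed margin) leaves headroom so that a small drop in
    throughput does not immediately stall playback. If no encoding fits,
    the lowest available bit rate is returned so playback can continue.
    """
    budget = measured_kbps * safety
    candidates = [b for b in available_kbps if b <= budget]
    return max(candidates) if candidates else min(available_kbps)


# Hypothetical ladder of encodings prepared by the distribution side.
ladder = [400, 1200, 3000, 6000]
for throughput in (5000.0, 1000.0, 200.0):
    print(throughput, "->", pick_bitrate(ladder, throughput))
```

The terminal would repeat this choice for each file of several seconds, which is how the delivered quality follows a fluctuating communication speed.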

However, in adaptive streaming, because the distribution side creates a file every few seconds, it temporarily stores the input video and audio data in a buffer before generating each file, so a delay of at least several seconds occurs.

When the video content of a live broadcast program is live-streamed via the Internet, the distribution side uses the same signal as the live broadcast to perform encoding, that is, the file generation process for adaptive streaming.

In this case as well, because the live subtitles are produced by manual transcription from the audio of the live broadcast program, they are displayed on the screen later than the audio, as in broadcasting.

Thus the live subtitles corresponding to the video and audio are displayed with a delay, and the smaller this display delay, the easier it is to understand the program content. The benefit is particularly large for hearing-impaired viewers, for whom live subtitles play a major role in understanding the program.

A technique for synchronizing program content and live subtitles has been proposed (see, for example, Patent Document 1). The content distribution device of that technique generates text data from the audio of the input video and audio by speech recognition processing, and also inputs the text data of the subtitle data corresponding to the video and audio. The device then compares the two sets of text data and, when it determines that they are the same, corrects the time of the corresponding live subtitle to the time obtained by the speech recognition processing.

This makes it possible to align the time of the live subtitles with the video and audio program content, and thereby to synchronize the program content and the live subtitles.

Patent Document 1: Japanese Unexamined Patent Publication No. 2017-5442

Non-Patent Document 1: A. Zambelli, "IIS Smooth Streaming Technical Overview", Mar. 2009

The technique of Patent Document 1 synchronizes the program content with the live subtitles, but it mainly assumes offline processing and is not necessarily applicable to online live streaming.

This is because live streaming requires streaming while encoding, yet Patent Document 1 does not describe processing delay, which is an important factor in live streaming.

Further, with the technique of Patent Document 1, the live subtitle correction processing time, measured from the point in the video and audio to which a live subtitle corresponds to the point at which correction of that subtitle's delay is completed, becomes long, and real-time performance is degraded.

As described above, the technique of Patent Document 1 does not assume real-time processing and therefore cannot be applied to live streaming, which requires real-time performance.

The present invention has been made to solve the above problems. Its object is to provide a content distribution device and a program that, in live streaming by Internet distribution, suppress the delay of live subtitles relative to the program content while ensuring real-time performance by not causing a delay beyond the time at which the encoding process is completed (the encoding processing completion time).

To solve the above problems, the content distribution device of claim 1 is a content distribution device that inputs a broadcast transmission signal, generates distribution data for distributing the content of a broadcast program over the Internet by live streaming, and corrects live subtitle data included in the broadcast transmission signal, the device comprising: an encoder that encodes the broadcast transmission signal and generates the distribution data in units of a predetermined time; and a subtitle processing unit that performs speech recognition processing on the audio included in the broadcast transmission signal to generate speech recognition data including audio time information on the time at which the audio is output, counts the delay time of the live subtitle data corresponding to the speech recognition data to obtain a subtitle delay elapsed time, extracts the live subtitle data included in the broadcast transmission signal, performs matching between the speech recognition data and the live subtitle data, and compares the subtitle delay elapsed time with an encoding processing completion time at which the encoder completes encoding of the broadcast transmission signal including the video and the audio of the speech recognition data, wherein, when the subtitle delay elapsed time is at or before the encoding processing completion time and matching of the live subtitle data corresponding to the speech recognition data has been completed, the subtitle processing unit, at the timing at which the matching is completed, corrects subtitle time information, which is included in the live subtitle data and relates to the time at which the live subtitle data is displayed on the screen, based on the audio time information included in the speech recognition data, a preset fixed value, or a statistical value of the delay time of the live subtitle data corresponding to the speech recognition data, and outputs the corrected live subtitle data as new live subtitle data, and, when the subtitle delay elapsed time is not at or before the encoding processing completion time, the subtitle processing unit, at the timing of the encoding processing completion time, generates new live subtitle data based on the speech recognition data and outputs the new live subtitle data.

The content distribution device of claim 2 is the content distribution device according to claim 1, wherein the subtitle processing unit comprises: a subtitle extraction unit that extracts the live subtitle data from the broadcast transmission signal; a speech recognition unit that performs the speech recognition processing on the audio included in the broadcast transmission signal and generates the speech recognition data; a matching unit that counts the delay time of the live subtitle data, extracted by the subtitle extraction unit, corresponding to the speech recognition data generated by the speech recognition unit to obtain the subtitle delay elapsed time, performs matching between the speech recognition data and the live subtitle data, and, when the matching is completed, calculates the difference between the audio time information included in the speech recognition data and the subtitle time information included in the live subtitle data and obtains a subtitle delay confirmation time based on the difference; and a subtitle correction unit that takes, as a scheduled matching correction completion time, the time obtained by adding to the subtitle delay confirmation time a predetermined matching correction processing time, which is the time from extraction of the live subtitle data to output of the new live subtitle data, and that (i) when the subtitle delay elapsed time is at or before the encoding processing completion time and the scheduled matching correction completion time is at or before the encoding processing completion time, corrects, at the timing at which the matching unit completes the matching, the subtitle time information included in the live subtitle data based on the audio time information included in the speech recognition data, and outputs the corrected live subtitle data as the new live subtitle data, (ii) when the subtitle delay elapsed time is at or before the encoding processing completion time and the scheduled matching correction completion time is not at or before the encoding processing completion time, corrects, at the timing at which the matching unit completes the matching, the subtitle time information included in the live subtitle data based on the fixed value or the statistical value, and outputs the corrected live subtitle data as the new live subtitle data, and (iii) when the subtitle delay elapsed time is not at or before the encoding processing completion time, generates, at the timing of the encoding processing completion time, the new live subtitle data based on the speech recognition data and outputs the new live subtitle data.

The content distribution device of claim 3 is the content distribution device according to claim 2, wherein the subtitle correction unit comprises: a correction type determination unit that determines a first correction type when the subtitle delay elapsed time obtained by the matching unit is at or before the encoding processing completion time and the scheduled matching correction completion time is at or before the encoding processing completion time, determines a second correction type when the subtitle delay elapsed time is at or before the encoding processing completion time and the scheduled matching correction completion time is not at or before the encoding processing completion time, and determines a third correction type when the subtitle delay elapsed time is not at or before the encoding processing completion time; and a subtitle time correction unit that, when the correction type determination unit determines the first correction type, corrects the subtitle time information included in the live subtitle data to the audio time information included in the speech recognition data and outputs the corrected live subtitle data as the new live subtitle data, when the second correction type is determined, subtracts the fixed value or the statistical value from the time of the subtitle time information included in the live subtitle data to obtain a subtraction result, corrects the subtitle time information included in the live subtitle data to the subtraction result, and outputs the corrected live subtitle data as the new live subtitle data, and, when the third correction type is determined, generates new live subtitle data based on the speech recognition data and outputs the new live subtitle data.

The content distribution device of claim 4 is the content distribution device according to claim 3, wherein, when the second correction type is determined, the subtitle time correction unit uses a table storing the fixed value or the statistical value for each type of broadcast program, reads from the table the fixed value or the statistical value corresponding to the broadcast program of the broadcast transmission signal, subtracts the fixed value or the statistical value from the time of the subtitle time information included in the live subtitle data to obtain a subtraction result, corrects the subtitle time information included in the live subtitle data to the subtraction result, and outputs the corrected live subtitle data as the new live subtitle data.
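As a non-limiting illustration, the table lookup and subtraction used for the second correction type can be sketched as follows. The table contents, the program type names, the fallback value, and the function name `corrected_subtitle_time` are all hypothetical and are introduced only for this sketch; the claims specify neither concrete values nor the behavior for a program type that is missing from the table.

```python
from typing import Mapping

# Hypothetical table: delay offset in seconds (fixed or statistical value)
# stored per type of broadcast program.
OFFSET_TABLE_SEC: Mapping[str, float] = {"news": 4.0, "sports": 6.0}

# Assumed fallback for program types absent from the table.
DEFAULT_OFFSET_SEC = 5.0


def corrected_subtitle_time(subtitle_time: float,
                            program_type: str,
                            table: Mapping[str, float] = OFFSET_TABLE_SEC,
                            default: float = DEFAULT_OFFSET_SEC) -> float:
    """Subtract the per-program offset from the subtitle display time."""
    offset = table.get(program_type, default)
    return subtitle_time - offset


print(corrected_subtitle_time(120.0, "news"))
```

Keeping the offsets in a table keyed by program type lets the correction reflect that transcription delay may differ between kinds of program.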

Further, the program of claim 5 causes a computer to function as the content distribution device according to any one of claims 1 to 4.

As described above, according to the present invention, in live streaming by Internet distribution, the delay of live subtitles relative to the program content can be suppressed while real-time performance is ensured by not causing a delay beyond the encoding processing completion time.

A schematic diagram showing an example of the overall configuration of a content distribution system including the content distribution device according to an embodiment of the present invention, and a block diagram showing a configuration example of the content distribution device.
A block diagram showing a configuration example of the subtitle processing unit.
A block diagram showing a configuration example of the subtitle correction unit.
A flowchart showing a processing example of the correction type determination unit.
A diagram explaining case (1): subtitle delay elapsed time t ≤ encoding processing completion time Et, and subtitle delay confirmation time Jt + matching correction processing time R ≤ encoding processing completion time Et (complete synchronization is possible).
A diagram explaining case (2): subtitle delay elapsed time t ≤ encoding processing completion time Et, and subtitle delay confirmation time Jt + matching correction processing time R > encoding processing completion time Et (complete synchronization is difficult).
A diagram explaining case (3): subtitle delay elapsed time t > encoding processing completion time Et (complete synchronization is difficult).
A flowchart showing a processing example of the subtitle time correction unit.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings. The present invention is characterized in that, in live streaming by Internet distribution, the time correction processing for live subtitles is changed according to the degree of delay of the live subtitles relative to the program content, and the time correction processing is carried out by the encoding processing completion time. This makes it possible to suppress the delay of live subtitles relative to the program content while ensuring real-time performance.

[Content distribution device]
First, the content distribution device according to the embodiment of the present invention will be described. FIG. 1 is a schematic diagram showing an example of the overall configuration of a content distribution system including the content distribution device according to the embodiment of the present invention, and a block diagram showing a configuration example of the content distribution device. This content distribution system performs live streaming of content via the Internet and comprises a content distribution device 1, a distribution server 2, and a terminal device 3.

The content distribution device 1 receives the broadcast transmission signal of the content, encodes it and divides it into a plurality of files, generates distribution data D for each of the files, and generates a playlist. An SDI (Serial Digital Interface) signal, for example, is used as the broadcast transmission signal.

The content distribution device 1 corrects the time of the live subtitle data included in the broadcast transmission signal, synchronizes the live subtitle data with the program content, that is, the video and audio of the content, and edits the playlist. The content distribution device 1 then transmits the distribution data D, the synchronized (corrected) live subtitle data a', and the playlist to the distribution server 2.

The broadcast transmission signal input to the content distribution device 1 is composed of video, audio, live subtitle data, and the like. The video, the audio, and the live subtitle data each include time information referenced to a common time. As described above, the live subtitle data is produced by manual transcription from the audio of the live broadcast program, and is therefore delayed relative to the video and audio program content. The delay time of the live subtitle data relative to the program content varies depending on the operator who produces it and on the live subtitle data itself. The playlist is metadata describing the acquisition location, composition, time information, and the like of the distribution data D (video, audio, and live subtitle data) generated by the encoding process.

The distribution server 2 receives the distribution data D of the content, the synchronized live subtitle data a', and the playlist from the content distribution device 1 and stores them in memory.

The terminal device 3 is a video viewing player such as a smartphone. It acquires the playlist from the content distribution device 1 via the Internet and determines the file structure from the playlist. Based on the playlist, the terminal device 3 then acquires the distribution data D and the synchronized live subtitle data a' via the Internet using HTTP (Hypertext Transfer Protocol).

The terminal device 3 joins the distribution data D and the synchronized live subtitle data a' according to the times in the playlist, and reproduces the content by displaying the video and subtitles on the screen and outputting the audio.

As a result, the terminal device 3 can reproduce content in which the subtitles are displayed with little delay relative to the video and audio, and the smaller this display delay, the easier it is for the user to understand the program content. The benefit is particularly large for hearing-impaired viewers, for whom live subtitles play a major role in understanding the program. Note that, because the terminal device 3 can freely set the number of display rows and display columns when displaying subtitles on the screen, subtitles can be displayed in synchronization with the program content.

In FIG. 1, the content distribution device 1 includes a distribution unit 10, an encoder 11, and a subtitle processing unit 12. The distribution unit 10 receives the broadcast transmission signal of the content, splits it, and outputs it to both the encoder 11 and the subtitle processing unit 12.

The encoder 11 inputs the broadcast transmission signal from the distribution unit 10, encodes it to divide it into files in units of several seconds, and generates the distribution data D together with a playlist. Along with the encoding, the encoder 11 obtains the encoding processing time E and extracts program information from the playlist. The encoding processing time E is the processing time from when the encoder 11 inputs a broadcast transmission signal until it outputs the distribution data D corresponding to that signal.

The encoder 11 transmits the distribution data D to the distribution server 2, and outputs an encoding status including the encoding processing time E, the program information, and the playlist to the subtitle processing unit 12. The encoding status, program information, and playlist may be output at any frequency, for example every few seconds, once per program, or once per day.

Note that, because the encoder 11 outputs the encoding status to the subtitle processing unit 12, the subtitle processing unit 12 performs the synchronization processing between the distribution data D and the live subtitle data a. Alternatively, the encoder 11 may transmit the encoding status to the distribution server 2, and the distribution server 2 may perform this synchronization processing.

The subtitle processing unit 12 inputs the broadcast transmission signal from the distribution unit 10, and inputs the encoding status, the program information, and the playlist from the encoder 11.

The subtitle processing unit 12 extracts live subtitle data a from the broadcast transmission signal, and performs speech recognition processing on the broadcast transmission signal to generate speech recognition data b. The subtitle processing unit 12 counts time starting from the time at which the broadcast transmission signal containing the audio of the speech recognition data b was input, and obtains the subtitle delay elapsed time t. In this description, times such as the subtitle delay elapsed time t denote instants in the passage of time (points in time), whereas times such as the encoding processing time E denote lengths of time (durations).

The subtitle delay elapsed time t indicates the point in time reached as time elapses from the time of the audio included in the broadcast transmission signal. The period from the reference time, at which the broadcast transmission signal containing the audio of the speech recognition data b was input, to the subtitle delay elapsed time t is defined as the subtitle delay time J. The subtitle delay elapsed time t continues to be counted until the live subtitle data a corresponding to the speech recognition data b (live subtitle data a with the same content as the speech recognition data b) is extracted.

The subtitle processing unit 12 accumulates the encoding processing time E included in the encoding status each time an encoding status is input and obtains its average, and adds the average of the encoding processing time E to the time at which the broadcast transmission signal containing the audio of the speech recognition data b was input. The subtitle processing unit 12 then sets the resulting time as the encoding processing completion time Et.
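As a non-limiting illustration, the calculation of the encoding processing completion time Et described above can be sketched as follows. The class and method names are hypothetical, times are represented as seconds on a common clock, and a simple arithmetic mean over all encoding statuses received so far is assumed.

```python
class EncodeCompletionEstimator:
    """Keeps a running average of the encoding processing time E."""

    def __init__(self) -> None:
        self._total = 0.0
        self._count = 0

    def add_status(self, encode_time_e: float) -> None:
        """Accumulate E each time an encoding status is input."""
        self._total += encode_time_e
        self._count += 1

    def mean_e(self) -> float:
        if self._count == 0:
            raise ValueError("no encoding status has been received yet")
        return self._total / self._count

    def completion_time(self, input_time: float) -> float:
        """Et = time at which the signal was input + average of E."""
        return input_time + self.mean_e()


est = EncodeCompletionEstimator()
for e in (4.0, 6.0):  # hypothetical encoding processing times in seconds
    est.add_status(e)
print(est.completion_time(100.0))
```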

字幕処理部１２は、生字幕データａと音声認識データｂとの間でマッチングを行う。そして、字幕処理部１２は、音声認識データｂに対応する生字幕データａを抽出すると（マッチングが完了すると）、生字幕データａに含まれる時刻情報（字幕時刻情報）と音声認識データｂに含まれる時刻情報（音声時刻情報）との間の差分を算出する。字幕処理部１２は、音声認識データｂの音声を含む放送用送出信号を入力したタイミングの時刻に、その差分を加算し、加算後の時刻を字幕遅延確定時刻Ｊｔに設定する。これらの時刻情報の差分に相当する時間は、生放送番組の音声から人手による書き起こしにて生字幕データａが制作された際の遅延時間に相当する。 The subtitle processing unit 12 performs matching between the raw subtitle data a and the voice recognition data b. When the subtitle processing unit 12 extracts the raw subtitle data a corresponding to the voice recognition data b (that is, when matching is completed), it calculates the difference between the time information included in the raw subtitle data a (subtitle time information) and the time information included in the voice recognition data b (voice time information). The subtitle processing unit 12 adds this difference to the time at which the broadcast transmission signal including the voice of the voice recognition data b was input, and sets the resulting time as the subtitle delay confirmation time Jt. The length of time given by this difference corresponds to the delay incurred when the raw subtitle data a was produced by manual transcription from the audio of the live broadcast program.

ここで、字幕処理部１２は、生字幕データａを含む放送用送出信号を入力してから、当該生字幕データａの処理が完了するまでの間の処理時間を、生字幕データａ毎に積算して平均を求め、これをマッチング補正処理時間Ｒに設定して保持しているものとする。マッチング補正処理時間Ｒは、字幕処理部１２が生字幕データａを含む放送用送出信号を入力してから、当該生字幕データａの抽出、マッチング及び補正の各処理を行い、生字幕データａ’を出力するまでの処理時間である。 Here, it is assumed that the subtitle processing unit 12 accumulates, for each item of raw subtitle data a, the processing time from the input of the broadcast transmission signal including that raw subtitle data a until the processing of that raw subtitle data a is completed, obtains the average, and holds this average as the matching correction processing time R. The matching correction processing time R is the processing time from when the subtitle processing unit 12 inputs the broadcast transmission signal including the raw subtitle data a, through the extraction, matching, and correction of that raw subtitle data a, until it outputs the raw subtitle data a'.

字幕処理部１２は、字幕遅延経過時刻ｔ、字幕遅延確定時刻Ｊｔ、エンコード処理完了時刻Ｅｔ及びマッチング補正処理時間Ｒ等に基づいて、生字幕データａの補正タイプを判断して補正処理を行い、生字幕データａ’を生成すると共に、プレイリストを編集する。字幕処理部１２は、生字幕データａ’及びプレイリストを配信サーバ２へ送信する。 The subtitle processing unit 12 determines the correction type of the raw subtitle data a based on the subtitle delay elapsed time t, the subtitle delay confirmation time Jt, the encoding processing completion time Et, the matching correction processing time R, and the like, performs the correction processing to generate the raw subtitle data a', and edits the playlist. The subtitle processing unit 12 transmits the raw subtitle data a' and the playlist to the distribution server 2.

図２は、図１に示した字幕処理部１２の構成例を示すブロック図である。この字幕処理部１２は、字幕抽出部２０、音声認識部２１、マッチング部２２及び字幕補正部２３を備えている。 FIG. 2 is a block diagram showing a configuration example of the subtitle processing unit 12 shown in FIG. The subtitle processing unit 12 includes a subtitle extraction unit 20, a voice recognition unit 21, a matching unit 22, and a subtitle correction unit 23.

字幕抽出部２０は、分配部１０から放送用送出信号を入力し、放送用送出信号から生字幕データａを抽出し、生字幕データａをマッチング部２２に出力する。生字幕データａには、生字幕が画面表示される時刻ｔ_aに関する時刻情報が含まれる。 The subtitle extraction unit 20 inputs a broadcast transmission signal from the distribution unit 10, extracts raw subtitle data a from the broadcast transmission signal, and outputs the raw subtitle data a to the matching unit 22. The raw subtitle data a includes time information regarding the time t_a at which the raw subtitles are displayed on the screen.

音声認識部２１は、分配部１０から放送用送出信号を入力し、放送用送出信号に含まれる音声に対して既知の音声認識処理を施し、音声認識データｂを生成し、音声認識データｂをマッチング部２２に出力する。音声認識データｂには、音声が出力される時刻ｔ_bに関する時刻情報が含まれる。 The voice recognition unit 21 inputs a broadcast transmission signal from the distribution unit 10, performs known voice recognition processing on the voice included in the broadcast transmission signal to generate voice recognition data b, and outputs the voice recognition data b to the matching unit 22. The voice recognition data b includes time information regarding the time t_b at which the voice is output.

マッチング部２２は、字幕抽出部２０から生字幕データａを入力すると共に、音声認識部２１から音声認識データｂを入力する。そして、マッチング部２２は、生字幕データａと音声認識データｂとをマッチングし、マッチングにより同一であると判定した生字幕データａ及び音声認識データｂを特定する。 The matching unit 22 inputs the raw subtitle data a from the subtitle extraction unit 20, and also inputs the voice recognition data b from the voice recognition unit 21. Then, the matching unit 22 matches the raw subtitle data a and the voice recognition data b, and identifies the raw subtitle data a and the voice recognition data b that are determined to be the same by the matching.

この場合、マッチング部２２は、まず、音声認識データｂを入力し、その後、当該音声認識データｂに対応する生字幕データａを入力する。マッチング部２２は、音声認識データｂとこれに対応する生字幕データａとをマッチングすることで、両者は同一である（対応している）と判定する。 In this case, the matching unit 22 first inputs the voice recognition data b, and then inputs the raw subtitle data a corresponding to the voice recognition data b. The matching unit 22 determines that the voice recognition data b and the corresponding raw subtitle data a are the same (corresponding to each other) by matching the voice recognition data b and the corresponding raw subtitle data a.
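The text does not specify how the matching unit 22 decides that a subtitle and a recognition result are "the same". As one possible illustration, the sketch below compares the two strings with Python's standard `difflib.SequenceMatcher` and accepts them as a match above a similarity threshold. The function name `is_same_utterance` and the threshold value are assumptions made for this example, not part of the described device.

```python
from difflib import SequenceMatcher


def is_same_utterance(subtitle_text: str, recognition_text: str,
                      threshold: float = 0.8) -> bool:
    """Return True when the two texts are similar enough to be treated
    as the same utterance (threshold is a hypothetical tuning value)."""
    ratio = SequenceMatcher(None, subtitle_text, recognition_text).ratio()
    return ratio >= threshold


# The recognition result arrives first; the transcribed subtitle arrives later.
pending_recognition = "明日は晴れるでしょう"
matched = is_same_utterance("明日は晴れるでしょう", pending_recognition)
unrelated = is_same_utterance("株価が上昇しました", pending_recognition)
```

A tolerant comparison of this kind is used here because manually transcribed subtitles and recognition output rarely agree character for character.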

マッチング部２２は、音声認識部２１により音声認識データｂの音声を含む放送用送出信号が入力された時刻を基準として時間をカウントし、基準の時刻にカウント値を加算し、字幕遅延経過時刻ｔを求める。 The matching unit 22 counts the time elapsed from the time at which the voice recognition unit 21 input the broadcast transmission signal including the voice of the voice recognition data b, adds the count value to this reference time, and thereby obtains the subtitle delay elapsed time t.

マッチング部２２は、マッチングにより、音声認識データｂに対応する生字幕データａを判定すると、字幕遅延経過時刻ｔのカウントを停止する。そして、マッチング部２２は、生字幕データａに含まれる時刻ｔ_aと音声認識データｂに含まれる時刻ｔ_bとの間の差分を算出し、基準の時刻に差分を加算した時刻を、字幕遅延確定時刻Ｊｔに設定する。 When the matching unit 22 identifies, by matching, the raw subtitle data a corresponding to the voice recognition data b, it stops counting the subtitle delay elapsed time t. The matching unit 22 then calculates the difference between the time t_a included in the raw subtitle data a and the time t_b included in the voice recognition data b, and sets the time obtained by adding this difference to the reference time as the subtitle delay confirmation time Jt.
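The calculation of the subtitle delay confirmation time Jt described above can be sketched as below. The function name `confirmed_delay_time` and the sample timestamps are hypothetical; only the relation Jt = reference + (t_a − t_b) is taken from the text.

```python
from datetime import datetime, timedelta


def confirmed_delay_time(reference: datetime, t_a: datetime, t_b: datetime) -> datetime:
    """Jt = reference time + (subtitle time t_a - audio time t_b)."""
    return reference + (t_a - t_b)


reference = datetime(2018, 8, 9, 12, 0, 0)   # input time of the recognized audio
t_b = datetime(2018, 8, 9, 12, 0, 0)         # time carried by recognition data b
t_a = t_b + timedelta(seconds=3.5)           # time carried by raw subtitle data a
jt = confirmed_delay_time(reference, t_a, t_b)
```

In this example the subtitle lags the audio by 3.5 seconds, so Jt falls 3.5 seconds after the reference time.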

マッチング部２２は、字幕抽出部２０の処理時間、当該マッチング部２２の処理時間及び字幕補正部２３の処理時間を取得し、これらの処理時間を加算した加算時間を求める。そして、マッチング部２２は、字幕補正部２３による処理が完了する毎に、加算時間を積算して平均を求め、これをマッチング補正処理時間Ｒに設定する。 The matching unit 22 acquires the processing time of the subtitle extraction unit 20, the processing time of the matching unit 22, and the processing time of the subtitle correction unit 23, and obtains an addition time obtained by adding these processing times. Then, each time the processing by the subtitle correction unit 23 is completed, the matching unit 22 integrates the addition time to obtain an average, and sets this as the matching correction processing time R.
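A minimal sketch of how the matching correction processing time R might be maintained is shown below, assuming the three stage times are available in seconds. The function name `update_matching_correction_time` and the list-based history are illustrative choices, not details given in the text.

```python
from typing import List


def update_matching_correction_time(history: List[float], extract_s: float,
                                    match_s: float, correct_s: float) -> float:
    """Add the three stage times for one subtitle, record the sum, and
    return the average over all subtitles processed so far (R)."""
    history.append(extract_s + match_s + correct_s)
    return sum(history) / len(history)


history: List[float] = []
r_first = update_matching_correction_time(history, 0.25, 0.5, 0.25)
r_second = update_matching_correction_time(history, 0.5, 1.0, 0.5)
```

The function is called once each time the subtitle correction unit finishes processing a subtitle, mirroring the update timing described above.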

マッチング部２２は、音声認識データｂを入力してから、当該音声認識データｂに対応する生字幕データａのマッチングを完了し、字幕遅延確定時刻Ｊｔを設定する直前までの間、音声認識データｂ、字幕遅延経過時刻ｔ及びマッチング補正処理時間Ｒを含むマッチングデータを字幕補正部２３に出力する。 From the input of the voice recognition data b until immediately before it completes matching of the corresponding raw subtitle data a and sets the subtitle delay confirmation time Jt, the matching unit 22 outputs matching data including the voice recognition data b, the subtitle delay elapsed time t, and the matching correction processing time R to the subtitle correction unit 23.

また、マッチング部２２は、マッチングが完了して字幕遅延確定時刻Ｊｔを設定したときに、音声認識データｂに対応する生字幕データａ、音声認識データｂ、字幕遅延確定時刻Ｊｔ及びマッチング補正処理時間Ｒを含むマッチングデータを字幕補正部２３に出力する。 Further, when matching is completed and the subtitle delay confirmation time Jt has been set, the matching unit 22 outputs matching data including the raw subtitle data a corresponding to the voice recognition data b, the voice recognition data b, the subtitle delay confirmation time Jt, and the matching correction processing time R to the subtitle correction unit 23.

マッチングデータは、文章単位であってもよく、文字単位、単語単位または複数の文章単位であってもよい。 The matching data may be in sentence units, character units, word units, or a plurality of sentence units.

字幕補正部２３は、マッチング部２２からマッチングデータを入力すると共に、エンコーダ１１からエンコードステータス、番組情報及びプレイリストを入力する。そして、字幕補正部２３は、エンコードステータスに含まれるエンコード処理時間Ｅを積算して平均を求め、音声認識データｂの音声を含む放送用送出信号を入力したタイミングの時刻にその平均を加算し、加算結果の時刻をエンコード処理完了時刻Ｅｔに設定する。 The subtitle correction unit 23 inputs matching data from the matching unit 22, and also inputs an encoding status, program information, and a playlist from the encoder 11. Then, the subtitle correction unit 23 integrates the encoding processing time E included in the encoding status to obtain an average, and adds the average to the time at which the broadcast transmission signal including the voice of the voice recognition data b is input. The time of the addition result is set to the encoding process completion time Et.

ここで、エンコーダ１１は、エンコード処理時間Ｅを含むエンコードステータスを字幕補正部２３に出力する。これに対し、後述するように、実際に算出されたエンコード処理時間Ｅの代わりに固定値のエンコード処理時間Ｅを用いる場合には、エンコーダ１１は、エンコード処理時間Ｅを求めなくてもよく、また、エンコード処理時間Ｅを含むエンコードステータスを出力しなくてもよい。この場合、配信データＤ及び生字幕データａの同期は、後段の配信サーバ２にて可能である。 Here, the encoder 11 outputs the encoding status including the encoding processing time E to the subtitle correction unit 23. On the other hand, as described later, when a fixed-value encoding processing time E is used instead of the actually calculated encoding processing time E, the encoder 11 need not obtain the encoding processing time E and need not output an encoding status including the encoding processing time E. In this case, the distribution data D and the raw subtitle data a can be synchronized at the downstream distribution server 2.

字幕補正部２３は、マッチングデータに含まれる字幕遅延経過時刻ｔ、字幕遅延確定時刻Ｊｔ及びマッチング補正処理時間Ｒ、並びにエンコード処理完了時刻Ｅｔ等に基づいて、補正タイプを判断する。 The subtitle correction unit 23 determines the correction type based on the subtitle delay elapsed time t, the subtitle delay confirmation time Jt, the matching correction processing time R, the encoding processing completion time Et, and the like included in the matching data.

字幕補正部２３は、補正タイプに応じて生字幕データａの補正処理を行い、生字幕データａ’を生成すると共に、プレイリストを編集する。字幕補正部２３は、補正後の生字幕データａ’及びプレイリストを配信サーバ２へ送信する。 The subtitle correction unit 23 corrects the raw subtitle data a according to the correction type, generates the raw subtitle data a', and edits the playlist. The subtitle correction unit 23 transmits the corrected raw subtitle data a' and the playlist to the distribution server 2.

図３は、図２に示した字幕補正部２３の構成例を示すブロック図である。この字幕補正部２３は、入力部３０、通信部３１、補正タイプ判定部３２及び字幕時刻補正部３３を備えている。 FIG. 3 is a block diagram showing a configuration example of the subtitle correction unit 23 shown in FIG. The subtitle correction unit 23 includes an input unit 30, a communication unit 31, a correction type determination unit 32, and a subtitle time correction unit 33.

入力部３０は、マッチング部２２からマッチングデータを入力し、マッチングデータを補正タイプ判定部３２に出力する。 The input unit 30 inputs matching data from the matching unit 22, and outputs the matching data to the correction type determination unit 32.

通信部３１は、エンコーダ１１からエンコードステータス、番組情報及びプレイリストを入力（受信）し、エンコードステータスを補正タイプ判定部３２に出力すると共に、番組情報及びプレイリストを字幕時刻補正部３３に出力する。 The communication unit 31 inputs (receives) the encoding status, program information, and playlist from the encoder 11, outputs the encoding status to the correction type determination unit 32, and outputs the program information and playlist to the subtitle time correction unit 33.

尚、通信部３１は、エンコードステータス、番組情報及びプレイリストをエンコーダ１１から受信する代わりに、外部のシステム（エンコードステータス、番組情報及びプレイリスト等を管理しているシステム）から受信するようにしてもよい。また、エンコードステータス、番組情報及びプレイリストの受信頻度は任意であり、数秒単位、番組単位または１日単位であってもよい。 Note that, instead of receiving the encoding status, program information, and playlist from the encoder 11, the communication unit 31 may receive them from an external system (a system that manages the encoding status, program information, playlist, and the like). The frequency at which the encoding status, program information, and playlist are received is arbitrary, and may be every few seconds, once per program, or once per day.

〔補正タイプ判定部３２〕 [Correction type determination unit 32]
補正タイプ判定部３２は、入力部３０からマッチングデータを入力すると共に、通信部３１からエンコードステータスを入力する。そして、補正タイプ判定部３２は、マッチングデータ及びエンコードステータスに基づいて、生字幕データａの遅延度合いを求める。補正タイプ判定部３２は、生字幕データａの遅延度合いに応じて補正タイプを判断し、補正タイプ及びマッチングデータを字幕時刻補正部３３に出力する。 The correction type determination unit 32 inputs matching data from the input unit 30, and also inputs the encoding status from the communication unit 31. Then, the correction type determination unit 32 obtains the degree of delay of the raw subtitle data a based on the matching data and the encoding status. The correction type determination unit 32 determines the correction type according to the degree of delay of the raw subtitle data a, and outputs the correction type and matching data to the subtitle time correction unit 33.

図４は、補正タイプ判定部３２の処理例を示すフローチャートである。補正タイプ判定部３２は、入力部３０からマッチングデータを入力すると共に（ステップＳ４０１）、通信部３１からエンコードステータスを入力する（ステップＳ４０２）。 FIG. 4 is a flowchart showing a processing example of the correction type determination unit 32. The correction type determination unit 32 inputs matching data from the input unit 30 (step S401) and inputs the encoding status from the communication unit 31 (step S402).

補正タイプ判定部３２は、前述のとおり、エンコードステータスに含まれるエンコード処理時間Ｅを用いてエンコード処理完了時刻Ｅｔを求める（ステップＳ４０３）。ステップＳ４０１～Ｓ４０３の処理は、後述するステップＳ４０５のとおり、図４の処理例において常に更新されるものとする。 As described above, the correction type determination unit 32 obtains the encoding processing completion time Et using the encoding processing time E included in the encoding status (step S403). The processing of steps S401 to S403 shall be constantly updated in the processing example of FIG. 4, as in step S405 described later.

ここで、マッチング部２２において音声認識データｂに対応する生字幕データａのマッチングが完了していない場合、マッチングデータには、音声認識データｂ、字幕遅延経過時刻ｔ及びマッチング補正処理時間Ｒが含まれる。また、マッチング部２２においてマッチングが完了している場合、マッチングデータには、生字幕データａ、音声認識データｂ、字幕遅延確定時刻Ｊｔ及びマッチング補正処理時間Ｒが含まれる。 Here, when the matching unit 22 has not yet completed matching of the raw subtitle data a corresponding to the voice recognition data b, the matching data includes the voice recognition data b, the subtitle delay elapsed time t, and the matching correction processing time R. When matching has been completed in the matching unit 22, the matching data includes the raw subtitle data a, the voice recognition data b, the subtitle delay confirmation time Jt, and the matching correction processing time R.
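The two forms of matching data described above could be represented with a single record whose optional fields are filled in once matching completes. The sketch below is only an illustration: the class name `MatchingData` and its field names are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MatchingData:
    recognition_text: str                       # voice recognition data b
    matching_correction_s: float                # matching correction processing time R
    elapsed_time: Optional[datetime] = None     # t, sent while matching is pending
    subtitle_text: Optional[str] = None         # raw subtitle data a, once matched
    confirmed_time: Optional[datetime] = None   # Jt, once matched

    @property
    def is_confirmed(self) -> bool:
        # Matching is complete exactly when Jt has been set.
        return self.confirmed_time is not None


pending = MatchingData("明日は晴れるでしょう", 1.0,
                       elapsed_time=datetime(2018, 8, 9, 12, 0, 2))
confirmed = MatchingData("明日は晴れるでしょう", 1.0,
                         subtitle_text="明日は晴れるでしょう",
                         confirmed_time=datetime(2018, 8, 9, 12, 0, 3))
```

Checking `is_confirmed` corresponds to the test in step S407 of whether the subtitle delay confirmation time has already been input.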

補正タイプ判定部３２は、字幕の文章番号をｉとして、字幕文章番号ｉを初期化する（ｉ＝０、ステップＳ４０４）。字幕文章番号ｉの字幕遅延経過時刻ｔ、字幕遅延確定時刻Ｊｔ、マッチング補正処理時間Ｒ及びエンコード処理完了時刻Ｅｔを、それぞれｔ_i，Ｊｔ_i，Ｒ_i，Ｅｔ_iとする。以下の処理により、字幕文章番号ｉの字幕について補正タイプが判断される。 The correction type determination unit 32 lets i denote the subtitle sentence number and initializes it (i = 0, step S404). The subtitle delay elapsed time t, the subtitle delay confirmation time Jt, the matching correction processing time R, and the encoding processing completion time Et for subtitle sentence number i are denoted t_i, Jt_i, R_i, and Et_i, respectively. The following processing determines the correction type for the subtitle of subtitle sentence number i.

補正タイプ判定部３２は、ステップＳ４０１，Ｓ４０２の入力処理を更新すると共に、ステップＳ４０３の処理を更新する（ステップＳ４０５）。 The correction type determination unit 32 updates the input processing of steps S401 and S402, and also updates the processing of step S403 (step S405).

補正タイプ判定部３２は、字幕遅延経過時刻ｔ_iがエンコード処理完了時刻Ｅｔ_i以前（よりも早いまたは同じ）（ｔ_i≦Ｅｔ_i）であるか否かを判定する（ステップＳ４０６）。補正タイプ判定部３２は、ステップＳ４０６において、字幕遅延経過時刻ｔ_iがエンコード処理完了時刻Ｅｔ_i以前である（ｔ_i≦Ｅｔ_i）と判定した場合（ステップＳ４０６：Ｙ）、字幕遅延の時刻が確定しているか否か、すなわちマッチングが完了して字幕遅延確定時刻Ｊｔ_iを入力済みであるか否かを判定する（ステップＳ４０７）。 The correction type determination unit 32 determines whether or not the subtitle delay elapsed time t_i is at or before (earlier than or equal to) the encoding processing completion time Et_i (t_i ≤ Et_i) (step S406). When the correction type determination unit 32 determines in step S406 that the subtitle delay elapsed time t_i is at or before the encoding processing completion time Et_i (t_i ≤ Et_i) (step S406: Y), it determines whether or not the subtitle delay time has been confirmed, that is, whether matching has been completed and the subtitle delay confirmation time Jt_i has already been input (step S407).

補正タイプ判定部３２は、ステップＳ４０７において、字幕遅延の時刻が確定していないと判定した場合（ステップＳ４０７：Ｎ）、ステップＳ４０５へ移行する。一方、補正タイプ判定部３２は、ステップＳ４０７において、字幕遅延の時刻が確定していると判定した場合（ステップＳ４０７：Ｙ）、ステップＳ４０８へ移行する。 When the correction type determination unit 32 determines in step S407 that the subtitle delay time has not been determined (step S407: N), the process proceeds to step S405. On the other hand, when the correction type determination unit 32 determines in step S407 that the time of the subtitle delay is fixed (step S407: Y), the process proceeds to step S408.

補正タイプ判定部３２は、ステップＳ４０７（Ｙ）から移行して、字幕遅延確定時刻Ｊｔ_iにマッチング補正処理時間Ｒ_iを加算する。補正タイプ判定部３２は、加算結果の時刻をマッチング補正完了予定時刻（Ｊｔ_i＋Ｒ_i）として、マッチング補正完了予定時刻（Ｊｔ_i＋Ｒ_i）がエンコード処理完了時刻Ｅｔ_i以前（よりも早いまたは同じ）（Ｊｔ_i＋Ｒ_i≦Ｅｔ_i）であるか否かを判定する（ステップＳ４０８）。 Proceeding from step S407 (Y), the correction type determination unit 32 adds the matching correction processing time R_i to the subtitle delay confirmation time Jt_i. Taking the resulting time as the scheduled matching correction completion time (Jt_i + R_i), the correction type determination unit 32 determines whether or not the scheduled matching correction completion time (Jt_i + R_i) is at or before (earlier than or equal to) the encoding processing completion time Et_i (Jt_i + R_i ≤ Et_i) (step S408).

マッチング補正完了予定時刻（Ｊｔ_i＋Ｒ_i）は、字幕処理部１２が音声認識データｂの音声を含む放送用送出信号を入力した時刻を基準として、音声認識データｂに対応する生字幕データａが抽出されてマッチングが完了し、そして、生字幕データａの補正が完了する予定の時刻に相当する。 The scheduled matching correction completion time (Jt_i + R_i) corresponds to the time, measured from the time at which the subtitle processing unit 12 input the broadcast transmission signal including the voice of the voice recognition data b, at which the raw subtitle data a corresponding to the voice recognition data b has been extracted, matching has been completed, and the correction of the raw subtitle data a is scheduled to be completed.

尚、ステップＳ４０６における字幕遅延経過時刻ｔ_i及びエンコード処理完了時刻Ｅｔ_iを用いた比較処理、及び、ステップＳ４０８における字幕遅延確定時刻Ｊｔ_i、マッチング補正処理時間Ｒ_i及びエンコード処理完了時刻Ｅｔ_iを用いた比較処理の技術的意義については、後述する図５～図７にて説明する。 The technical significance of the comparison in step S406, which uses the subtitle delay elapsed time t_i and the encoding processing completion time Et_i, and of the comparison in step S408, which uses the subtitle delay confirmation time Jt_i, the matching correction processing time R_i, and the encoding processing completion time Et_i, is explained later with reference to FIGS. 5 to 7.

補正タイプ判定部３２は、ステップＳ４０８において、マッチング補正完了予定時刻（Ｊｔ_i＋Ｒ_i）がエンコード処理完了時刻Ｅｔ_i以前である（Ｊｔ_i＋Ｒ_i≦Ｅｔ_i）と判定した場合（ステップＳ４０８：Ｙ）、生字幕データａの遅延時間が短く、エンコード処理完了時刻Ｅｔ_iが経過する前に生字幕データａの補正処理が完了すると判断し、補正タイプをＡと判断する（ステップＳ４０９）。 When the correction type determination unit 32 determines in step S408 that the scheduled matching correction completion time (Jt_i + R_i) is at or before the encoding processing completion time Et_i (Jt_i + R_i ≤ Et_i) (step S408: Y), it judges that the delay time of the raw subtitle data a is short and that the correction processing of the raw subtitle data a will complete before the encoding processing completion time Et_i passes, and determines the correction type to be A (step S409).

補正タイプＡは、番組内容に対する生字幕の時刻完全一致（完全同期）が可能なタイプであり、後段の字幕時刻補正部３３により、生字幕データａの時刻ｔ_aが音声認識データｂの時刻ｔ_bに補正される。 Correction type A is the type for which the time of the raw subtitles can be matched exactly to the program content (complete synchronization); the downstream subtitle time correction unit 33 corrects the time t_a of the raw subtitle data a to the time t_b of the voice recognition data b.

補正タイプ判定部３２は、ステップＳ４０８において、マッチング補正完了予定時刻（Ｊｔ_i＋Ｒ_i）がエンコード処理完了時刻Ｅｔ_i以前でない（Ｊｔ_i＋Ｒ_i＞Ｅｔ_i）と判定した場合（ステップＳ４０８：Ｎ）、生字幕データａの遅延時間がさほど短くないと判断し、補正タイプをＢと判断する（ステップＳ４１０）。 When the correction type determination unit 32 determines in step S408 that the scheduled matching correction completion time (Jt_i + R_i) is not at or before the encoding processing completion time Et_i (Jt_i + R_i > Et_i) (step S408: N), it judges that the delay time of the raw subtitle data a is not particularly short, and determines the correction type to be B (step S410).

補正タイプＢは、番組内容に対する生字幕の時刻完全一致（完全同期）が困難なタイプであり、字幕時刻補正部３３により、生字幕データａの時刻ｔ_aが、当該時刻ｔ_aを基準として所定の固定値Ｐに基づいて補正される。 Correction type B is a type for which it is difficult to match the time of the raw subtitles exactly to the program content (complete synchronization); the subtitle time correction unit 33 corrects the time t_a of the raw subtitle data a based on a predetermined fixed value P, taking that time t_a as the reference.

補正タイプ判定部３２は、ステップＳ４０６において、字幕遅延経過時刻ｔ_iがエンコード処理完了時刻Ｅｔ_i以前でない（ｔ_i＞Ｅｔ_i）と判定した場合（ステップＳ４０６：Ｎ）、字幕遅延経過時刻ｔ_iがエンコード処理完了時刻Ｅｔ_iを超えたとして、生字幕データａの遅延時間が長いと判断し、補正タイプをＣと判断する（ステップＳ４１１）。 When the correction type determination unit 32 determines in step S406 that the subtitle delay elapsed time t_i is not at or before the encoding processing completion time Et_i (t_i > Et_i) (step S406: N), it regards the subtitle delay elapsed time t_i as having exceeded the encoding processing completion time Et_i, judges that the delay time of the raw subtitle data a is long, and determines the correction type to be C (step S411).

補正タイプＣは、番組内容に対する生字幕の時刻完全一致（完全同期）が困難なタイプであり、字幕時刻補正部３３は、音声認識データｂの一部または全部を適用して生字幕データａ’を生成するか、または生字幕データａ’を生成する処理を行わない。 Correction type C is a type for which it is difficult to match the time of the raw subtitles exactly to the program content (complete synchronization); the subtitle time correction unit 33 either generates the raw subtitle data a' by applying part or all of the voice recognition data b, or does not perform the processing for generating the raw subtitle data a'.

このように、ステップＳ４０４～Ｓ４１１の処理により、字幕遅延経過時刻ｔ_i、字幕遅延確定時刻Ｊｔ_i、マッチング補正処理時間Ｒ_i及びエンコード処理完了時刻Ｅｔ_iに基づいて、生字幕の時刻補正処理の種類を示す補正タイプＡ，Ｂ，Ｃが判断される。 In this way, through the processing of steps S404 to S411, the correction type A, B, or C, which indicates the kind of time correction processing applied to the raw subtitles, is determined based on the subtitle delay elapsed time t_i, the subtitle delay confirmation time Jt_i, the matching correction processing time R_i, and the encoding processing completion time Et_i.
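The decision logic of steps S406 to S411 can be condensed into a single function, sketched below under a few assumptions: times are represented as `datetime` values, R as a `timedelta`, and a return value of `None` stands for the "not yet confirmed" branch (step S407: N), after which the caller would refresh its inputs (step S405) and call again. The names `CorrectionType` and `decide_correction_type` are hypothetical.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class CorrectionType(Enum):
    A = "A"   # complete synchronization possible
    B = "B"   # shift by the fixed value P
    C = "C"   # fall back to recognition data, or emit nothing


def decide_correction_type(t: datetime, jt: Optional[datetime],
                           r: timedelta, et: datetime) -> Optional[CorrectionType]:
    # Step S406: has the elapsed time already passed the encode completion time?
    if t > et:
        return CorrectionType.C
    # Step S407: matching not finished yet, so Jt is unknown; caller retries.
    if jt is None:
        return None
    # Step S408: will matching and correction finish by Et?
    if jt + r <= et:
        return CorrectionType.A
    return CorrectionType.B


base = datetime(2018, 8, 9, 12, 0, 0)        # hypothetical reference time
et = base + timedelta(seconds=5)
r = timedelta(seconds=1)
type_a = decide_correction_type(base + timedelta(seconds=2), base + timedelta(seconds=2), r, et)
type_b = decide_correction_type(base + timedelta(seconds=4.5), base + timedelta(seconds=4.5), r, et)
type_c = decide_correction_type(base + timedelta(seconds=6), None, r, et)
waiting = decide_correction_type(base + timedelta(seconds=3), None, r, et)
```

Once matching has completed, the counter for t has stopped, so the example simply passes the stopped value, taken here to equal Jt, as `t`; this is an assumption of the sketch.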

補正タイプ判定部３２は、ステップＳ４０９～Ｓ４１１から移行して、補正タイプ及びマッチングデータを字幕時刻補正部３３に出力する（ステップＳ４１２）。 The correction type determination unit 32 shifts from steps S409 to S411 and outputs the correction type and matching data to the subtitle time correction unit 33 (step S412).

補正タイプ判定部３２は、全ての字幕について処理が終了したか否かを判定し（ステップＳ４１３）、処理が終了していないと判定した場合（ステップＳ４１３：Ｎ）、字幕文章番号ｉをインクリメントし（ｉ＝ｉ＋１、ステップＳ４１４）、ステップＳ４０５へ移行する。これにより、次の字幕文章番号ｉの字幕について、ステップＳ４０５～Ｓ４１２の処理が行われる。 The correction type determination unit 32 determines whether or not processing has been completed for all subtitles (step S413). If it determines that processing has not been completed (step S413: N), it increments the subtitle sentence number i (i = i + 1, step S414) and proceeds to step S405. As a result, the processing of steps S405 to S412 is performed for the subtitle with the next subtitle sentence number i.

補正タイプ判定部３２は、ステップＳ４１３において全ての字幕について処理が終了したと判定した場合（ステップＳ４１３：Ｙ）、処理を終了する。 When the correction type determination unit 32 determines in step S413 that the processing has been completed for all the subtitles (step S413: Y), the correction type determination unit 32 ends the processing.

尚、補正タイプ判定部３２は、通信部３１から、実際に算出されたエンコード処理時間Ｅを含むエンコードステータスを入力するようにしたが、予め設定された固定値のエンコード処理時間Ｅを用いるようにしてもよい。また、補正タイプ判定部３２は、入力部３０から、実際に算出されたマッチング補正処理時間Ｒを含むマッチングデータを入力するようにしたが、予め設定された固定値のマッチング補正処理時間Ｒを用いるようにしてもよい。 The correction type determination unit 32 inputs the encoding status including the actually calculated encoding processing time E from the communication unit 31, but uses a preset fixed value encoding processing time E. You may. Further, the correction type determination unit 32 inputs the matching data including the actually calculated matching correction processing time R from the input unit 30, but uses a preset matching correction processing time R. You may do so.

さらに、補正タイプ判定部３２は、予め設定されたエンコード処理時間Ｅとして、時刻に応じた固定値または番組単位の固定値を用いるようにしてもよく、予め設定されたマッチング補正処理時間Ｒとして、時刻に応じた固定値または番組単位の固定値を用いるようにしてもよい。 Further, the correction type determination unit 32 may use a fixed value according to the time or a fixed value for each program as the preset encoding processing time E, and may use the preset matching correction processing time R as the matching correction processing time R. A fixed value according to the time or a fixed value for each program may be used.

（ステップＳ４０６，Ｓ４０８の比較処理） (Comparison processing of steps S406 and S408)
次に、図４に示したステップＳ４０６，Ｓ４０８の比較処理について詳細に説明する。以下、字幕文章番号ｉの表記は省略する。 Next, the comparison process of steps S406 and S408 shown in FIG. 4 will be described in detail. Hereinafter, the notation of the subtitle sentence number i will be omitted.

図５は、（１）字幕遅延経過時刻ｔ≦エンコード処理完了時刻Ｅｔ、かつ、字幕遅延確定時刻Ｊｔ＋マッチング補正処理時間Ｒ≦エンコード処理完了時刻Ｅｔの場合（完全同期が可能な場合）を説明する図である。詳細には、図４のステップＳ４０６において「Ｙ」を判定し、ステップＳ４０８において「Ｙ」を判定し、補正タイプＡを判断する場合を説明する図である。横軸は時間軸である。 FIG. 5 describes the case of (1) subtitle delay elapsed time t ≤ encoding processing completion time Et, and subtitle delay confirmation time Jt + matching correction processing time R ≤ encoding processing completion time Et (when complete synchronization is possible). It is a figure. More specifically, it is a figure explaining the case where "Y" is determined in step S406 of FIG. 4, "Y" is determined in step S408, and the correction type A is determined. The horizontal axis is the time axis.

配信データＤは、映像及び音声の放送用送出信号のタイミング（音声認識データｂの音声を含む放送用送出信号を入力したタイミング）を基準として、エンコード処理時間Ｅを経過したエンコード処理完了時刻Ｅｔに生成される。 The distribution data D is generated at the encoding processing completion time Et, that is, when the encoding processing time E has elapsed from the timing of the video and audio broadcast transmission signal (the timing at which the broadcast transmission signal including the voice of the voice recognition data b was input).

また、音声認識データｂに対応する生字幕データａは、映像及び音声の放送用送出信号のタイミングを基準として、字幕遅延時間Ｊだけ遅延しており、このタイミングでマッチングが完了したとする。このタイミングの時刻が字幕遅延確定時刻Ｊｔである。ｔ≦Ｅｔであり、かつＪｔ＋Ｒ≦Ｅｔの条件を満たすものとする。 Further, it is assumed that the raw subtitle data a corresponding to the voice recognition data b is delayed by the subtitle delay time J with reference to the timing of the video and audio broadcast transmission signals, and the matching is completed at this timing. The time at this timing is the subtitle delay confirmation time Jt. It is assumed that t ≦ Et and the condition of Jt + R ≦ Et is satisfied.

エンコード処理時間Ｅはほぼ固定値であり、字幕遅延時間Ｊは変動値であり、マッチング補正処理時間Ｒはほぼ固定値である。後述する図６，７についても同様である。 The encoding processing time E is a substantially fixed value, the subtitle delay time J is a variable value, and the matching correction processing time R is a substantially fixed value. The same applies to FIGS. 6 and 7 described later.

そうすると、マッチング補正後の生字幕データａ（ａ’）のタイミングは、エンコード処理完了時刻Ｅｔ以前のＪｔ＋Ｒのタイミングとなる。この条件（ｔ≦Ｅｔ，Ｊｔ＋Ｒ≦Ｅｔ）を満たす生字幕データａのタイミング範囲（字幕遅延確定時刻Ｊｔの範囲）は、図５の矢印破線に示すとおりである。 Then, the timing of the raw subtitle data a (a') after the matching correction becomes the timing of Jt + R before the encoding processing completion time Et. The timing range (range of the subtitle delay confirmation time Jt) of the raw subtitle data a satisfying this condition (t ≦ Et, Jt + R ≦ Et) is as shown by the broken line of the arrow in FIG.

この場合は、生字幕データａの遅延時間が短く、エンコード処理完了時刻Ｅｔが経過する前に生字幕データａの補正処理が完了すると判断される。したがって、番組内容に対する字幕の完全同期が可能であり、補正タイプＡが判断される。そして、生字幕データａのマッチングが完了したときに、生字幕データａの時刻ｔ_aが音声認識データｂの時刻ｔ_bに補正される（ｔ_a←ｔ_b）。これにより、エンコード処理完了時刻Ｅｔには、配信データＤ、及び当該配信データＤに対応する補正後の生字幕データａ’が揃うこととなる。 In this case, it is determined that the delay time of the raw subtitle data a is short and that the correction processing of the raw subtitle data a will complete before the encoding processing completion time Et passes. The subtitles can therefore be completely synchronized with the program content, and correction type A is determined. Then, when matching of the raw subtitle data a is completed, the time t_a of the raw subtitle data a is corrected to the time t_b of the voice recognition data b (t_a ← t_b). As a result, at the encoding processing completion time Et, both the distribution data D and the corrected raw subtitle data a' corresponding to that distribution data D are available.

図６は、（２）字幕遅延経過時刻ｔ≦エンコード処理完了時刻Ｅｔ、かつ、字幕遅延確定時刻Ｊｔ＋マッチング補正処理時間Ｒ＞エンコード処理完了時刻Ｅｔの場合（完全同期が困難な場合）を説明する図である。詳細には、図４のステップＳ４０６において「Ｙ」を判定し、ステップＳ４０８において「Ｎ」を判定し、補正タイプＢを判断する場合を説明する図である。横軸は時間軸である。 FIG. 6 illustrates the case of (2) subtitle delay elapsed time t ≤ encoding processing completion time Et, and subtitle delay confirmation time Jt + matching correction processing time R> encoding processing completion time Et (when complete synchronization is difficult). It is a figure. More specifically, it is a figure explaining the case where "Y" is determined in step S406 of FIG. 4, "N" is determined in step S408, and the correction type B is determined. The horizontal axis is the time axis.

また、音声認識データｂに対応する生字幕データａは、映像及び音声の放送用送出信号のタイミングを基準として、字幕遅延時間Ｊだけ遅延しており、このタイミングでマッチングが完了したとする。このタイミングの時刻が字幕遅延確定時刻Ｊｔである。ｔ≦Ｅｔであり、かつＪｔ＋Ｒ＞Ｅｔの条件を満たすものとする。 Further, it is assumed that the raw subtitle data a corresponding to the voice recognition data b is delayed by the subtitle delay time J with reference to the timing of the video and audio broadcast transmission signals, and the matching is completed at this timing. The time at this timing is the subtitle delay confirmation time Jt. It is assumed that t ≦ Et and the condition of Jt + R> Et is satisfied.

そうすると、字幕処理部１２により生字幕データａの補正が行われるとすると、マッチング補正後の生字幕データａ（ａ’）のタイミングは、エンコード処理完了時刻Ｅｔを超えるＪｔ＋Ｒのタイミングとなってしまう。この条件（ｔ≦Ｅｔ，Ｊｔ＋Ｒ＞Ｅｔ）を満たす生字幕データａのタイミング範囲（字幕遅延確定時刻Ｊｔの範囲）は、図６の矢印破線に示すとおりである。 Then, assuming that the raw subtitle data a is corrected by the subtitle processing unit 12, the timing of the raw subtitle data a (a') after the matching correction becomes the timing of Jt + R that exceeds the encoding processing completion time Et. The timing range (range of the subtitle delay confirmation time Jt) of the raw subtitle data a satisfying this condition (t ≦ Et, Jt + R> Et) is as shown by the broken line of the arrow in FIG.

この場合は、生字幕データａの遅延時間がさほど短くないが、エンコード処理完了時刻Ｅｔを超える所定時刻において生字幕データａの補正処理が完了すると判断される。したがって、番組内容に対する字幕の完全同期が困難であり、補正タイプＢが判断される。そして、生字幕データａのマッチングが完了したときに、生字幕データａの時刻ｔ_aを基準として所定の固定値Ｐに基づいて補正される（ｔ_a←ｔ_a－Ｐ）。これにより、エンコード処理完了時刻Ｅｔには、配信データＤ、及び当該配信データＤに対応する補正後の生字幕データａ’が揃うこととなる。 In this case, the delay time of the raw subtitle data a is not particularly short, and it is determined that the correction processing of the raw subtitle data a will complete at some time after the encoding processing completion time Et. Complete synchronization of the subtitles with the program content is therefore difficult, and correction type B is determined. Then, when matching of the raw subtitle data a is completed, the time t_a of the raw subtitle data a is corrected based on a predetermined fixed value P, taking that time t_a as the reference (t_a ← t_a − P). As a result, at the encoding processing completion time Et, both the distribution data D and the corrected raw subtitle data a' corresponding to that distribution data D are available.

図７は、（３）字幕遅延経過時刻ｔ＞エンコード処理完了時刻Ｅｔの場合（完全同期が困難な場合）を説明する図であり、図４のステップＳ４０６において「Ｎ」を判定し、補正タイプＣを判断する場合の図である。横軸は時間軸である。 FIG. 7 is a diagram for explaining the case of (3) subtitle delay elapsed time t> encoding processing completion time Et (when perfect synchronization is difficult), and “N” is determined in step S406 of FIG. It is a figure in the case of determining C. The horizontal axis is the time axis.

また、音声認識データｂに対応する生字幕データａは、映像及び音声の放送用送出信号のタイミングを基準として、字幕遅延時間Ｊだけ遅延し、字幕遅延経過時刻ｔがエンコード処理完了時刻Ｅｔを超えるものとする。 Further, the raw subtitle data a corresponding to the voice recognition data b is delayed by the subtitle delay time J with reference to the timing of the video and audio broadcast transmission signal, and the subtitle delay elapsed time t exceeds the encoding processing completion time Et. It shall be.

そうすると、この条件（ｔ＞Ｅｔ）を満たす生字幕データａのタイミング範囲（字幕遅延確定時刻Ｊｔの範囲）は、図７の矢印破線に示すものとなる。 Then, the timing range of the raw subtitle data a (the range of the subtitle delay confirmation time Jt) satisfying this condition (t > Et) is indicated by the dashed arrow in FIG. 7.

この場合は、生字幕データａの遅延時間が長いと判断され、番組内容に対する字幕の完全同期が困難であり、補正タイプＣが判断される。そして、エンコード処理完了時刻Ｅｔのタイミングで、音声認識データｂの一部または全部を適用して生字幕データａ’が生成されるか、または生字幕データａ’の生成処理は行われない。これにより、エンコード処理完了時刻Ｅｔには、配信データＤ、及び当該配信データＤに対応する生字幕データａ’が揃うこととなる。 In this case, the delay time of the raw subtitle data a is determined to be long, complete synchronization of the subtitles with the program content is difficult, and correction type C is determined. Then, at the timing of the encoding processing completion time Et, either the raw subtitle data a' is generated by applying part or all of the voice recognition data b, or the generation of the raw subtitle data a' is not performed. As a result, the distribution data D and the raw subtitle data a' corresponding to the distribution data D are both available at the encoding processing completion time Et.
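The three cases described above (FIGS. 5 to 7) can be sketched as a small decision function. This is only an illustrative sketch under stated assumptions, not the implementation of the correction type determination unit 32: the function name, the use of seconds as the unit, and the choice to return `None` while matching is still pending (t ≦ Et but Jt not yet known) are all hypothetical.

```python
from enum import Enum
from typing import Optional


class CorrectionType(Enum):
    A = "A"  # complete synchronization possible: t <= Et and Jt + R <= Et
    B = "B"  # matching done, but correction would finish after Et
    C = "C"  # subtitle has not arrived by Et: t > Et


def determine_correction_type(
    t: float,              # subtitle delay elapsed time t
    et: float,             # encoding processing completion time Et
    jt: Optional[float],   # subtitle delay confirmation time Jt (None until matched)
    r: float,              # matching correction processing time R
) -> Optional[CorrectionType]:
    """Classify the situation into correction type A, B, or C.

    Returning None while matching is pending is an assumption of this sketch.
    """
    if t > et:
        return CorrectionType.C
    if jt is None:
        return None  # keep waiting; the elapsed time t is still being counted
    if jt + r <= et:
        return CorrectionType.A
    return CorrectionType.B
```

For instance, with Et = 5.0 and R = 1.0, a subtitle confirmed at Jt = 2.0 falls under type A, while one confirmed at Jt = 4.5 falls under type B because Jt + R exceeds Et.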

〔字幕時刻補正部３３〕 [Subtitle time correction unit 33]
図３に戻って、字幕時刻補正部３３は、補正タイプ判定部３２から補正タイプ及びマッチングデータを入力すると共に、通信部３１から番組情報及びプレイリストを入力する。 Returning to FIG. 3, the subtitle time correction unit 33 receives the correction type and the matching data from the correction type determination unit 32, and receives the program information and the playlist from the communication unit 31.

字幕時刻補正部３３は、補正タイプに応じて、マッチングデータに含まれる生字幕データａの時刻情報を補正する処理等を行い、生字幕データａ’を生成する。また、字幕時刻補正部３３は、プレイリストに含まれる生字幕データａの時刻情報等を補正し、当該生字幕データａを生字幕データａ’とすることで、プレイリストを編集する。 The subtitle time correction unit 33 performs processing for correcting the time information of the raw subtitle data a included in the matching data according to the correction type, and generates the raw subtitle data a'. Further, the subtitle time correction unit 33 corrects the time information and the like of the raw subtitle data a included in the playlist, and sets the raw subtitle data a as the raw subtitle data a'to edit the playlist.

字幕時刻補正部３３は、補正後の生字幕データａ’及びプレイリストを配信サーバ２へ送信する。ここで、生字幕データａの時刻情報の補正の際に、補正タイプによっては、番組情報に応じて予め設定された固定値Ｐが用いられる。 The subtitle time correction unit 33 transmits the corrected raw subtitle data a'and the playlist to the distribution server 2. Here, when correcting the time information of the raw subtitle data a, a fixed value P preset according to the program information is used depending on the correction type.

図８は、字幕時刻補正部３３の処理例を示すフローチャートである。字幕時刻補正部３３は、補正タイプ判定部３２から補正タイプ及びマッチングデータを入力する（ステップＳ８０１）。また、字幕時刻補正部３３は、通信部３１から番組情報及びプレイリストを入力する（ステップＳ８０２）。 FIG. 8 is a flowchart showing a processing example of the subtitle time correction unit 33. The subtitle time correction unit 33 inputs the correction type and matching data from the correction type determination unit 32 (step S801). Further, the subtitle time correction unit 33 inputs program information and a playlist from the communication unit 31 (step S802).

補正タイプはＡ、ＢまたはＣである。補正タイプＡ，Ｂのときのマッチングデータは、生字幕データａ、音声認識データｂ、字幕遅延確定時刻Ｊｔ及びマッチング補正処理時間Ｒである。また、補正タイプＣのときのマッチングデータは、音声認識データｂ、字幕遅延経過時刻ｔ及びマッチング補正処理時間Ｒである。 The correction type is A, B or C. The matching data for the correction types A and B are raw subtitle data a, voice recognition data b, subtitle delay confirmation time Jt, and matching correction processing time R. The matching data for the correction type C are voice recognition data b, subtitle delay elapsed time t, and matching correction processing time R.

字幕時刻補正部３３は、予め設定された複数の固定値Ｐ１，Ｐ２，・・・のうち、番組情報に応じた固定値Ｐを選択する（ステップＳ８０３）。 The subtitle time correction unit 33 selects a fixed value P according to the program information from a plurality of preset fixed values P1, P2, ... (Step S803).

例えば、番組情報に含まれる番組の種類と、固定値Ｐ１，Ｐ２，・・・とが対応付けられたテーブルが予めメモリに格納されている。字幕時刻補正部３３は、通信部３１から入力した番組情報に含まれる番組の種類に対応する固定値Ｐをテーブルから読み出し、読み出した固定値Ｐを選択する。 For example, a table in which the types of programs included in the program information and the fixed values P1, P2, ... Are associated with each other is stored in the memory in advance. The subtitle time correction unit 33 reads a fixed value P corresponding to the type of the program included in the program information input from the communication unit 31 from the table, and selects the read fixed value P.

固定値Ｐ１，Ｐ２，・・・は、番組の種類に応じてオペレータにより予め設定された遅延時間であり、テーブルに格納されている。一般に、ニュース番組は情報番組よりも、字幕の書き起こし時間が短くて済む。このため、ニュース番組の固定値Ｐ１は、情報番組の固定値Ｐ２よりも小さい値が格納される。 The fixed values P1, P2, ... Are delay times preset by the operator according to the type of the program, and are stored in the table. In general, news programs require less time to transcribe subtitles than information programs. Therefore, the fixed value P1 of the news program stores a value smaller than the fixed value P2 of the information program.

これにより、後述するステップＳ８０６において、生字幕データａの時刻情報（時刻ｔ_a）は、固定値Ｐに基づいて、番組の種類に応じて適切に補正される。 As a result, in step S806 described later, the time information (time t_a) of the raw subtitle data a is appropriately corrected according to the type of the program, based on the fixed value P.
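The table lookup of step S803 can be sketched as follows. The program type keys, the numeric values, and the default used when no entry matches are all hypothetical; the source states only that the value for a news program (P1) is smaller than that for an information program (P2).

```python
# Hypothetical table of fixed values P (in seconds) keyed by program type.
# Only the ordering news < information is taken from the description.
FIXED_VALUE_BY_PROGRAM_TYPE = {
    "news": 3.0,         # P1 (illustrative value)
    "information": 5.0,  # P2 (illustrative value)
}

# Assumed fallback for program types that are missing from the table.
DEFAULT_FIXED_VALUE = 4.0


def select_fixed_value(program_info: dict) -> float:
    """Return the fixed value P for the program type found in program_info."""
    program_type = program_info.get("type")
    return FIXED_VALUE_BY_PROGRAM_TYPE.get(program_type, DEFAULT_FIXED_VALUE)
```

Keeping the values in a table lets an operator tune P per program type without touching the correction logic.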

また、字幕時刻補正部３３は、番組の種類毎に、音声認識データｂに対応する生字幕データａの確定した字幕遅延時間を測定してメモリに蓄積し、蓄積した加算結果の統計値（例えば平均値）を求め、固定値Ｐの代わりに統計値Ｐ’を用いるようにしてもよい。この場合、統計値Ｐ’は、後述するステップＳ８０６において固定値Ｐの代わりに用いられる。 Alternatively, the subtitle time correction unit 33 may measure, for each type of program, the confirmed subtitle delay time of the raw subtitle data a corresponding to the voice recognition data b, accumulate the measurements in the memory, obtain a statistical value (for example, the average) of the accumulated results, and use the statistical value P' instead of the fixed value P. In this case, the statistical value P' is used instead of the fixed value P in step S806 described later.

例えば、番組情報に含まれる番組の種類と、前述の統計値Ｐ１’，Ｐ２’，・・・とが対応付けられたテーブルが予めメモリに格納されている。字幕時刻補正部３３は、番組情報に含まれる番組の種類に対応する統計値Ｐ’をテーブルから読み出し、読み出した統計値Ｐ’を選択するようにしてもよい。 For example, a table in which the types of programs included in the program information and the above-mentioned statistical values P1', P2', ... Are associated with each other is stored in the memory in advance. The subtitle time correction unit 33 may read the statistical value P'corresponding to the type of the program included in the program information from the table and select the read statistical value P'.
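A minimal sketch of how a statistical value P' might be accumulated per program type is shown below. The class name, method names, and the fallback to a fixed value when no measurements exist are assumptions for illustration; the source names the average only as one example of a statistic.

```python
from collections import defaultdict
from statistics import mean


class DelayStatistics:
    """Accumulates confirmed subtitle delays per program type (hypothetical helper)."""

    def __init__(self) -> None:
        self._delays = defaultdict(list)

    def record(self, program_type: str, delay_seconds: float) -> None:
        """Store one confirmed subtitle delay measured after matching completes."""
        self._delays[program_type].append(delay_seconds)

    def statistic(self, program_type: str, fallback: float) -> float:
        """Return the mean delay P' for the type, or the fallback if none recorded."""
        delays = self._delays.get(program_type)
        return mean(delays) if delays else fallback
```

Usage: after `stats.record("news", 2.0)` and `stats.record("news", 4.0)`, `stats.statistic("news", fallback=3.5)` yields the mean of the two recorded delays, while an unseen program type yields the fallback.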

尚、字幕時刻補正部３３は、番組情報に応じた固定値Ｐを選択するようにしたが、番組情報によることなく予め設定された固定値Ｐを用いるようにしてもよい。 Although the subtitle time correction unit 33 selects the fixed value P according to the program information, the fixed value P set in advance may be used without depending on the program information.

字幕時刻補正部３３は、補正タイプを判定する（ステップＳ８０４）。字幕時刻補正部３３は、ステップＳ８０４において補正タイプＡを判定した場合（ステップＳ８０４：Ａ）、番組内容に対する生字幕の完全同期が可能であると判断し、生字幕データａの時刻ｔ_aを音声認識データｂの時刻ｔ_bに補正する（ステップＳ８０５）。これにより、生字幕データａの時刻ｔ_aは、時刻ｔ_bに修正される。この字幕時刻の補正処理は、マッチングが完了したタイミングで行われる。つまり、字幕時刻の補正処理は、エンコード処理完了時刻Ｅｔ以前のタイミングで確実に行われ、エンコード処理完了時刻Ｅｔを超えるタイミングで行われることはない。 The subtitle time correction unit 33 determines the correction type (step S804). When the subtitle time correction unit 33 determines correction type A in step S804 (step S804: A), it judges that the live subtitles can be completely synchronized with the program content, and corrects the time t_a of the raw subtitle data a to the time t_b of the voice recognition data b (step S805). As a result, the time t_a of the raw subtitle data a is changed to the time t_b. This subtitle time correction process is performed at the timing when the matching is completed. That is, the subtitle time correction process is reliably performed at or before the encoding processing completion time Et, and is never performed after the encoding processing completion time Et.

一方、字幕時刻補正部３３は、ステップＳ８０４において補正タイプＢを判定した場合（ステップＳ８０４：Ｂ）、番組内容に対する生字幕の完全同期が困難であると判断し、生字幕データａの時刻ｔ_aを、ステップＳ８０３にて選択した固定値Ｐに基づいて補正する（ステップＳ８０６）。 On the other hand, when the subtitle time correction unit 33 determines correction type B in step S804 (step S804: B), it judges that complete synchronization of the live subtitles with the program content is difficult, and corrects the time t_a of the raw subtitle data a based on the fixed value P selected in step S803 (step S806).

具体的には、字幕時刻補正部３３は、生字幕データａの時刻ｔ_aから固定値Ｐを減算し、減算結果を生字幕データａの新たな時刻（ｔ_a－Ｐ）とする。これにより、生字幕データａの時刻ｔ_aは、時刻ｔ_aから固定値Ｐを減算した時刻（ｔ_a－Ｐ）に修正される。この字幕時刻の補正処理は、マッチングが完了したタイミングで行われる。つまり、字幕時刻の補正処理は、ほぼエンコード処理完了時刻Ｅｔ以前のタイミングで行われる。 Specifically, the subtitle time correction unit 33 subtracts the fixed value P from the time t_a of the raw subtitle data a, and sets the subtraction result as the new time (t_a − P) of the raw subtitle data a. As a result, the time t_a of the raw subtitle data a is changed to the time (t_a − P) obtained by subtracting the fixed value P from the time t_a. This subtitle time correction process is performed at the timing when the matching is completed. That is, the subtitle time correction process is performed at a timing approximately at or before the encoding processing completion time Et.

尚、字幕時刻補正部３３は、ステップＳ８０４において補正タイプＢを判定した場合、ステップＳ８０５と同様に、生字幕データａの時刻ｔ_aを音声認識データｂの時刻ｔ_bに補正するようにしてもよい。 Note that, when correction type B is determined in step S804, the subtitle time correction unit 33 may instead correct the time t_a of the raw subtitle data a to the time t_b of the voice recognition data b, as in step S805.

字幕時刻補正部３３は、ステップＳ８０４において補正タイプＣを判定した場合（ステップＳ８０４：Ｃ）、番組内容に対する生字幕の完全同期が困難であると判断し、音声認識データｂの一部または全部を適用して新たな生字幕データを生成する生字幕データ生成処理を行うか、または何らの処理を行わない（生字幕データ生成処理を行わない）（ステップＳ８０７）。 When the subtitle time correction unit 33 determines correction type C in step S804 (step S804: C), it judges that complete synchronization of the live subtitles with the program content is difficult, and either performs raw subtitle data generation processing, which generates new raw subtitle data by applying part or all of the voice recognition data b, or performs no processing at all (does not perform the raw subtitle data generation processing) (step S807).

具体的には、字幕時刻補正部３３は、音声認識データｂの一部または全部を適用して新たな生字幕データを生成する場合、音声認識データｂからテキストデータを抽出し、テキストデータの一部または全部を新たな字幕データの字幕とし、音声認識データｂの時刻ｔ_bを新たな字幕データの時刻とする。これにより、新たな字幕データは、音声認識データｂを適用したデータとなる。この字幕時刻の補正処理は、エンコード処理完了時刻Ｅｔのタイミングで行われる。 Specifically, when generating new raw subtitle data by applying part or all of the voice recognition data b, the subtitle time correction unit 33 extracts text data from the voice recognition data b, uses part or all of the text data as the subtitle of the new subtitle data, and uses the time t_b of the voice recognition data b as the time of the new subtitle data. As a result, the new subtitle data is data to which the voice recognition data b has been applied. This subtitle time correction process is performed at the timing of the encoding processing completion time Et.

字幕時刻補正部３３は、ステップＳ８０５、ステップＳ８０６またはステップＳ８０７から移行して、補正後の生字幕データａを生字幕データａ’として生成するか、または新たな字幕データを生字幕データａ’とし、プレイリストに含まれる生字幕データａの時刻ｔ_aを、生字幕データａ’の時刻ｔ_b，ｔ_a－Ｐに編集する（ステップＳ８０８）。 Proceeding from step S805, step S806, or step S807, the subtitle time correction unit 33 generates the corrected raw subtitle data a as raw subtitle data a', or takes the new subtitle data as raw subtitle data a', and edits the time t_a of the raw subtitle data a included in the playlist to the time of the raw subtitle data a' (t_b or t_a − P) (step S808).

これにより、補正タイプＡの場合、時刻ｔ_aを含む生字幕データａに代えて、時刻ｔ_bを含み、かつ生字幕データａと同じ字幕を有する生字幕データａ’が生成され、プレイリストに含まれる生字幕データａの時刻ｔ_aが生字幕データａ’の時刻ｔ_bに編集される。 As a result, in the case of correction type A, raw subtitle data a' that includes the time t_b and has the same subtitle as the raw subtitle data a is generated in place of the raw subtitle data a including the time t_a, and the time t_a of the raw subtitle data a included in the playlist is edited to the time t_b of the raw subtitle data a'.

また、補正タイプＢの場合、時刻ｔ_aを含む生字幕データａに代えて、時刻ｔ_a－Ｐを含み、かつ生字幕データａと同じ字幕を有する生字幕データａ’が生成され、プレイリストに含まれる生字幕データａの時刻ｔ_aが生字幕データａ’の時刻ｔ_a－Ｐに編集される。 In the case of correction type B, raw subtitle data a' that includes the time t_a − P and has the same subtitle as the raw subtitle data a is generated in place of the raw subtitle data a including the time t_a, and the time t_a of the raw subtitle data a included in the playlist is edited to the time t_a − P of the raw subtitle data a'.

また、補正タイプＣの場合、時刻を時刻ｔ_bとし、かつ音声認識データｂのテキストデータの一部または全部を字幕とした生字幕データａ’が生成される。また、プレイリストに含まれる生字幕データａの時刻ｔ_aが生字幕データａ’の時刻ｔ_bに編集される。または、補正タイプＣの場合、補正処理は行われない。 In the case of correction type C, raw subtitle data a' is generated whose time is t_b and whose subtitle is part or all of the text data of the voice recognition data b. In addition, the time t_a of the raw subtitle data a included in the playlist is edited to the time t_b of the raw subtitle data a'. Alternatively, in the case of correction type C, no correction processing is performed.
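The per-type corrections of steps S805 to S807 can be sketched as follows. This is an illustration under assumptions rather than the device's actual processing: the `Subtitle` record, the string labels "A", "B", and "C", and the `generate_for_c` flag (which selects between the two alternatives the description allows for type C) are hypothetical names. Playlist editing (step S808) is omitted.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Subtitle:
    time: float  # presentation time in seconds (t_a or t_b)
    text: str    # subtitle text or recognized text


def correct_subtitle(
    correction_type: str,
    live: Optional[Subtitle],   # raw subtitle data a (absent for type C)
    recognized: Subtitle,       # voice recognition data b
    fixed_value: float,         # fixed value P (or statistic P')
    generate_for_c: bool = True,
) -> Optional[Subtitle]:
    """Return the corrected subtitle a', or None when no data is generated."""
    if correction_type == "A":
        if live is None:
            raise ValueError("type A requires matched raw subtitle data")
        return replace(live, time=recognized.time)       # t_a <- t_b
    if correction_type == "B":
        if live is None:
            raise ValueError("type B requires matched raw subtitle data")
        return replace(live, time=live.time - fixed_value)  # t_a <- t_a - P
    if correction_type == "C":
        # Build a' from the recognition result, or generate nothing at all.
        return Subtitle(recognized.time, recognized.text) if generate_for_c else None
    raise ValueError(f"unknown correction type: {correction_type!r}")
```

In types A and B the subtitle text of a is kept and only the time changes; in type C both the time and the text come from the voice recognition data b.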

プレイリストは、エンコーダ１１において、配信データの取得先及び構成内容等が記載されたメタデータとして生成され、その後、字幕時刻補正部３３において、生字幕データａ’が端末装置３にて正常に画面表示されるように編集される。 The playlist is generated as metadata in which the acquisition destination and the configuration contents of the distribution data are described in the encoder 11, and then the raw subtitle data a'is normally displayed on the terminal device 3 in the subtitle time correction unit 33. Edited to be displayed.

字幕時刻補正部３３は、生字幕データａ’及びプレイリストを配信サーバ２へ送信する（ステップＳ８０９）。 The subtitle time correction unit 33 transmits the raw subtitle data a'and the playlist to the distribution server 2 (step S809).

尚、字幕時刻補正部３３から配信サーバ２へ送信される生字幕データａ’は、配信形式（HLS，MPEG-DASH等）に合わせた形式（WebVTT，TTML，ARIB-TTML等）とする。 The raw subtitle data a'transmitted from the subtitle time correction unit 33 to the distribution server 2 is in a format (WebVTT, TTML, ARIB-TTML, etc.) that matches the distribution format (HLS, MPEG-DASH, etc.).
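As one illustration of the output format, a corrected subtitle could be serialized as a WebVTT cue as sketched below. The function names and the cue duration are assumptions (the description does not state how long a subtitle is displayed), and headers specific to HLS segment delivery are omitted.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt_cue(start: float, duration: float, text: str) -> str:
    """Build one WebVTT cue; `duration` is an assumed display time."""
    end = start + duration
    return f"{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"


def to_webvtt(cues: list) -> str:
    """Join (start, duration, text) tuples into a minimal WebVTT document."""
    body = "\n".join(to_webvtt_cue(s, d, t) for s, d, t in cues)
    return "WEBVTT\n\n" + body
```

The start time passed in would be the corrected time of the raw subtitle data a' (t_b or t_a − P).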

以上のように、本発明の実施形態のコンテンツ配信装置１によれば、マッチング部２２は、音声認識部２１により音声認識データｂの音声を含む放送用送出信号が入力された時刻を基準として時間をカウントし、字幕遅延経過時刻ｔを求める。 As described above, according to the content distribution device 1 of the embodiment of the present invention, the matching unit 22 counts time starting from the time at which the broadcast transmission signal including the voice of the voice recognition data b was input to the voice recognition unit 21, and thereby obtains the subtitle delay elapsed time t.

マッチング部２２は、音声認識部２１により音声認識データｂの音声を含む放送用送出信号が入力されたタイミングから、当該音声認識データｂに対応する生字幕データａのマッチングが完了するまでの間、音声認識データｂ、字幕遅延経過時刻ｔ及びマッチング補正処理時間Ｒを含むマッチングデータを字幕補正部２３に出力する。 From the timing at which the broadcast transmission signal including the voice of the voice recognition data b is input to the voice recognition unit 21 until the matching of the raw subtitle data a corresponding to that voice recognition data b is completed, the matching unit 22 outputs matching data including the voice recognition data b, the subtitle delay elapsed time t, and the matching correction processing time R to the subtitle correction unit 23.

また、マッチング部２２は、当該音声認識データｂに対応する生字幕データａのマッチングが完了すると、生字幕データａに含まれる時刻ｔ_aと音声認識データｂに含まれる時刻ｔ_bとの間の差分を算出し、当該差分を反映した字幕遅延確定時刻Ｊｔを求め、生字幕データａ、音声認識データｂ、字幕遅延確定時刻Ｊｔ及びマッチング補正処理時間Ｒを含むマッチングデータを字幕補正部２３に出力する。 When the matching of the raw subtitle data a corresponding to the voice recognition data b is completed, the matching unit 22 calculates the difference between the time t_a included in the raw subtitle data a and the time t_b included in the voice recognition data b, obtains the subtitle delay confirmation time Jt reflecting the difference, and outputs matching data including the raw subtitle data a, the voice recognition data b, the subtitle delay confirmation time Jt, and the matching correction processing time R to the subtitle correction unit 23.

字幕補正部２３は、ｔ≦Ｅｔであり、かつＪｔ＋Ｒ≦Ｅｔである場合、補正タイプＡを判断する。そして、字幕補正部２３は、エンコード処理完了時刻Ｅｔ以前のタイミングで、生字幕データａの時刻ｔ_aを音声認識データｂの時刻ｔ_bに補正し、生字幕データａ’を生成する。 The subtitle correction unit 23 determines correction type A when t ≦ Et and Jt + R ≦ Et. Then, at a timing at or before the encoding processing completion time Et, the subtitle correction unit 23 corrects the time t_a of the raw subtitle data a to the time t_b of the voice recognition data b, and generates the raw subtitle data a'.

また、字幕補正部２３は、ｔ≦Ｅｔであり、かつＪｔ＋Ｒ＞Ｅｔである場合、補正タイプＢを判断する。そして、字幕補正部２３は、エンコード処理完了時刻Ｅｔ以前のタイミングで、生字幕データａの時刻ｔ_aから所定の固定値Ｐを減算することで生字幕データａの時刻ｔ_aを（ｔ_a－Ｐ）に補正し、生字幕データａ’を生成する。 The subtitle correction unit 23 determines correction type B when t ≦ Et and Jt + R > Et. Then, at a timing at or before the encoding processing completion time Et, the subtitle correction unit 23 subtracts a predetermined fixed value P from the time t_a of the raw subtitle data a, thereby correcting the time t_a of the raw subtitle data a to (t_a − P), and generates the raw subtitle data a'.

さらに、字幕補正部２３は、ｔ＞Ｅｔである場合、補正タイプＣを判断し、エンコード処理完了時刻Ｅｔのタイミングで、音声認識データｂの一部または全部を適用して生字幕データａ’を生成するか、または補正処理を行わない。 Furthermore, when t > Et, the subtitle correction unit 23 determines correction type C and, at the timing of the encoding processing completion time Et, either generates the raw subtitle data a' by applying part or all of the voice recognition data b, or performs no correction processing.

これにより、生字幕データａの補正は、当該生字幕データａに対応する映像及び音声を含む放送用送出信号のエンコードが完了するエンコード処理完了時刻Ｅｔまでのタイミングで行われる。また、マッチングが完了した生字幕データａの時刻ｔ_aは、音声認識データｂの時刻ｔ_bに補正されるか、または当該時刻ｔ_bに近い時刻に補正される。 As a result, the correction of the raw subtitle data a is performed by the encoding processing completion time Et, at which the encoding of the broadcast transmission signal including the video and audio corresponding to the raw subtitle data a is completed. In addition, the time t_a of the raw subtitle data a for which matching has been completed is corrected either to the time t_b of the voice recognition data b or to a time close to the time t_b.

したがって、インターネット配信によるライブストリーミングにおいて、番組内容に対する生字幕の遅延を抑制することができる。特に、番組内容に対して生字幕が大きく遅延した場合であっても、生字幕の到着を待つことなく、エンコード処理完了時刻Ｅｔのタイミングで生字幕データａの補正を行うことができる。 Therefore, in live streaming by Internet distribution, it is possible to suppress the delay of live subtitles with respect to the program content. In particular, even when the live subtitles are significantly delayed with respect to the program content, the raw subtitle data a can be corrected at the timing of the encoding processing completion time Et without waiting for the arrival of the live subtitles.

すなわち、インターネット配信によるライブストリーミングにおいて、エンコード処理時間を超える遅延を発生させずにリアルタイム性を確保しつつ、番組内容に対する生字幕の遅延を抑制することができ、より分かりやすい番組提供が可能となる。 That is, in live streaming by Internet distribution, the delay of live subtitles relative to the program content can be suppressed while real-time performance is maintained without introducing a delay exceeding the encoding processing time, which makes it possible to provide programs that are easier to understand.

以上、実施形態を挙げて本発明を説明したが、本発明は前記実施形態に限定されるものではなく、その技術思想を逸脱しない範囲で種々変形可能である。例えば前記実施形態のコンテンツ配信装置１において、字幕処理部１２の字幕補正部２３は、図４のステップＳ４０６にて、マッチング部２２に生字幕データａが入力されない状態で、字幕遅延経過時刻ｔがエンコード処理完了時刻Ｅｔ以下でないと判定した場合、補正タイプＣを判断し、音声認識データｂの一部または全部を適用して生字幕データａ’を生成するか、または補正処理を行わないようにした。 Although the present invention has been described above with reference to an embodiment, the present invention is not limited to that embodiment and can be variously modified without departing from its technical idea. For example, in the content distribution device 1 of the above embodiment, when the subtitle correction unit 23 of the subtitle processing unit 12 determines in step S406 of FIG. 4, in a state where the raw subtitle data a has not been input to the matching unit 22, that the subtitle delay elapsed time t is not at or before the encoding processing completion time Et, it determines correction type C and either generates the raw subtitle data a' by applying part or all of the voice recognition data b or performs no correction processing.

これに対し、字幕補正部２３は、音声認識データｂから時刻ｔ_bを抽出し、その時刻ｔ_bに関する情報を、生字幕データａの時刻として配信サーバ２へ送信するようにしてもよい。 Alternatively, the subtitle correction unit 23 may extract the time t_b from the voice recognition data b and transmit information on the time t_b to the distribution server 2 as the time of the raw subtitle data a.

また、前記実施形態のコンテンツ配信装置１において、字幕処理部１２は、インターネット配信用の生字幕データａ’を生成して配信サーバ２へ送信するようにした。 Further, in the content distribution device 1 of the embodiment, the subtitle processing unit 12 generates raw subtitle data a'for Internet distribution and transmits it to the distribution server 2.

これに対し、字幕処理部１２は、生成した生字幕データａ’を、当該コンテンツ配信装置１の前段に設けられた装置（図１には図示せず）へ出力するようにしてもよい。この場合、当該装置は、放送用送出信号に生字幕データａ’を多重し、多重後の放送用送出信号をコンテンツ配信装置１へ出力する。コンテンツ配信装置１は、当該装置から多重後の放送用送出信号を入力し、放送用送出信号をエンコードして配信データＤを生成し、プレイリストを生成すると共に、放送用送出信号から生字幕データを抽出する。そして、コンテンツ配信装置１は、配信データ、生字幕データ及びプレイリストを配信サーバ２へ送信する。 Alternatively, the subtitle processing unit 12 may output the generated raw subtitle data a' to a device (not shown in FIG. 1) provided upstream of the content distribution device 1. In this case, that device multiplexes the raw subtitle data a' onto the broadcast transmission signal and outputs the multiplexed broadcast transmission signal to the content distribution device 1. The content distribution device 1 receives the multiplexed broadcast transmission signal from that device, encodes the broadcast transmission signal to generate the distribution data D, generates a playlist, and extracts the raw subtitle data from the broadcast transmission signal. The content distribution device 1 then transmits the distribution data, the raw subtitle data, and the playlist to the distribution server 2.

また、図１に示したコンテンツ配信システムでは、配信サーバ２と端末装置３とがインターネットを介して接続される。これに対し、コンテンツ配信装置１及び配信サーバ２も、インターネットを介して接続されるようにしてもよい。 Further, in the content distribution system shown in FIG. 1, the distribution server 2 and the terminal device 3 are connected via the Internet. On the other hand, the content distribution device 1 and the distribution server 2 may also be connected via the Internet.

尚、本発明の実施形態によるコンテンツ配信装置１のハードウェア構成としては、通常のコンピュータを使用することができる。コンテンツ配信装置１は、ＣＰＵ、ＲＡＭ等の揮発性の記憶媒体、ＲＯＭ等の不揮発性の記憶媒体、及びインターフェース等を備えたコンピュータによって構成される。 An ordinary computer can be used as the hardware configuration of the content distribution device 1 according to the embodiment of the present invention. The content distribution device 1 is configured as a computer provided with a CPU, a volatile storage medium such as a RAM, a non-volatile storage medium such as a ROM, an interface, and the like.

コンテンツ配信装置１に備えた分配部１０、エンコーダ１１及び字幕処理部１２の各機能は、これらの機能を記述したプログラムをＣＰＵに実行させることによりそれぞれ実現される。字幕処理部１２は、字幕抽出部２０、音声認識部２１、マッチング部２２及び字幕補正部２３により構成され、字幕補正部２３は、入力部３０、通信部３１、補正タイプ判定部３２及び字幕時刻補正部３３により構成される。 Each function of the distribution unit 10, the encoder 11, and the subtitle processing unit 12 provided in the content distribution device 1 is realized by causing the CPU to execute a program describing these functions. The subtitle processing unit 12 is composed of a subtitle extraction unit 20, a voice recognition unit 21, a matching unit 22, and a subtitle correction unit 23. The subtitle correction unit 23 includes an input unit 30, a communication unit 31, a correction type determination unit 32, and a subtitle time. It is composed of a correction unit 33.

これらのプログラムは、前記記憶媒体に格納されており、ＣＰＵに読み出されて実行される。また、これらのプログラムは、磁気ディスク（フロッピー（登録商標）ディスク、ハードディスク等）、光ディスク（ＣＤ－ＲＯＭ、ＤＶＤ等）、半導体メモリ等の記憶媒体に格納して頒布することもでき、ネットワークを介して送受信することもできる。 These programs are stored in the storage medium, read by the CPU, and executed. In addition, these programs can be stored and distributed in storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROM, DVD, etc.), semiconductor memories, etc., and can be distributed via a network. You can also send and receive.

１コンテンツ配信装置 1 Content distribution device
２配信サーバ 2 Distribution server
３端末装置 3 Terminal device
１０分配部 10 Distribution unit
１１エンコーダ 11 Encoder
１２字幕処理部 12 Subtitle processing unit
２０字幕抽出部 20 Subtitle extraction unit
２１音声認識部 21 Voice recognition unit
２２マッチング部 22 Matching unit
２３字幕補正部 23 Subtitle correction unit
３０入力部 30 Input unit
３１通信部 31 Communication unit
３２補正タイプ判定部 32 Correction type determination unit
３３字幕時刻補正部 33 Subtitle time correction unit
ａ，ａ’ 生字幕データ a, a' Raw subtitle data
ｂ音声認識データ b Voice recognition data
Ｄ配信データ D Distribution data
ｔ，ｔｉ字幕遅延経過時刻 t, ti Subtitle delay elapsed time
Ｊｔ，Ｊｔｉ字幕遅延確定時刻 Jt, Jti Subtitle delay confirmation time
Ｅｔ，Ｅｔi エンコード処理完了時刻 Et, Eti Encoding processing completion time
Ｒ，Ｒｉマッチング補正処理時間 R, Ri Matching correction processing time
Ｊ字幕遅延時間 J Subtitle delay time
Ｅエンコード処理時間 E Encoding processing time
ｉ字幕文章番号 i Subtitle text number

Claims

A content distribution device that receives a broadcast transmission signal, generates distribution data for live streaming the content of a broadcast program over the Internet, and corrects raw subtitle data included in the broadcast transmission signal, the content distribution device comprising:
an encoder that encodes the broadcast transmission signal and generates the distribution data in predetermined time units; and
a subtitle processing unit that
performs voice recognition processing on the voice included in the broadcast transmission signal, generates voice recognition data including voice time information regarding the time at which the voice is output, and counts the delay time of the raw subtitle data corresponding to the voice recognition data to obtain a subtitle delay elapsed time,
extracts the raw subtitle data included in the broadcast transmission signal and performs matching between the voice recognition data and the raw subtitle data,
compares the subtitle delay elapsed time with an encoding processing completion time at which the encoder completes the encoding of the broadcast transmission signal including the voice of the voice recognition data and the video,
when the subtitle delay elapsed time is at or before the encoding processing completion time and the matching of the raw subtitle data corresponding to the voice recognition data has been completed, corrects, at the timing when the matching is completed, subtitle time information that is included in the raw subtitle data and relates to the time at which the raw subtitle data is displayed on the screen, based on the voice time information included in the voice recognition data, a preset fixed value, or a statistical value of the delay time of the raw subtitle data corresponding to the voice recognition data, and outputs the corrected raw subtitle data as new raw subtitle data, and
when the subtitle delay elapsed time is not at or before the encoding processing completion time, generates new raw subtitle data based on the voice recognition data at the timing of the encoding processing completion time and outputs the new raw subtitle data.

The content distribution device according to claim 1, wherein
the subtitle processing unit comprises:
a subtitle extraction unit that extracts the raw subtitle data from the broadcast transmission signal;
a voice recognition unit that performs the voice recognition processing on the voice included in the broadcast transmission signal to generate the voice recognition data;
a matching unit that counts the delay time of the raw subtitle data extracted by the subtitle extraction unit and corresponding to the voice recognition data generated by the voice recognition unit to obtain the subtitle delay elapsed time, performs matching between the voice recognition data and the raw subtitle data, and, when the matching is completed, calculates the difference between the voice time information included in the voice recognition data and the subtitle time information included in the raw subtitle data and obtains a subtitle delay confirmation time based on the difference; and
a subtitle correction unit that sets, as a scheduled matching correction completion time, the time obtained by adding to the subtitle delay confirmation time a predetermined matching correction processing time from the extraction of the raw subtitle data to the output of the new raw subtitle data,
when the subtitle delay elapsed time is at or before the encoding processing completion time and the scheduled matching correction completion time is at or before the encoding processing completion time, corrects, at the timing when the matching is completed by the matching unit, the subtitle time information included in the raw subtitle data based on the voice time information included in the voice recognition data, and outputs the corrected raw subtitle data as the new raw subtitle data,
when the subtitle delay elapsed time is at or before the encoding processing completion time and the scheduled matching correction completion time is not at or before the encoding processing completion time, corrects, at the timing when the matching is completed by the matching unit, the subtitle time information included in the raw subtitle data based on the fixed value or the statistical value, and outputs the corrected raw subtitle data as the new raw subtitle data, and
when the subtitle delay elapsed time is not at or before the encoding processing completion time, generates the new raw subtitle data based on the voice recognition data at the timing of the encoding processing completion time and outputs the new raw subtitle data.

The content distribution device according to claim 2, wherein
the subtitle correction unit comprises:
a correction type determination unit that determines a first correction type when the subtitle delay elapsed time obtained by the matching unit is at or before the encoding processing completion time and the scheduled matching correction completion time is at or before the encoding processing completion time, determines a second correction type when the subtitle delay elapsed time is at or before the encoding processing completion time and the scheduled matching correction completion time is not at or before the encoding processing completion time, and determines a third correction type when the subtitle delay elapsed time is not at or before the encoding processing completion time; and
a subtitle time correction unit that,
when the first correction type is determined by the correction type determination unit, corrects the subtitle time information included in the raw subtitle data to the voice time information included in the voice recognition data and outputs the corrected raw subtitle data as the new raw subtitle data,
when the second correction type is determined, subtracts the fixed value or the statistical value from the time of the subtitle time information included in the raw subtitle data to obtain a subtraction result, corrects the subtitle time information included in the raw subtitle data to the subtraction result, and outputs the corrected raw subtitle data as the new raw subtitle data, and
when the third correction type is determined, generates the new raw subtitle data based on the voice recognition data and outputs the new raw subtitle data.

The content distribution device according to claim 3, wherein
the subtitle time correction unit,
when the second correction type is determined, uses a table storing the fixed value or the statistical value for each type of broadcast program to read from the table the fixed value or the statistical value corresponding to the broadcast program of the broadcast transmission signal, subtracts the fixed value or the statistical value from the time of the subtitle time information included in the raw subtitle data to obtain a subtraction result, corrects the subtitle time information included in the raw subtitle data to the subtraction result, and outputs the corrected raw subtitle data as the new raw subtitle data.

A program for causing a computer to function as the content distribution device according to any one of claims 1 to 4.