JP2004336606A

JP2004336606A - Caption production system

Info

Publication number: JP2004336606A
Application number: JP2003132670A
Authority: JP
Inventors: Takao Monma (門馬隆雄); Eiji Sawamura (沢村英治); Toru Tsugi (都木徹); Katsuhiko Shirai (白井克彦)
Original assignee: NEC Corp; Nippon Hoso Kyokai NHK; National Institute of Information and Communications Technology; NHK Engineering Services Inc; Japan Broadcasting Corp
Current assignee: NEC Corp; National Institute of Information and Communications Technology; Japan Broadcasting Corp; NHK Engineering System Inc
Priority date: 2003-05-12
Filing date: 2003-05-12
Publication date: 2004-11-25
Anticipated expiration: 2023-05-12
Also published as: JP4500957B2

Abstract

PROBLEM TO BE SOLVED: To provide a caption production system that reduces the burden of transcription work and produces caption data from input data including non-speech information.

SOLUTION: A production unit extraction unit 11, in cooperation with a morphological analysis unit 40, divides the text in input data 100 to generate processing-unit caption data 110. A display-unit captioning unit 12 divides the text in the processing-unit caption data 110 to generate display-unit caption data 120. For a page that includes preliminary timing information, any newly needed additional timing information is generated by a proportional distribution method and added. For a page that includes no preliminary timing information, the display-unit caption text is sent to a synchronization detection unit 30, which outputs the time codes of its start and end. A timing assignment unit 13 generates additional timing information from the time codes supplied by the synchronization detection unit 30, adds it to the display-unit caption data, and outputs the result as broadcast caption data.

COPYRIGHT: (C)2005, JPO & NCIPI

Description

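[0001]
[Technical field of the invention]
The present invention relates to a caption production system, and in particular to a caption production system capable of efficiently producing caption data that includes non-speech information.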
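[0002]
[Prior art]
To promote caption broadcasting, the then Ministry of Posts and Telecommunications issued a guideline calling for captions to be added, by 2007, to all programs that can be captioned.
[0003]
However, while NHK General TV captions a relatively high 73.4% of its programs, the average for the five Tokyo-based commercial broadcasters is only 16.1%. This rate is far lower than in Europe and North America (for example, 70% in the United States), and caption users and many others are strongly calling for caption broadcasting to be expanded.
[0004]
The main reason the rate does not improve is that much of caption program production is done by hand. Producing a captioned program therefore takes a great deal of labor, time, and cost. In particular, creating caption data, which includes transcribing speech, depends heavily on the knowledge and skill of the operator, so it is extremely difficult to automate and also difficult to make more efficient.
[0005]
More specifically, transcription is done by playing back, on a VTR, a videotape on which the source program is recorded, and selectively listening for the sounds that should be displayed as captions (usually human voices, hereinafter "speech"). At the same time, the start and end of each speech segment are identified from the time code attached to the played-back video, and timing information representing them is added to the transcribed text data. This requires complex VTR operations: cueing the start of a speech segment, adjusting the playback speed to make transcription easier, replaying repeatedly, and confirming the end point of the segment. A transcriber therefore needs not only good listening ability but also a rich vocabulary and strong comprehension, plus the dexterity to carry out complex VTR operations at the same time.
[0006]
Consequently, improving the efficiency of caption program production requires reducing the burden on the transcriber.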
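[0007]
One conventional caption production system that reduces the burden of transcription automatically adds timing information to text data transcribed from speech (see, for example, Patent Document 1).
[0008]
Given digitized caption text data, this system automatically inserts line breaks and page breaks so that the text (character string) is divided into strings suited to on-screen display, that is, unit caption texts, which are the display units on a television screen. Meanwhile, it plays back the videotape of the television program from which the caption text was made and uses the timing information attached to the tape to obtain timing information that determines when each unit caption text should be displayed. The obtained timing information is then added to the unit caption text data to generate caption data for caption broadcasting.
[0009]
With this system, a transcriber only has to transcribe the speech into text data to obtain caption data with appropriate timing information.
[0010]
There is also a system which, although not a caption production system, divides the text contained in caption data received from a caption broadcast as specified by the user and adds corresponding timing information to each divided caption text, so that the display format on the television screen can be changed (see, for example, Patent Document 2).
[0011]
If this system is used for transcription, timing information can be added at any convenient granularity (for example, per speaker or per sentence) without regard to how the text will look on screen, and the data can later be converted into caption data that displays properly.
[0012]
[Patent Document 1]
JP 2002-342311 A
[0013]
[Patent Document 2]
JP 2000-350117 A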
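[0014]
[Problems to be solved by the invention]
Ordinary television programs use on-screen captions to provide various kinds of non-speech information, such as introductions of the people appearing and descriptions of scenes. Such non-speech information should also be provided in teletext broadcasting.
[0015]
In caption broadcasting, in scenes where several people speak in turn or simultaneously (for example, interviews or debates), it is necessary to display not only what is said but also information identifying the speaker, such as the speaker's name.
[0016]
However, a conventional caption production system obtains timing information by time-synchronizing the unit caption text data with the audio played back from the source program, so the caption text must match the content of the played-back audio. When non-speech information such as the above is present, timing information can no longer be added automatically.
[0017]
When the conventional system for changing the display format is used for transcription, non-speech information causes no problem and the input effort is slightly reduced. However, this approach requires timing information to be added, for example, sentence by sentence, so like conventional transcription it demands a high level of skill from the operator.
[0018]
An object of the present invention is therefore to provide a caption production system of simple configuration that reduces the burden of transcription and can produce caption data from input data that includes non-speech information.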
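[0019]
[Means for solving the problems]
According to the present invention, there is provided a caption production system that, on the basis of input data including text corresponding to the audio of a source program and of the playback audio and playback time code of the source program, automatically produces broadcast caption data including display-unit caption text data, obtained by inserting line breaks and page breaks into the text for caption display, and timing information that determines when the display-unit caption text data is displayed page by page, characterized in that data including both pages to which preliminary timing information has been added and pages to which no preliminary timing information has been added can be used as the input data.
[0020]
Also according to the present invention, there is provided a caption production system that generates, from input data including text corresponding to the audio of a source program, display-unit caption text data obtained by inserting line breaks and page breaks so that the text forms display-unit caption texts; generates, on the basis of the display-unit caption text data and the playback audio and playback time code of the source program, timing information that determines when each display-unit caption text is displayed on screen; and automatically produces broadcast caption data including the display-unit caption text data and the timing information, characterized in that: for pages of the input data to which preliminary timing information has been added, when new timing information becomes necessary because the text on the page has been broken into lines and pages, the new timing information is generated by a proportional distribution method using the preliminary timing information; and for pages of the input data to which no preliminary timing information has been added, when new timing information becomes necessary because the text on the page has been broken into lines and pages, the new timing information is generated by automatic audio synchronization processing that time-synchronizes the display-unit caption text data with the playback audio.
[0021]
Further, according to the present invention, there is provided a caption production method that generates, from input data including text corresponding to the audio of a source program, display-unit caption text data obtained by inserting line breaks and page breaks so that the text forms display-unit caption texts; generates, on the basis of the display-unit caption text data and the playback audio and playback time code of the source program, timing information that determines when each display-unit caption text is displayed on screen; and automatically produces broadcast caption data including the display-unit caption text data and the timing information, characterized in that: for pages of the input data to which preliminary timing information has been added, after the text on the page has been broken into lines and pages to form the display-unit caption texts, new timing information is generated as needed by a proportional distribution method using the preliminary timing information; and for pages of the input data to which no preliminary timing information has been added, after the text on the page has been broken into lines and pages to form the display-unit caption texts, new timing information is generated as needed by automatic audio synchronization processing that time-synchronizes the display-unit caption text data with the playback audio.
[0022]
Still further, according to the present invention, there is provided a caption production program that causes a computer to generate, from input data including text corresponding to the audio of a source program, display-unit caption text data obtained by inserting line breaks and page breaks so that the text forms display-unit caption texts; to generate, on the basis of the display-unit caption text data and the playback audio and playback time code of the source program, timing information that indicates when each display-unit caption text is displayed on screen; and to automatically produce broadcast caption data including the display-unit caption text data and the timing information, characterized in that the program causes the computer: for pages of the input data to which preliminary timing information has been added, after the text on the page has been broken into lines and pages to form the display-unit caption texts, to generate new timing information as needed by a proportional distribution method using the preliminary timing information; and for pages of the input data to which no preliminary timing information has been added, after the text on the page has been broken into lines and pages to form the display-unit caption texts, to generate new timing information as needed by automatic audio synchronization processing that time-synchronizes the display-unit caption text data with the playback audio.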
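[0023]
[Embodiments of the invention]
An embodiment of the present invention is described in detail below with reference to the drawings.
[0024]
FIG. 1 shows the configuration of a caption production system according to an embodiment of the present invention. The system comprises: a caption production control unit 10 that receives input data (a digitized manuscript) 100 created from the source program and produces broadcast caption data; a recording/playback unit 20 that plays back the source program recorded on a recording medium such as an optical disc; a synchronization detection unit 30 that, under the control of the caption production control unit 10, synchronizes display-unit caption texts extracted from the input data with the playback audio from the recording/playback unit 20; a morphological analysis unit 40 that, under the control of the caption production control unit 10, performs morphological analysis on text (character strings) extracted from the input data; a storage unit 50 that stores division rules for dividing (line-breaking and page-breaking) the text extracted from the input data; and a monitor 60 that displays captions based on the broadcast caption data produced by the caption production control unit 10 together with the source program played back by the recording/playback unit 20.
[0025]
The caption production control unit 10 is implemented, for example, on a personal computer and includes a production unit extraction unit 11 that divides the input data 100 into processing-unit caption data 110, a display-unit captioning unit 12 that divides and joins the text in the processing-unit caption data 110 to suit on-screen display, and a timing assignment unit 13 that adds timing information to the display-unit caption data 120 to produce broadcast caption data 130.
[0026]
In this embodiment, the synchronization detection unit 30 and the morphological analysis unit 40 are each assumed to be implemented on a dedicated computer, but they can also be implemented, together with the caption production control unit 10, on a single computer.
[0027]
The input data 100 is created with a manuscript digitizing device, not shown (for example, a personal computer), by converting the character strings that represent the audio in the source program into electronic data (text). The transcription is carried out using a disc recording/playback device capable of non-linear operation.
[0028]
The input data 100 is assumed to be text data or XML (eXtensible Markup Language) data. Text data contains data representing the character strings (mixed kanji and kana) transcribed from the audio of the source program (speech information) and line-break information. XML data adds to this text data page information (page start and end) and timing information that specifies when each page is displayed. XML data may also contain data representing character strings unrelated to the audio of the source program, such as a speaker's profile (non-speech information), together with timing information for displaying it. The input data 100 is supplied to the caption production control unit 10 recorded on a recording medium such as a magnetic disk or optical disc.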
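[0029]
The production unit extraction unit 11 of the caption production control unit 10 reads the input data from the recording medium one line of text at a time and sends the data representing each line to the morphological analysis unit 40.
[0030]
The morphological analysis unit 40 splits the text sent from the production unit extraction unit 11 into morphemes, analyzes them, and identifies the positions at which the text may be broken. It adds this breakable-position information to the text and returns it to the production unit extraction unit 11. Breakable positions are, for example, after a full stop, after a comma, between phrases, and between morphemes (parts of speech), in descending order of priority.
[0031]
The production unit extraction unit 11 refers to the breakable-position information attached to the returned text, cuts the text into pieces of suitable length (about 70 to 90 characters), and sends them, together with the breakable-position information, to the display-unit captioning unit 12 as processing-unit caption data 110. However, pages of the input data 100 to which timing information (hereinafter "preliminary timing information") has already been added are sent to the display-unit captioning unit 12 page by page as processing-unit caption data 110.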
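The cutting of text into pieces of about 70 to 90 characters at prioritized break positions can be sketched as follows. This is a minimal sketch under assumptions not found in the source: break candidates are supplied as a mapping from absolute character offset to a numeric priority (0 for after a full stop down to 3 for between morphemes), the name `split_units` is hypothetical, and a hard cut at the maximum length is used when no candidate falls inside the window.

```python
def split_units(text, breaks, min_len=70, max_len=90):
    """Cut text into units of roughly min_len..max_len characters.

    breaks maps an absolute character offset (the cut is made before that
    offset) to a priority: 0 = after a full stop, 1 = after a comma,
    2 = between phrases, 3 = between morphemes. Lower numbers are preferred.
    The final unit may be shorter than min_len.
    """
    units, start = [], 0
    while len(text) - start > max_len:
        # Candidate cut points that keep this unit inside the target window.
        window = [p for p in breaks if start + min_len <= p <= start + max_len]
        if window:
            # Best priority first; among equals, prefer the longest unit.
            cut = min(window, key=lambda p: (breaks[p], -p))
        else:
            # Assumed fallback (not from the source): hard cut at max_len.
            cut = start + max_len
        units.append(text[start:cut])
        start = cut
    units.append(text[start:])
    return units
```

Preferring the highest-priority candidate inside the window, rather than simply the last one, keeps cuts at sentence ends whenever a sentence end is available.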
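[0032]
The display-unit captioning unit 12 divides and joins the text represented by the processing-unit caption data 110 from the production unit extraction unit 11, according to the division rules stored in the storage unit 50, so that it fits the on-screen display format, and generates display-unit caption data 120. That is, it divides and joins the text (inserting line breaks and page breaks) into display-unit caption texts such that the number of lines and characters of a displayed caption does not exceed the maximum number of lines and characters per page specified by the division rules, and generates display-unit caption data 120 representing them. When the processing-unit caption data 110 contains preliminary timing information, the display-unit captioning unit 12 generates and adds the timing information newly required by dividing the text (hereinafter "additional timing information") by a proportional distribution method using the preliminary timing information contained in the input processing-unit caption data.
[0033]
The display-unit captioning unit 12 also notifies the synchronization detection unit 30 of the display-unit caption texts generated in this way. It does not, however, notify the synchronization detection unit 30 of display-unit caption texts generated from processing-unit caption data 110 that contains preliminary timing information.
[0034]
In addition to the data representing the display-unit caption texts from the display-unit captioning unit 12, the synchronization detection unit 30 receives the playback audio and playback time code of the source program played back by the recording/playback unit 20. The recording/playback unit 20 starts playing back the source program a predetermined time after the production unit extraction unit 11 starts reading.
[0035]
The synchronization detection unit 30 performs speech recognition on the playback audio from the recording/playback unit 20 and compares the result with the display-unit caption texts from the display-unit captioning unit 12. It thereby obtains the time codes corresponding to the start and end of the playback audio for each display-unit caption text, and outputs the obtained time codes to the timing assignment unit 13.
[0036]
The timing assignment unit 13 inserts the time codes received from the synchronization detection unit 30 into the display-unit caption data 120 from the display-unit captioning unit 12 as timing information for displaying the corresponding display-unit caption texts, and outputs the result as broadcast caption data 130 to the monitor 60 and to an external storage unit, not shown.
[0037]
The video played back by the recording/playback unit 20 is input to the monitor 60 after a predetermined delay, and the broadcast caption data from the timing assignment unit 13 is displayed superimposed on it. Watching the monitor 60 reveals problems such as the display timing of captions rendered from the broadcast caption data output by the caption production control unit 10, and these observations are used for subsequent correction.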
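[0038]
The caption production system of FIG. 1 operates, in outline, as shown in FIG. 2.
[0039]
The input data is read one line at a time (step S201) and processing-unit caption data is created (step S202). The processing-unit caption text is then divided according to the division rules to generate display-unit caption text data (step S203). If the division makes new timing information necessary (Y at step S204), it is determined whether preliminary timing information has been added to the page containing the display-unit caption text (step S205). If it has, additional timing information is generated by the proportional distribution method using it and added (step S206). If it has not, the synchronization detection unit 30 performs synchronization detection to obtain time codes (step S207), and additional timing information representing the time codes is generated and added to the display-unit caption data (step S208). Finally, the broadcast caption data obtained in this way is output (step S209).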
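The branch in steps S204 to S208 can be sketched as a small dispatch routine. This is an illustrative sketch only: the page dictionaries, the key names (`needs_timing`, `preliminary`, `timing`), and the two callables standing in for the proportional distribution method and the synchronization detector are all assumptions, not structures described in the source.

```python
def add_timing(pages, apportion, sync_detect):
    """Attach timing to display-unit pages, following steps S204 to S208.

    Each page is a dict with 'text', 'needs_timing' and, optionally,
    'preliminary'. apportion and sync_detect are caller-supplied callables
    (hypothetical stand-ins for the two timing methods).
    """
    out = []
    for page in pages:
        page = dict(page)  # work on a copy, leave the input untouched
        if page.get("needs_timing"):                  # S204
            if page.get("preliminary") is not None:   # S205
                page["timing"] = apportion(page)      # S206
            else:
                page["timing"] = sync_detect(page)    # S207, S208
        else:
            # No new timing needed: carry over whatever the page had.
            page["timing"] = page.get("preliminary")
        out.append(page)
    return out                                        # S209
```

Passing the two timing methods in as callables keeps the dispatch logic independent of how either method is implemented.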
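[0040]
The operation of the caption production system according to this embodiment is described in detail below by way of examples.
[0041]
[Example 1]
Consider the XML data shown in FIG. 3 as input data. In this XML data, preliminary timing information (IN_TIME="" and OUT_TIME=""; the digits inside the quotation marks give, two at a time from the left, the hours, minutes, seconds, and frame number) is attached to every page.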
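An eight-digit value of the kind carried by IN_TIME and OUT_TIME can be converted to seconds as sketched below. The function name `parse_timecode` is hypothetical, and the default of 30 frames per second is an assumption; the source does not state the frame rate.

```python
def parse_timecode(tc, fps=30):
    """Convert an 'HHMMSSFF' string to seconds.

    fps=30 is an assumption; the source does not give the frame rate.
    """
    if len(tc) != 8 or not tc.isdigit():
        raise ValueError(f"expected 8 digits, got {tc!r}")
    # Read the digits two at a time: hours, minutes, seconds, frames.
    hours, minutes, seconds, frames = (int(tc[i:i + 2]) for i in range(0, 8, 2))
    return hours * 3600 + minutes * 60 + seconds + frames / fps
```

For example, `parse_timecode("00012715")` gives 87.5, that is, 1 minute, 27 seconds and 15 frames at the assumed 30 frames per second.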
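[0042]
When such input data is supplied to the caption production control unit 10, the production unit extraction unit 11, in cooperation with the morphological analysis unit 40, adds breakable-position information to the text of each page, generates processing-unit caption texts, and outputs them to the display-unit captioning unit 12 as processing-unit caption data 110.
[0043]
The display-unit captioning unit 12 divides the text of each page (line breaks and page breaks) with reference to the division rules in the storage unit 50. In this example every page has preliminary timing information, so the additional timing information newly required by dividing the text is generated by the proportional distribution method using the preliminary timing information.
[0044]
In this example, because every page contains preliminary timing information, the display-unit captioning unit 12 outputs no display-unit caption text information to the synchronization detection unit 30. The synchronization detection unit 30 therefore outputs no time codes, and the timing assignment unit 13 adds no new timing information. The output of the display-unit captioning unit 12 thus passes unchanged through the timing assignment unit 13 and is output as broadcast caption data. The resulting broadcast caption data is shown in FIG. 4.
[0045]
The generation of additional timing information in the display-unit captioning unit 12 is described next.
[0046]
Suppose the input data (XML data) is as shown in FIG. 5 and the display-unit caption texts shown in FIG. 6 are generated from it. In this case, the end timing of the second string and the start timing of the third string in FIG. 6 cannot be obtained directly from FIG. 5. That is, the preliminary timing information in the input data cannot be used as is.
[0047]
The display-unit captioning unit 12 therefore calculates the end timing of the string 「法務局内に忘れてきました。」 ("I left it inside the Legal Affairs Bureau."). This string consists of 7 kana and 5 kanji, and its statistical reading count is 16.3. From FIG. 5, the time per reading unit (reading speed) for this string is 0.12, so the time required for the string is given as 16.3 × 0.12 = 1.89. (The product of the figures as printed is about 1.96, so the stated 1.89 presumably reflects an unrounded reading speed of about 0.116.) The end timing of the second string in FIG. 6 is therefore 29.04, obtained by adding the required time of 1.89 to the string's start timing of 27.15. The start timing of the third string in FIG. 6 is simply set equal to the end timing of the second.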
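The idea behind the proportional distribution method, dividing a page's known display interval among its new segments in proportion to their reading counts, can be sketched as follows. This is a sketch under assumptions: times are plain seconds, the reading counts are supplied by the caller (the source does not say how the statistical reading count is derived from kana and kanji), and the name `apportion` is hypothetical.

```python
def apportion(start, end, reading_counts):
    """Split the interval [start, end] among segments in proportion to
    their reading counts. Returns a list of (start, end) pairs.

    reading_counts must be non-empty with a positive sum.
    """
    total = sum(reading_counts)
    rate = (end - start) / total  # time per reading unit (reading speed)
    spans, t = [], start
    for count in reading_counts:
        nxt = t + count * rate
        spans.append((t, nxt))
        t = nxt  # the next segment starts where this one ends
    # Pin the last end to the page end to avoid floating-point drift.
    spans[-1] = (spans[-1][0], end)
    return spans
```

For example, `apportion(0.0, 10.0, [1, 3])` returns `[(0.0, 2.5), (2.5, 10.0)]`: each segment starts where the previous one ends, as with the second and third strings of FIG. 6.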
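[0048]
As described above, the caption production system of this embodiment can automatically produce broadcast caption data suited to on-screen display from input data in which every page has preliminary timing information. Because the proportional distribution method generates the required additional timing information in far less time than audio synchronization processing, when every page has preliminary timing information as above, broadcast caption data can be obtained in about 1/30 of the time needed to broadcast (play back) the source program. The offset between captions displayed according to timing information obtained by the proportional distribution method and the corresponding audio is one second or less, which is considered acceptable.
[0049]
[Example 2]
The caption production system according to this embodiment can also take existing broadcast caption data as input. That is, it can be used to change the display format of existing broadcast caption data.
[0050]
For example, standard-definition and high-definition broadcasts differ in picture aspect ratio, so when caption data created for standard-definition broadcasting is reused for high-definition broadcasting, it may be desirable to change the number of characters per caption line for readability. The system can be used in such cases.
[0051]
As a concrete example, suppose there are three pages of standard-definition caption data displayed as 15 characters × 2 lines, as shown in FIG. 7, to be converted to high-definition caption data displayed as 22 characters × 2 lines. The system operates as in Example 1 and outputs the data shown in FIG. 8. In this example, all the additional timing information required by re-laying out the captions can be obtained by the proportional distribution method.
[0052]
In this way, the system can also be used to re-lay out broadcast caption data that has already been produced.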
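[0053]
[Example 3]
The caption production system according to this embodiment can also handle input data in which pages with and without preliminary timing information are mixed.
[0054]
For example, consider displaying the captions shown in FIG. 9. In the text of FIG. 9, the italic strings in lines 2 to 6 are non-speech information and the rest is speech information. In this case, the transcriber manually adds timing information to the non-speech information. The resulting XML data is shown in FIG. 10.
[0055]
Given the input data of FIG. 10, the system obtains the necessary additional timing information for pages that have preliminary timing information by the proportional distribution method, as in Examples 1 and 2. For pages without preliminary timing information, the synchronization detection unit 30 performs automatic audio synchronization detection, obtains the time codes that match the start and end of the playback audio corresponding to each display-unit caption, and generates additional timing information from the obtained time codes. Broadcast caption data such as that shown in FIG. 11 is thereby obtained.
[0056]
Thus, for pages of the input data that contain preliminary timing information, additional timing information is generated without automatic audio synchronization processing, and for pages that do not, additional timing information is generated by automatic audio synchronization processing. Accordingly, by leaving speech information without preliminary timing information and attaching preliminary timing information to non-speech information, timing information can be added automatically to input data that mixes speech and non-speech information.
[0057]
Moreover, because generating additional timing information by the proportional distribution method takes far less time than automatic audio synchronization processing, timing information can be added automatically to mixed input data in roughly the same processing time as is needed for input data that contains no non-speech information.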
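[0058]
[Example 4]
When timing information is added automatically to input data that mixes pages with and without preliminary timing information, the additional timing information added for the text of a page without preliminary timing information may be inconsistent with the preliminary timing information. This happens, for example, when the synchronization detection unit 30 recognizes noise in the playback audio as speech.
[0059]
Consider the text data shown in FIG. 12. The italic string on the second line is non-speech information, to which the transcriber manually gives preliminary timing information. Suppose that when this input data is supplied, adding additional timing information based on the time codes from the synchronization detection unit 30 yields the result shown in FIG. 13. In this case the second page would have to be displayed while the first page is still being displayed, so one of the pieces of timing information is clearly inappropriate.
[0060]
To avoid this, the timing assignment unit 13 determines whether the automatically assigned additional timing information is consistent with the preliminary timing information. If the display periods of consecutive pages overlap, it determines whether the automatically assigned additional timing information can be changed so that the two display periods no longer overlap. That is, it judges whether an acceptable display time would remain for the page if the automatically assigned timing information were changed to remove the overlap. In FIG. 13, the display time of the first page is 21 seconds, and 11 seconds remain even after subtracting the 10-second display time of the second page. It can therefore be judged that shortening the display time of the first page causes no problem. As a result, the timing assignment unit 13 judges the display end time of the first page to be wrong and sets it equal to the display start time of the second page. The timing assignment unit 13 thus corrects the timing information automatically and outputs the broadcast caption data shown in FIG. 14.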
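The consistency check and automatic correction can be sketched as follows. This is a simplified sketch, not the procedure of the source: the page dictionaries and the `auto` flag are assumptions, only the case where an automatically timed page runs into the following page is handled, and the acceptance test compares the trimmed duration against a hypothetical `min_display` threshold, whereas the source states no threshold.

```python
def fix_overlap(pages, min_display=2.0):
    """Trim automatically timed pages that run into the following page.

    pages: list of dicts with 'start' and 'end' in seconds and 'auto'
    (True when the timing came from synchronization detection).
    Returns a new list; pages that cannot be trimmed safely are flagged
    with 'needs_manual_fix' for later manual correction.
    """
    fixed = [dict(p) for p in pages]
    for cur, nxt in zip(fixed, fixed[1:]):
        if cur["end"] <= nxt["start"]:
            continue  # display periods do not overlap
        trimmed = nxt["start"] - cur["start"]  # duration after trimming
        if cur["auto"] and trimmed >= min_display:
            cur["end"] = nxt["start"]  # treat the automatic end time as wrong
        else:
            cur["needs_manual_fix"] = True
    return fixed
```

Pages that fail the test are left unchanged and only flagged, so that they can be corrected by hand afterwards.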
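[0061]
If the timing assignment unit 13 judges that adjusting the timing to remove the overlap is not appropriate, it does not correct the timing information automatically. In that case the correction is made later by hand.
[0062]
The present invention has been described with reference to one embodiment, but it is not limited to that embodiment.
[0063]
For example, in the above embodiment, whether to send display-unit caption texts to the synchronization detection unit 30 is decided page by page according to whether preliminary timing information is present. When speech and non-speech information are to be displayed on the same page, as with a speaker's name and speech in an interview or debate, the system may instead send, or not send, only the strings enclosed by predetermined symbols. Specifically, only strings enclosed in 「」 (corner brackets) may be sent to the synchronization detection unit 30, or strings enclosed in ＜＞ may be withheld from it.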
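The two symbol-based conventions can be sketched with regular expressions. The function names `only_quoted` and `drop_tagged` are hypothetical, and the sketch assumes that brackets are not nested.

```python
import re


def only_quoted(text):
    """Return just the strings enclosed in corner brackets, in order.

    Under the first convention these are the strings that would be sent
    to the synchronization detector.
    """
    return re.findall(r"「([^」]*)」", text)


def drop_tagged(text):
    """Remove strings enclosed in full-width angle brackets.

    Under the second convention what remains would be sent to the
    synchronization detector.
    """
    return re.sub(r"＜[^＞]*＞", "", text)
```

For example, `only_quoted("A「hello」B「hi」")` returns `["hello", "hi"]`, and `drop_tagged("＜Host＞Hello")` returns `"Hello"`.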
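[0064]
[Effects of the invention]
According to the present invention, the new timing information required by line and page breaks in the text is obtained by the proportional distribution method using the preliminary timing information for pages that have it, and by audio synchronization processing for pages that do not, so caption data can be produced automatically from text data that includes non-speech information. As a result, the burden of transcription in caption production is reduced.
[Brief description of the drawings]
[FIG. 1] A schematic diagram showing the configuration of a caption production system according to an embodiment of the present invention.
[FIG. 2] A flowchart explaining the operation of the caption production control unit in the caption production system of FIG. 1.
[FIG. 3] A diagram showing an example of input data supplied to the caption production system of FIG. 1.
[FIG. 4] A diagram showing broadcast caption data produced from the input data of FIG. 3.
[FIG. 5] A diagram showing the text contained in other input data supplied to the caption production system of FIG. 1, together with its related information.
[FIG. 6] A diagram showing display-unit caption texts obtained from the text in the input data of FIG. 5, together with their start and end times.
[FIG. 7] A diagram showing a further example of input data supplied to the caption production system of FIG. 1.
[FIG. 8] A diagram showing broadcast caption data produced from the input data of FIG. 7.
[FIG. 9] An example of text containing speech information and non-speech information.
[FIG. 10] A diagram showing XML data generated from the text of FIG. 9.
[FIG. 11] A diagram showing broadcast caption data obtained by supplying the XML data of FIG. 10 to the caption production system of FIG. 1.
[FIG. 12] Another example of text containing speech information and non-speech information.
[FIG. 13] A diagram showing XML data in which the manually entered preliminary timing information and the automatically assigned additional timing information are inconsistent.
[FIG. 14] A diagram showing XML data in which the manually entered preliminary timing information and the automatically assigned additional timing information are consistent.
[Explanation of reference signs]
10 Caption production control unit
11 Production unit extraction unit
12 Display-unit captioning unit
13 Timing assignment unit
20 Recording/playback unit
30 Synchronization detection unit
40 Morphological analysis unit
50 Storage unit
60 Monitor
100 Digitized manuscript (input data)
110 Processing-unit caption data
120 Display-unit caption data
130 Broadcast caption data
[0001]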
[Technical field of the invention]
The present invention relates to a caption production system, and in particular to a caption production system capable of efficiently producing caption data that includes non-speech information.
[0002]
[Prior art]
To promote caption broadcasting, the then Ministry of Posts and Telecommunications issued a guideline calling for captions to be added, by 2007, to all programs that can be captioned.
[0003]
However, while NHK General TV captions a relatively high 73.4% of its programs, the average for the five Tokyo-based commercial broadcasters is only 16.1%. This rate is far lower than in Europe and North America (for example, 70% in the United States), and caption users and many others are strongly calling for caption broadcasting to be expanded.
[0004]
The main reason the rate does not improve is that much of caption program production is done by hand. Producing a captioned program therefore takes a great deal of labor, time, and cost. In particular, creating caption data, which includes transcribing speech, depends heavily on the knowledge and skill of the operator, so it is extremely difficult to automate and also difficult to make more efficient.
[0005]
More specifically, transcription is done by playing back, on a VTR, a videotape on which the source program is recorded, and selectively listening for the sounds that should be displayed as captions (usually human voices, hereinafter "speech"). At the same time, the start and end of each speech segment are identified from the time code attached to the played-back video, and timing information representing them is added to the transcribed text data. This requires complex VTR operations: cueing the start of a speech segment, adjusting the playback speed to make transcription easier, replaying repeatedly, and confirming the end point of the segment. A transcriber therefore needs not only good listening ability but also a rich vocabulary and strong comprehension, plus the dexterity to carry out complex VTR operations at the same time.
[0006]
Consequently, improving the efficiency of caption program production requires reducing the burden on the transcriber.
[0007]
One conventional caption production system that reduces the burden of transcription automatically adds timing information to text data transcribed from speech (see, for example, Patent Document 1).
[0008]
Given digitized caption text data, this system automatically inserts line breaks and page breaks so that the text (character string) is divided into strings suited to on-screen display, that is, unit caption texts, which are the display units on a television screen. Meanwhile, it plays back the videotape of the television program from which the caption text was made and uses the timing information attached to the tape to obtain timing information that determines when each unit caption text should be displayed. The obtained timing information is then added to the unit caption text data to generate caption data for caption broadcasting.
[0009]
With this system, a transcriber only has to transcribe the speech into text data to obtain caption data with appropriate timing information.
[0010]
There is also a system which, although not a caption production system, divides the text contained in caption data received from a caption broadcast as specified by the user and adds corresponding timing information to each divided caption text, so that the display format on the television screen can be changed (see, for example, Patent Document 2).
[0011]
If this system is used for transcription, timing information can be added at any convenient granularity (for example, per speaker or per sentence) without regard to how the text will look on screen, and the data can later be converted into caption data that displays properly.
[0012]
[Patent Document 1]
JP 2002-342311 A
[0013]
[Patent Document 2]
JP 2000-350117 A
[0014]
[Problems to be solved by the invention]
Ordinary television programs use on-screen captions to provide various kinds of non-speech information, such as introductions of the people appearing and descriptions of scenes. Such non-speech information should also be provided in teletext broadcasting.
[0015]
In caption broadcasting, in scenes where several people speak in turn or simultaneously (for example, interviews or debates), it is necessary to display not only what is said but also information identifying the speaker, such as the speaker's name.
[0016]
However, a conventional caption production system obtains timing information by time-synchronizing the unit caption text data with the audio played back from the source program, so the caption text must match the content of the played-back audio. When non-speech information such as the above is present, timing information can no longer be added automatically.
[0017]
When the conventional system for changing the display format is used for transcription, non-speech information causes no problem and the input effort is slightly reduced. However, this approach requires timing information to be added, for example, sentence by sentence, so like conventional transcription it demands a high level of skill from the operator.
[0018]
An object of the present invention is therefore to provide a caption production system of simple configuration that reduces the burden of transcription and can produce caption data from input data that includes non-speech information.
[0019]
[Means for Solving the Problems]
According to the present invention, there is provided a caption production system that, on the basis of input data including text corresponding to the audio of a source program and of the playback audio and playback time code of the source program, automatically produces broadcast caption data including display-unit caption text data, obtained by inserting line breaks and page breaks into the text for caption display, and timing information that determines when the display-unit caption text data is displayed page by page, characterized in that data including both pages to which preliminary timing information has been added and pages to which no preliminary timing information has been added can be used as the input data.
[0020]
Also according to the present invention, there is provided a caption production system that generates, from input data including text corresponding to the audio of a source program, display-unit caption text data obtained by inserting line breaks and page breaks so that the text forms display-unit caption texts; generates, on the basis of the display-unit caption text data and the playback audio and playback time code of the source program, timing information that determines when each display-unit caption text is displayed on screen; and automatically produces broadcast caption data including the display-unit caption text data and the timing information, characterized in that: for pages of the input data to which preliminary timing information has been added, when new timing information becomes necessary because the text on the page has been broken into lines and pages, the new timing information is generated by a proportional distribution method using the preliminary timing information; and for pages of the input data to which no preliminary timing information has been added, when new timing information becomes necessary because the text on the page has been broken into lines and pages, the new timing information is generated by automatic audio synchronization processing that time-synchronizes the display-unit caption text data with the playback audio.
[0021]
Further, according to the present invention, there is provided a caption production method that generates, from input data including text corresponding to the audio of a source program, display-unit caption text data obtained by inserting line breaks and page breaks so that the text forms display-unit caption texts; generates, on the basis of the display-unit caption text data and the playback audio and playback time code of the source program, timing information that determines when each display-unit caption text is displayed on screen; and automatically produces broadcast caption data including the display-unit caption text data and the timing information, characterized in that: for pages of the input data to which preliminary timing information has been added, after the text on the page has been broken into lines and pages to form the display-unit caption texts, new timing information is generated as needed by a proportional distribution method using the preliminary timing information; and for pages of the input data to which no preliminary timing information has been added, after the text on the page has been broken into lines and pages to form the display-unit caption texts, new timing information is generated as needed by automatic audio synchronization processing that time-synchronizes the display-unit caption text data with the playback audio.
[0022]
Still further, according to the present invention, there is provided a caption production program that causes a computer to generate, from input data including text corresponding to the audio of a source program, display-unit caption text data obtained by inserting line breaks and page breaks so that the text forms display-unit caption texts; to generate, on the basis of the display-unit caption text data and the playback audio and playback time code of the source program, timing information that indicates when each display-unit caption text is displayed on screen; and to automatically produce broadcast caption data including the display-unit caption text data and the timing information, characterized in that the program causes the computer: for pages of the input data to which preliminary timing information has been added, after the text on the page has been broken into lines and pages to form the display-unit caption texts, to generate new timing information as needed by a proportional distribution method using the preliminary timing information; and for pages of the input data to which no preliminary timing information has been added, after the text on the page has been broken into lines and pages to form the display-unit caption texts, to generate new timing information as needed by automatic audio synchronization processing that time-synchronizes the display-unit caption text data with the playback audio.
[0023]
[Embodiments of the invention]
An embodiment of the present invention is described in detail below with reference to the drawings.
[0024]
FIG. 1 shows the configuration of a caption production system according to an embodiment of the present invention. The system comprises: a caption production control unit 10 that receives input data (a digitized manuscript) 100 created from the source program and produces broadcast caption data; a recording/playback unit 20 that plays back the source program recorded on a recording medium such as an optical disc; a synchronization detection unit 30 that, under the control of the caption production control unit 10, synchronizes display-unit caption texts extracted from the input data with the playback audio from the recording/playback unit 20; a morphological analysis unit 40 that, under the control of the caption production control unit 10, performs morphological analysis on text (character strings) extracted from the input data; a storage unit 50 that stores division rules for dividing (line-breaking and page-breaking) the text extracted from the input data; and a monitor 60 that displays captions based on the broadcast caption data produced by the caption production control unit 10 together with the source program played back by the recording/playback unit 20.
[0025]
The caption production control unit 10 is implemented, for example, on a personal computer and includes a production unit extraction unit 11 that divides the input data 100 into processing-unit caption data 110, a display-unit captioning unit 12 that divides and joins the text in the processing-unit caption data 110 to suit on-screen display, and a timing assignment unit 13 that adds timing information to the display-unit caption data 120 to produce broadcast caption data 130.
[0026]
In this embodiment, the synchronization detection unit 30 and the morphological analysis unit 40 are each assumed to be implemented on a dedicated computer, but they can also be implemented, together with the caption production control unit 10, on a single computer.
[0027]
The input data 100 is created with a manuscript digitizing device, not shown (for example, a personal computer), by converting the character strings that represent the audio in the source program into electronic data (text). The transcription is carried out using a disc recording/playback device capable of non-linear operation.
[0028]
The input data 100 is assumed to be text data or XML (eXtensible Markup Language) data. Text data contains data representing the character strings (mixed kanji and kana) transcribed from the audio of the source program (speech information) and line-break information. XML data adds to this text data page information (page start and end) and timing information that specifies when each page is displayed. XML data may also contain data representing character strings unrelated to the audio of the source program, such as a speaker's profile (non-speech information), together with timing information for displaying it. The input data 100 is supplied to the caption production control unit 10 recorded on a recording medium such as a magnetic disk or optical disc.
[0029]
The production unit extraction unit 11 of the caption production control unit 10 reads the input data from the recording medium one line of text at a time and sends the data representing each line to the morphological analysis unit 40.
[0030]
The morphological analysis unit 40 splits the text sent from the production unit extraction unit 11 into morphemes, analyzes them, and identifies the positions at which the text may be broken. It adds this breakable-position information to the text and returns it to the production unit extraction unit 11. Breakable positions are, for example, after a full stop, after a comma, between phrases, and between morphemes (parts of speech), in descending order of priority.
[0031]
The production unit extraction unit 11 refers to the breakable-position information attached to the returned text, cuts the text into pieces of suitable length (about 70 to 90 characters), and sends them, together with the breakable-position information, to the display-unit captioning unit 12 as processing-unit caption data 110. However, pages of the input data 100 to which timing information (hereinafter "preliminary timing information") has already been added are sent to the display-unit captioning unit 12 page by page as processing-unit caption data 110.
[0032]
The display-unit captioning unit 12 divides and joins the text represented by the processing-unit caption data 110 from the production unit extraction unit 11, according to the division rules stored in the storage unit 50, so that it fits the on-screen display format, and generates display-unit caption data 120. That is, it divides and joins the text (inserting line breaks and page breaks) into display-unit caption texts such that the number of lines and characters of a displayed caption does not exceed the maximum number of lines and characters per page specified by the division rules, and generates display-unit caption data 120 representing them. When the processing-unit caption data 110 contains preliminary timing information, the display-unit captioning unit 12 generates and adds the timing information newly required by dividing the text (hereinafter "additional timing information") by a proportional distribution method using the preliminary timing information contained in the input processing-unit caption data.
[0033]
Further, the display unit subtitle conversion unit 12 notifies the synchronization detection unit 30 of the display unit subtitle sentences generated as described above. However, it does not notify the synchronization detection unit 30 of display unit subtitle sentences generated from processing unit subtitle data 110 that includes preliminary timing information.
[0034]
In addition to the data representing the display unit subtitle sentences from the display unit subtitle conversion unit 12, the synchronization detection unit 30 receives the playback audio and playback time code of the material program played back by the recording and playback unit 20. Here, the recording and playback unit 20 is assumed to start playing back the material program a predetermined time after the production unit extraction unit 11 starts reading.
[0035]
The synchronization detection unit 30 performs speech recognition on the playback audio from the recording and playback unit 20 and compares the result with the display unit subtitle sentences from the display unit subtitle conversion unit 12. It thereby obtains the time codes corresponding to the start time and the end time of the playback audio for each display unit subtitle sentence, and outputs the obtained time codes to the timing providing unit 13.
[0036]
The timing providing unit 13 inserts the time codes input from the synchronization detection unit 30 into the display unit subtitle data 120 from the display unit subtitle conversion unit 12 as timing information for displaying the corresponding display unit subtitle sentences, and outputs the result as broadcast subtitle data 130 to the monitoring monitor 60 and to an external storage unit (not shown).
[0037]
The video played back by the recording and playback unit 20 is input to the monitoring monitor 60 after a predetermined delay and is displayed with the subtitles of the broadcast subtitle data 130 from the timing providing unit 13 superimposed on it. By watching the monitoring monitor 60, problems such as the display timing of the subtitles displayed from the broadcast subtitle data output by the subtitle production control unit 10 can be found and used for subsequent correction and the like.
[0038]
The subtitle production system of FIG. 1 operates roughly as shown in FIG. 2.
[0039]
That is, input data is fetched line by line (step S201), and processing unit subtitle data is created (step S202). Next, the processing unit subtitle text is divided according to the division rule to generate display unit subtitle text data (step S203). Then, when new timing information is required because of the division (Y in step S204), it is determined whether preliminary timing information has been added to the page that includes the display unit subtitle text (step S205). If preliminary timing information has been added, additional timing information is generated and added by the proportional distribution method using the preliminary timing information (step S206). If preliminary timing information has not been added, the synchronization detection unit 30 performs synchronization detection to obtain time codes (step S207), and additional timing information indicating those time codes is generated and added to the display unit subtitle data (step S208). Finally, the broadcast subtitle data obtained in this way is output to the outside (step S209).
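The branching in steps S201 to S209 can be sketched as below. This is an illustrative outline only: pages are modeled as dictionaries, and `divide`, `proportional`, and `sync_detect` are hypothetical callables standing in for the division rule, the proportional distribution method, and the synchronization detection unit 30. Reading input page by page rather than line by line is also a simplification.

```python
def produce_broadcast_subtitles(pages, divide, proportional, sync_detect):
    """pages: list of dicts with 'text' and an optional 'timing' = (in, out).

    Returns a list of (display_unit_text, start, end) tuples.
    """
    output = []
    for page in pages:                         # S201/S202: take the next unit
        pieces = divide(page["text"])          # S203: apply the division rule
        timing = page.get("timing")
        if timing is not None and len(pieces) == 1:
            # N in S204: the page was not split, reuse its timing as is.
            output.append((pieces[0], timing[0], timing[1]))
        elif timing is not None:
            # S205 -> S206: preliminary timing exists, distribute it.
            output.extend(proportional(timing[0], timing[1], pieces))
        else:
            # S205 -> S207/S208: no preliminary timing, use sync detection.
            for piece in pieces:
                start, end = sync_detect(piece)
                output.append((piece, start, end))
    return output                              # S209: hand over the result
```

Passing the three operations in as arguments keeps the control flow separate from how each step is actually carried out.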
[0040]
Hereinafter, the operation of the caption production system according to the present embodiment will be described in detail based on examples.
[0041]
[Example 1]
Consider the XML data shown in FIG. 3 as input data. In this XML data, preliminary timing information (IN_TIME="" and OUT_TIME=""; each value gives, from the left, the hour, minute, second, and frame number in two digits each) is given for all pages.
[0042]
When such input data is supplied to the subtitle production control unit 10, the production unit extraction unit 11, in cooperation with the morphological analysis unit 40, adds delimitable location information to the text of each page to generate processing unit subtitle sentences, and outputs them to the display unit subtitle conversion unit 12 as processing unit subtitle data 110.
[0043]
The display unit subtitle conversion unit 12 divides the text of each page (by line feeds and page breaks) with reference to the division rule in the storage unit 50. In this example, because preliminary timing information is added to all the pages, the additional timing information newly required by dividing the text is generated by the proportional distribution method using the preliminary timing information.
[0044]
In this example, because all the pages include preliminary timing information, the display unit subtitle conversion unit 12 outputs no information about the display unit subtitle sentences to the synchronization detection unit 30. Consequently, the synchronization detection unit 30 outputs no time code, and the timing providing unit 13 adds no new timing information. The output of the display unit subtitle conversion unit 12 therefore passes through the timing providing unit 13 unchanged and is output as broadcast subtitle data. The broadcast subtitle data obtained in this case is as shown in FIG. 4.
[0045]
Here, generation of the additional timing information in the display unit subtitle conversion unit 12 will be described.
[0046]
Assume that the input data (XML data) is as shown in FIG. 5 and that display unit subtitle sentences as shown in FIG. 6 are generated from it. In this case, the end timing of the second character string and the start timing of the third character string in FIG. 6 cannot be obtained directly from FIG. 5. That is, the preliminary timing information included in the input data cannot be used as it is.
[0047]
Therefore, the display unit subtitle conversion unit 12 obtains the end timing of “forgot in the Legal Affairs Bureau” by calculation. This character string consists of 7 kana characters and 5 kanji characters, and its statistically estimated number of readings is 16.3. From FIG. 5, the time per reading (reading speed) that applies to this character string is 0.12, so the required time for this character string is 16.3 × 0.12 = 1.89. The end timing of the second character string in FIG. 6 is therefore 29.04, obtained by adding the required time 1.89 to 27.15, the start timing of that character string. The start timing of the third character string in FIG. 6 may simply coincide with the end timing of the second character string.
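The calculation above can be written out as a short sketch. It treats the timings as decimal seconds, which is an assumption, and the function names are hypothetical. Note that 16.3 × 0.12 evaluates to about 1.96 with the rounded figures quoted in the text; the stated 1.89 presumably reflects an unrounded reading speed from FIG. 5, so the sketch takes the required time as an input when reproducing the end timing.

```python
def required_time(readings, time_per_reading):
    """Estimated time needed to speak a string: readings x time per reading."""
    return readings * time_per_reading


def end_timing(start, required):
    """End timing of a character string: its start timing plus required time."""
    return start + required


# With the rounded values quoted in the text the product is about 1.96.
duration_from_rounded_values = required_time(16.3, 0.12)

# Using the required time stated in the text (1.89) reproduces the
# end timing of the second character string, about 29.04.
second_string_end = end_timing(27.15, 1.89)

# The third character string may start where the second one ends.
third_string_start = second_string_end
```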
[0048]
As described above, the subtitle production system according to the present embodiment can automatically generate broadcast subtitle data suitable for screen display from input data in which preliminary timing information is added to all pages. In addition, the proportional distribution method generates the required additional timing information in far less time than audio synchronization processing. Therefore, when preliminary timing information is added to all pages as described above, the broadcast subtitle data can be obtained in about 1/30 of the time required to broadcast (play back) the material program. Furthermore, the discrepancy between the display of a subtitle based on timing information obtained by the proportional distribution method and the corresponding audio is less than 1 second, which is considered to be within the allowable range.
[0049]
[Example 2]
The subtitle production system according to the present embodiment can use existing broadcast subtitle data as input data. That is, it can also be used to change the display format of existing broadcast subtitle data.
[0050]
For example, because the aspect ratio of the screen (image) differs between standard broadcasting and high-definition broadcasting, when subtitle data created for standard broadcasting is reused for high-definition broadcasting, it may be desirable to change the number of characters per subtitle line. In such a case, the subtitle production system according to the present embodiment can be used.
[0051]
As a specific example, suppose that, as shown in FIG. 7, there are three pages of standard broadcast subtitle data that display subtitles in 15 characters × 2 lines, and that they are to be changed to high-definition broadcast subtitle data that display subtitles in 22 characters × 2 lines. The subtitle production system according to the present embodiment then operates in the same manner as in Example 1 and outputs data as shown in FIG. 8. In this example, all the additional timing information required by reshaping the subtitles can be obtained by the proportional distribution method.
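The text side of such a reshaping can be sketched as follows. This simplified illustration re-wraps at a fixed width and ignores both the delimitable location information and the timing information, which the actual system takes into account; the function name `reflow` and the list-of-lines page model are assumptions.

```python
def reflow(pages, chars_per_line, lines_per_page):
    """Re-wrap subtitle pages to a new layout.

    pages: list of pages, each page being a list of line strings.
    Returns pages whose lines hold at most chars_per_line characters
    and which hold at most lines_per_page lines each.
    """
    text = "".join("".join(lines) for lines in pages)
    lines = [text[i:i + chars_per_line]
             for i in range(0, len(text), chars_per_line)]
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]
```

For three full pages of 15 characters × 2 lines, `reflow(pages, 22, 2)` yields pages of at most 22 characters × 2 lines, and the page boundaries move, which is why new timing information becomes necessary.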
[0052]
As described above, the subtitle production system according to the present embodiment can also be used for reshaping broadcast subtitle data that has already been produced.
[0053]
[Example 3]
The caption production system according to the present embodiment can cope with input data in which pages with preliminary timing information and pages without preliminary timing information are mixed.
[0054]
For example, consider a case where subtitles are displayed as shown in FIG. 9. In the text shown in FIG. 9, the italic character strings in the second to sixth lines are non-speech information, and the rest is speech information. In this case, the transcriber adds the timing information for the non-speech information manually. The resulting XML data is as shown in FIG. 10.
[0055]
When the input data shown in FIG. 10 is given, the subtitle production system according to the present embodiment obtains the additional timing information needed for pages that have preliminary timing information by the proportional distribution method, in the same manner as in Examples 1 and 2. For pages that do not have preliminary timing information, the synchronization detection unit 30 performs automatic audio synchronization detection to acquire the time codes that match the start timing and the end timing of the playback audio corresponding to each display unit subtitle sentence, and additional timing information is generated from the acquired time codes. Broadcast subtitle data as shown in FIG. 11 is thus obtained.
[0056]
As described above, in the subtitle production system according to the present embodiment, additional timing information is generated without automatic audio synchronization processing for pages of the input data that include preliminary timing information. For pages of the input data that do not include preliminary timing information, additional timing information is generated by automatic audio synchronization processing. Therefore, by adding preliminary timing information to non-speech information and not to speech information, timing information can be provided automatically for input data in which speech information and non-speech information are mixed.
[0057]
Moreover, because generating additional timing information by the proportional distribution method requires far less processing time than automatic audio synchronization processing, timing information can be added automatically to input data in which speech information and non-speech information are mixed in almost the same processing time as is required for input data that contains no non-speech information.
[0058]
[Example 4]
When timing information is automatically added to input data in which pages with preliminary timing information and pages without it are mixed, the additional timing information added to the text of a page without preliminary timing information may fail to match the preliminary timing information. This occurs, for example, when the synchronization detection unit 30 recognizes noise included in the playback audio as speech.
[0059]
Here, consider the text data shown in FIG. 12. The italic character string in the second line is non-speech information, and the transcriber manually gives preliminary timing information to this non-speech information. Assume that, when such input data is input, the result shown in FIG. 13 is obtained after adding the additional timing information based on the time codes obtained from the synchronization detection unit 30. In this case, the second page would have to be displayed while the first page is still being displayed, so some of the timing information is clearly inappropriate.
[0060]
To avoid such a situation, the timing providing unit 13 determines whether the automatically added additional timing information is consistent with the preliminary timing information. If the display times of consecutive pages overlap, it determines whether the automatically added additional timing information can be changed so that the display times of the two pages no longer overlap. That is, it determines whether an allowable display time would remain for the page if the automatically added timing information were changed to remove the overlap. In the case of FIG. 13, the display time of the first page is 21 seconds, and 11 seconds remain after subtracting the 10 seconds for which the second page is displayed. It can therefore be determined that shortening the display time of the first page causes no problem. As a result, the timing providing unit 13 judges the display end time of the first page to be incorrect and sets it to the display start time of the second page. The timing providing unit 13 thus corrects the timing information automatically and outputs the broadcast subtitle data shown in FIG. 14.
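The correction rule can be sketched as below. The threshold `min_display` is an assumed value, since the description speaks only of an "allowable display time" without giving a number. The timings in the usage note are illustrative figures chosen to match the 21-second and 10-second durations mentioned above; they are not read from FIG. 13.

```python
def resolve_overlap(first, second, min_display=2.0):
    """Trim an automatically timed page that overlaps the next page.

    first:  (in_time, out_time) obtained by automatic synchronization.
    second: (in_time, out_time) entered manually and therefore trusted.
    Returns (possibly corrected first timing, True if no manual fix is needed).
    """
    first_in, first_out = first
    second_in, _ = second
    if first_out <= second_in:
        return first, True                    # pages do not overlap
    if second_in - first_in >= min_display:
        # Enough display time remains: end the first page where
        # the second page starts.
        return (first_in, second_in), True
    return first, False                       # leave for manual correction
```

For example, `resolve_overlap((0.0, 21.0), (11.0, 21.0))` returns `((0.0, 11.0), True)`: the first page is cut back to 11 seconds so that it ends when the second page begins.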
[0061]
In addition, when the timing providing unit 13 determines that adjusting the timing to eliminate the overlap is not appropriate, it does not correct the timing information automatically. In this case, the correction is made manually later.
[0062]
As described above, the present invention has been described in accordance with one embodiment, but the present invention is not limited to the above embodiment.
[0063]
For example, in the above embodiment, whether a display unit subtitle sentence is transmitted to the synchronization detection unit 30 is decided page by page, based on the presence or absence of preliminary timing information. When speech information and non-speech information are to be displayed on the same page, as with spoken lines, the decision may instead be made per character string: for example, only character strings enclosed between predetermined symbols may be transmitted to the synchronization detection unit 30, or only such strings may be withheld. Specifically, only the character strings enclosed in "<>" (angle brackets) may be transmitted to the synchronization detection unit 30, or the character strings enclosed in "<>" may be excluded from transmission to the synchronization detection unit 30.
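Separating the bracketed strings from the rest of a page can be sketched with a regular expression. The function name `split_by_brackets` is hypothetical, and nested brackets are not handled in this sketch.

```python
import re

# Matches one "<...>" group that contains no further angle brackets.
BRACKETED = re.compile(r"<([^<>]*)>")


def split_by_brackets(text):
    """Return (strings inside angle brackets, text with those groups removed)."""
    inside = BRACKETED.findall(text)
    outside = BRACKETED.sub("", text)
    return inside, outside
```

Depending on which convention is chosen, either the first or the second element of the result would be passed to the synchronization detection unit 30.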
[0064]
[Effect of the Invention]
According to the present invention, the new timing information required by line feeds and page breaks of the text is obtained by the proportional distribution method using the preliminary timing information for pages of the input data to which preliminary timing information has been added, and by the audio synchronization processing method for pages to which it has not been added. Subtitle data can therefore be produced automatically from text data including non-speech information. As a result, the burden of subtitle production work can be reduced.
[Brief description of the drawings]
FIG. 1 is a schematic diagram showing a configuration of a subtitle production system according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining an operation of the subtitle production control unit in the subtitle production system of FIG. 1.
FIG. 3 is a diagram illustrating an example of input data input to the subtitle production system of FIG. 1.
FIG. 4 is a diagram showing broadcast subtitle data produced from the input data of FIG. 3.
FIG. 5 is a diagram showing text included in other input data input to the subtitle production system of FIG. 1 and information related to it.
FIG. 6 is a diagram showing display unit subtitle sentences obtained from the text included in the other input data of FIG. 5 and their start and end times.
FIG. 7 is a diagram showing still another example of input data input to the subtitle production system of FIG. 1.
FIG. 8 is a diagram showing broadcast subtitle data produced from the input data of FIG. 7.
FIG. 9 is an example of text including speech information and non-speech information.
FIG. 10 is a diagram showing XML data generated from the text of FIG. 9.
FIG. 11 is a diagram showing broadcast subtitle data obtained by inputting the XML data of FIG. 10 to the subtitle production system of FIG. 1.
FIG. 12 is another example of text including speech information and non-speech information.
FIG. 13 is a diagram showing XML data in a state where the manually input preliminary timing information and the automatically added additional timing information do not match.
FIG. 14 is a diagram showing XML data in a state where the manually input preliminary timing information and the automatically added additional timing information match.
[Explanation of symbols]
10 Subtitle production control unit
11 Production unit extraction unit
12 Display unit subtitle conversion unit
13 Timing providing unit
20 Recording and playback unit
30 Synchronization detection unit
40 Morphological analysis unit
50 Storage unit
60 Monitoring monitor
100 Digitized manuscript (input data)
110 Processing unit subtitle data
120 Display unit subtitle data
130 Broadcast subtitle data

Claims

A subtitle production system that, based on input data including text corresponding to the audio of a material program and on the playback audio and playback time code of the material program, automatically produces broadcast subtitle data including display unit subtitle sentence data, obtained by applying line feeds and page breaks to the text for subtitle display, and timing information that determines the timing at which each page is displayed,
wherein data including both pages to which preliminary timing information has been added and pages to which preliminary timing information has not been added can be used as the input data.

A subtitle production system that generates, from input data including text corresponding to the audio of a material program, display unit subtitle sentence data obtained by applying line feeds and page breaks to the text so as to form display unit subtitle sentences, generates, based on the display unit subtitle sentence data and on the playback audio and playback time code of the material program, timing information that determines the timing at which each display unit subtitle sentence is displayed on a screen, and automatically produces broadcast subtitle data including the display unit subtitle sentence data and the timing information, wherein:
for a page of the input data to which preliminary timing information has been added, when new timing information becomes necessary because of line feeds and page breaks of the text included in the page, the new timing information is generated by a proportional distribution method using the preliminary timing information; and
for a page of the input data to which preliminary timing information has not been added, when new timing information becomes necessary because of line feeds and page breaks of the text included in the page, the new timing information is generated by automatic audio synchronization processing that time-synchronizes the display unit subtitle sentence data with the playback audio.

A subtitle production method that generates, from input data including text corresponding to the audio of a material program, display unit subtitle sentence data obtained by applying line feeds and page breaks to the text so as to form display unit subtitle sentences, generates, based on the display unit subtitle sentence data and on the playback audio and playback time code of the material program, timing information that determines the timing at which each display unit subtitle sentence is displayed on a screen, and automatically produces broadcast subtitle data including the display unit subtitle sentence data and the timing information, wherein:
for a page of the input data to which preliminary timing information has been added, after line feeds and page breaks are applied to the text included in the page so as to form the display unit subtitle sentences, new timing information is generated as needed by a proportional distribution method using the preliminary timing information; and
for a page of the input data to which preliminary timing information has not been added, after line feeds and page breaks are applied to the text included in the page so as to form the display unit subtitle sentences, new timing information is generated as needed by automatic audio synchronization processing that time-synchronizes the display unit subtitle sentence data with the playback audio.

A subtitle production program that causes a computer to generate, from input data including text corresponding to the audio of a material program, display unit subtitle sentence data obtained by applying line feeds and page breaks to the text so as to form display unit subtitle sentences, to generate, based on the display unit subtitle sentence data and on the playback audio and playback time code of the material program, timing information indicating the timing at which each display unit subtitle sentence is displayed on a screen, and to automatically produce broadcast subtitle data including the display unit subtitle sentence data and the timing information, wherein the program causes the computer to:
for a page of the input data to which preliminary timing information has been added, after line feeds and page breaks are applied to the text included in the page so as to form the display unit subtitle sentences, generate new timing information as needed by a proportional distribution method using the preliminary timing information; and
for a page of the input data to which preliminary timing information has not been added, after line feeds and page breaks are applied to the text included in the page so as to form the display unit subtitle sentences, generate new timing information as needed by automatic audio synchronization processing that time-synchronizes the display unit subtitle sentence data with the playback audio.