JP2023073901A - Information processing device, information processing method, and information processing program

Info

Publication number: JP2023073901A
Application number: JP2021186655A
Authority: JP
Inventors: 靖之潮; Yasuyuki Ushio; 圭太萬上; Keita Manjo
Original assignee: Toshiba Corp; Toshiba Infrastructure Systems and Solutions Corp
Current assignee: Toshiba Corp; Toshiba Infrastructure Systems and Solutions Corp
Priority date: 2021-11-16
Filing date: 2021-11-16
Publication date: 2023-05-26

Abstract

To reduce the workload on caption production staff when proofreading a transcription, in a case where statements of performers or the like appearing in a broadcast program or the like are superimposed on the screen as captions.

SOLUTION: An information processing device comprises an acquisition unit, a determination unit, and a transmission control unit. The acquisition unit acquires audio data of a program and schedule data of the program. The determination unit determines, on the basis of the schedule data, whether or not the acquisition time of the audio data is included in the time slot of the program. When the acquisition time is determined to be included in the time slot of the program, the transmission control unit transmits the audio data to a transcription unit that transcribes the audio data; when the acquisition time is determined not to be included in the time slot of the program, the transmission control unit stops transmission of the audio data to the transcription unit.

SELECTED DRAWING: Figure 3

Description

TECHNICAL FIELD
Embodiments of the present invention relate to an information processing device, an information processing method, and an information processing program.

For example, there is a technology that superimposes the statements of performers appearing in a program on the screen as closed captions when a news program or the like is broadcast live. Because such a technology must transcribe the performers' statements in real time, conversion errors can occur during transcription, and the workload on the person in charge of caption production is heavy.

JP 2013-68783 A

One of the problems to be solved by the present invention is to reduce the workload of the person in charge of caption production when proofreading a transcription, in a case where the statements of performers appearing in a broadcast program or the like are superimposed on the screen as captions.

An information processing device according to an embodiment includes an acquisition unit, a determination unit, and a transmission control unit. The acquisition unit acquires audio data of a program and schedule data of the program. The determination unit determines, based on the schedule data, whether or not the acquisition time of the audio data is included in the time slot of the program. When the acquisition time is determined to be included in the time slot of the program, the transmission control unit transmits the audio data to a transcription unit that transcribes the audio data; when the acquisition time is determined not to be included in the time slot of the program, the transmission control unit stops transmission of the audio data to the transcription unit.

FIG. 1 is a diagram showing an example of the configuration of an information processing system according to a first embodiment.
FIG. 2 is a diagram showing an example of the hardware configuration of an information processing device according to the first embodiment.
FIG. 3 is a diagram showing an example of the functional blocks of a processor of the information processing device according to the first embodiment.
FIG. 4 is a diagram showing an example of schedule data regarding broadcast programs according to the first embodiment.
FIG. 5 is a diagram showing an example of the procedure of caption information generation processing according to the first embodiment.
FIG. 6 is a diagram showing an example of a graphical user interface displayed on a display of the information processing device according to the first embodiment.
FIG. 7 is a diagram showing an example of the functional blocks of a processor of an information processing device according to a second embodiment.
FIG. 8 is a diagram showing an example of the procedure of caption information generation processing according to the second embodiment.

An information processing device, an information processing program, an information processing method, and an information processing system according to embodiments are described below. When a news program or the like is broadcast or distributed, the information processing device and the like according to the embodiments superimpose the statements of performers appearing in the program on the corresponding screen as captions with as little time lag as possible.

In each of the following embodiments, to make the description concrete, an information processing device and the like used when a program is broadcast live wirelessly or by wire are described as an example. However, the information processing device and the like according to the embodiments are not limited to broadcasting and can also be used, for example, when a program is distributed over a communication network.

[First Embodiment]
FIG. 1 is a diagram showing an example of the configuration of an information processing system SY according to the first embodiment. As shown in FIG. 1, the information processing system SY includes a switcher 1, an information processing device 3, an inserter 5, a terminal device 6, and a first external device 7. The information processing device 3 is communicably connected to the terminal device 6 and the first external device 7 via a network N.

The switcher 1, the information processing device 3, and the inserter 5 are provided, for example, in a master room from which broadcasts are sent out. The master room, also called the master control room, is one of the facilities of a broadcasting station. In the master room, image and sound data shot inside and outside the broadcasting station is edited and sent to the transmitting station as program material in accordance with the broadcast progress table. The switcher 1, the information processing device 3, and the inserter 5 may also be provided somewhere other than the master room, for example in a room equipped with real-time caption production equipment, or on the cloud.

The switcher 1 receives image and sound data from program production devices in a secondary control room (also called a "news sub-room" or "sub-control room"), relay devices at relay sites, and the like, and executes switching processing based on an operator's instructions. Here, image and sound data is data that includes image data and audio data associated with each other by time information. Through the switching processing of the switcher 1, image and sound data for broadcasting is generated from a plurality of pieces of image and sound data. The switcher 1 outputs the generated image and sound data for broadcasting to the information processing device 3.

The information processing device 3 executes processing that generates caption information using the image and sound data acquired from the switcher 1 (hereinafter referred to as caption generation processing). That is, the information processing device 3 separates the image and sound data acquired from the switcher 1 into image data and audio data. The information processing device 3 obtains character information using the separated audio data and outputs (transmits) the obtained character information to the terminal device 6.

The information processing device 3 also encodes the character information received from the terminal device 6 and generates caption information that includes the encoded character information and supplementary information. Here, the supplementary information includes the on-screen position (coordinates) at which the character information is displayed, the character size, the color, the scrolling speed, time information for associating the caption with the image data, and the like. The information processing device 3 outputs (transmits) the image and sound data and the generated caption information to the inserter 5.
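As an illustration of how the caption information described above might be represented, the following minimal Python sketch bundles encoded character information with its supplementary information. The names `SupplementaryInfo`, `CaptionInfo`, and `build_caption_info` are hypothetical, and UTF-8 is only a stand-in for whatever caption character encoding the broadcast system actually uses; the description does not specify one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupplementaryInfo:
    """Hypothetical container for the supplementary information."""
    x: int                # on-screen position (coordinates)
    y: int
    char_size: int        # character size
    color: str
    scroll_speed: float   # scrolling speed
    time_info: str        # time information for association with image data


@dataclass(frozen=True)
class CaptionInfo:
    """Encoded character information plus supplementary information."""
    encoded_text: bytes
    supplementary: SupplementaryInfo


def build_caption_info(text: str, supplementary: SupplementaryInfo,
                       encoding: str = "utf-8") -> CaptionInfo:
    # Assumption: UTF-8 stands in for the real caption encoding.
    return CaptionInfo(text.encode(encoding), supplementary)


info = build_caption_info(
    "Good morning.",
    SupplementaryInfo(x=120, y=900, char_size=36, color="white",
                      scroll_speed=0.0, time_info="04:00:03"),
)
```

Keeping the encoded text and the display attributes in one immutable record mirrors the description, in which both are handed to the inserter 5 together.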

In this example, character information is obtained from the separated audio data by transcription processing that uses speech recognition application software built into the first external device 7, which is located, for example, on the cloud. The speech recognition application is executed by a transcription unit in the first external device 7. The speech recognition application software is implemented by a program that transcribes audio data or by a speech recognition engine. Known transcription programs or speech recognition engines can be used, so their description is omitted. The transcription unit may instead be installed in the information processing device 3.

The caption information generation processing executed by the information processing device 3 is described in detail later.

The inserter 5 uses the caption information received from the information processing device 3 to execute insert processing that inserts the caption information into the image and sound data received from the switcher 1. The inserter 5 outputs the image and sound data that has undergone insert processing to a transmission device 8 provided in the transmission station. The insert processing by the inserter 5 is not synchronized with the time information of the image and sound data, on the premise that the proofreading described below introduces a delay. From the viewer's point of view, the caption information therefore appears several seconds after the corresponding image and sound.

The terminal device 6 is a computer that is communicably connected to the information processing device 3 and is used by the person in charge of caption production. The terminal device 6 outputs (displays) a proofreading screen that includes the character information acquired from the information processing device 3. The person in charge of caption production can proofread the character information using this screen. The terminal device 6 transmits the proofread character information to the information processing device 3.

The first external device 7 is communicably connected to the information processing device 3, for example via the network N. The first external device 7 incorporates speech recognition application software. The first external device 7 executes transcription processing on the audio data received from the information processing device 3 and generates character information.

(Information processing device 3)
Next, the configuration and functions of the information processing device 3 are described in detail with reference to FIGS. 2 and 3.

FIG. 2 is a diagram showing an example of the hardware configuration of the information processing device 3 according to the first embodiment. The information processing device 3 has the same hardware configuration as an ordinary computer. That is, as shown in FIG. 2, the information processing device 3 includes a processor 301, a main storage device 303, a device interface 305, an auxiliary storage device 307, and a network interface 309.

The processor 301 is a processing circuit that performs overall control of the information processing device 3 and of a second external device 9 connected to the information processing device 3. Specifically, the processor 301 is a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or the like, serving as an electronic circuit that includes the control unit and arithmetic unit of a computer. The processor 301 may also be realized by an optical circuit using optical logic elements. The processor 301 reads an information processing program, described later, from the main storage device 303 or the auxiliary storage device 307 and loads it into its own memory. In accordance with the loaded information processing program, the processor 301 controls, for example, the various hardware in the information processing device 3.

The main storage device 303 is a storage device that stores instructions executed by the processor 301, various data, and the like; the information stored in the main storage device 303 is read by the processor 301. More specifically, the main storage device 303 or the auxiliary storage device 307 stores the image and sound data acquired from the switcher 1 and a program for realizing the caption information generation processing described later. The main storage device 303 also stores schedule data regarding broadcast programs. The schedule data is described later. The schedule data may instead be stored in the auxiliary storage device 307.

The device interface 305 connects the second external device 9 and the processor 301, directly or indirectly, via a bus B. The device interface 305 may have a connection terminal such as a USB terminal. An external storage medium, a storage device (memory), or the like may also be connected to the device interface 305 via the connection terminal.

The auxiliary storage device 307 is a storage device other than the main storage device 303.

The main storage device 303 and the auxiliary storage device 307 may be any electronic component capable of storing electronic information, and may be semiconductor memories. Typically, the main storage device 303 and the auxiliary storage device 307 are composed of a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, a hard disk, an optical disk, or the like. The main storage device 303 and the auxiliary storage device 307 may also be composed of portable media such as a USB (Universal Serial Bus) memory or a DVD (Digital Versatile Disc).

The network interface 309 is an interface for connecting to the network N wirelessly or by wire. The network interface 309 can transmit information to and receive information from the first external device 7 via the network N.

The second external device 9 is, for example, an input device (a microphone, keyboard, mouse, touch panel, or the like), an output device (a display device such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) panel), or a storage device (memory).

FIG. 3 is a diagram showing an example of the functional blocks of the processor 301 of the information processing device 3 according to the first embodiment. As shown in FIG. 3, the processor 301 has an acquisition unit 31, a determination unit 33, and a transmission control unit 35. The function executed by the acquisition unit 31 (hereinafter, the acquisition function), the function executed by the determination unit 33 (hereinafter, the determination function), and the function executed by the transmission control unit 35 (hereinafter, the transmission control function) are stored in the main storage device 303 or the auxiliary storage device 307, for example in the form of computer-executable programs. That is, the information processing programs that execute the acquisition function, the determination function, and the transmission control function are stored in the main storage device 303 or the auxiliary storage device 307. The information processing program may instead be stored in the first external device 7.

The acquisition unit 31 acquires audio data of a program and schedule data of the program. Specifically, the acquisition unit 31 acquires, from the switcher 1, the audio data separated by the switcher 1 and the image data corresponding to that audio data. The acquisition unit 31 also acquires the schedule data of the program for the broadcast from the main storage device 303, the auxiliary storage device 307, or the second storage device. In addition, the acquisition unit 31 acquires the data transcribed from the audio data by the transcription unit of the first external device 7 (hereinafter, transcription data). When the transcription unit is installed in the information processing device 3, the acquisition unit 31 may acquire the transcription data directly from that transcription unit.

FIG. 4 is a diagram showing an example of schedule data SD regarding broadcast programs. As shown in FIG. 4, the schedule data SD contains the temporal order of the program segments in the broadcast and the temporal order of the advertisements broadcast during the program. The schedule data SD includes, for example, the broadcast date, program start time, program end time, program name, audio mode, the name of the device or studio used for the broadcast, material names (program segments and CMs (commercial messages, that is, advertisements)), and the scheduled broadcast start time of each material. The schedule data SD shown in FIG. 4 is only an example; in practice it carries many more kinds of information. The format of the schedule data SD also differs from one broadcasting station to another.

For example, the first line of the schedule data SD shown in FIG. 4 gives the broadcast date. The second line gives the program start time, program end time, program name, and audio mode (stereo: S). The third line indicates that material XXX (an opening segment or the like) goes on air from 4:00:00 on the device named PG. The fourth line indicates that material AAA (a CM) goes on air from 4:30:00 on the device named CM. The following lines indicate that materials BBB (CM), CCC (CM), and DDD (CM) go on air on the device named CM from 4:30:15, 4:30:30, and 4:30:45, respectively. In other words, four CMs go on air at 15-second intervals starting at 4:30:00.

The eighth line of the schedule data SD shown in FIG. 4 indicates that the News 1 segment goes on air from studio L1 from 4:31:00. The ninth line indicates that material EEE goes on air in response to a take operation by the user, at a time that is not yet decided. The tenth line indicates that the News 2 segment goes on air from studio L1 after the CM, also at a time that is not yet decided.

The 11th and 12th lines of the schedule data SD shown in FIG. 4 indicate that two items go on air at a 15-second interval from 4:50:00. The 13th line indicates that the weather segment goes on air on the device named WN. The 14th through 17th lines indicate that four CMs (materials GGG, HHH, III, and JJJ) go on air at 15-second intervals from 4:59:00. The 18th line gives the name of the next program, which goes on air from 5:00:00.
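To make the structure of the schedule data SD more concrete, the following Python sketch models a few of the rows described above as records. It is only an illustration under assumptions: the class name `ScheduleEntry`, its field names, and the helper `entries_for_device` are hypothetical, the real format differs between broadcasting stations, and only rows whose device name is stated in the description are included.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass(frozen=True)
class ScheduleEntry:
    """One hypothetical row of the schedule data SD."""
    start: Optional[time]   # None when the start time is not yet decided
    device: str             # device or studio name, e.g. "PG", "CM", "L1"
    material: str           # material name (program segment or CM)


# Rows 3 to 8 and row 10 of the example described for FIG. 4.
SCHEDULE: List[ScheduleEntry] = [
    ScheduleEntry(time(4, 0, 0), "PG", "XXX"),
    ScheduleEntry(time(4, 30, 0), "CM", "AAA"),
    ScheduleEntry(time(4, 30, 15), "CM", "BBB"),
    ScheduleEntry(time(4, 30, 30), "CM", "CCC"),
    ScheduleEntry(time(4, 30, 45), "CM", "DDD"),
    ScheduleEntry(time(4, 31, 0), "L1", "News 1"),
    ScheduleEntry(None, "L1", "News 2"),   # time undecided in the example
]


def entries_for_device(schedule: List[ScheduleEntry],
                       device: str) -> List[ScheduleEntry]:
    """Return the entries that go on air on the given device."""
    return [entry for entry in schedule if entry.device == device]


cm_materials = [entry.material for entry in entries_for_device(SCHEDULE, "CM")]
```

Representing an undecided start time as `None` keeps such rows in the schedule without inventing a time for them.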

The determination unit 33 determines, based on the schedule data SD, whether or not the acquired audio data belongs to the program. That is, the determination unit 33 determines whether or not the time at which the acquisition unit 31 acquired the audio data (and the image data corresponding to the audio data), hereinafter the acquisition time, is included in the time slot of the program. The time slot of the program is the time slot during which the program is broadcast (hereinafter, the broadcast time slot), excluding CMs. In addition to CMs, periods such as sponsor credits, program promotions, and VTR playback may also be excluded from the broadcast time slot as periods in which no transcription is performed for caption production. More specifically, the determination unit 33 can decide whether or not to transcribe based on device names such as "PG", "CM", "L1", and "WN". For example, the determination unit 33 compares the acquisition time with the times in the schedule data SD and thereby determines whether or not the acquisition time is included in the broadcast time slot of the program in the schedule data SD.
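The determination described above can be sketched in Python as a lookup of the schedule row in effect at the acquisition time, followed by a check of its device name. This is a minimal sketch under assumptions, not the patented implementation: the function names, the tuple layout of the schedule, and the choice to exclude only the device name "CM" are all hypothetical.

```python
from bisect import bisect_right
from datetime import time
from typing import List, Optional, Tuple

# Hypothetical schedule: (start time, device name), sorted by start time.
SCHEDULE: List[Tuple[time, str]] = [
    (time(4, 0, 0), "PG"),
    (time(4, 30, 0), "CM"),
    (time(4, 30, 15), "CM"),
    (time(4, 30, 30), "CM"),
    (time(4, 30, 45), "CM"),
    (time(4, 31, 0), "L1"),
]

# Assumption: only rows on the device named "CM" are excluded.
NON_TRANSCRIBED_DEVICES = {"CM"}


def active_device(acquired: time,
                  schedule: List[Tuple[time, str]]) -> Optional[str]:
    """Device of the latest row that started at or before `acquired`."""
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, acquired) - 1
    return schedule[index][1] if index >= 0 else None


def in_broadcast_time_slot(acquired: time,
                           schedule: List[Tuple[time, str]]) -> bool:
    """True when the acquisition time falls in the program, outside CMs."""
    device = active_device(acquired, schedule)
    return device is not None and device not in NON_TRANSCRIBED_DEVICES
```

With this schedule, `in_broadcast_time_slot(time(4, 15, 0), SCHEDULE)` is true, while an acquisition time of 4:30:20 falls in a CM row and yields false. Sponsor credits or VTR playback could be excluded in the same way by adding their device names to the set.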

When the acquisition time is determined to be included in the broadcast time slot of the program, the transmission control unit 35 transmits the audio data to the transcription unit that transcribes the audio data. Specifically, when the acquisition time is included in the broadcast time slot of the program, the transmission control unit 35 transmits the acquired audio data to the first external device 7 having the transcription unit.

When the acquisition time is determined not to be included in the broadcast time slot of the program, the transmission control unit 35 stops transmission of the audio data to the transcription unit. Specifically, when the acquisition time is not included in the broadcast time slot of the program, the transmission control unit 35 stops transmission of the audio data to the first external device 7 having the transcription unit. More specifically, the transmission control unit 35 stops transmission of the audio data to the transcription unit when the acquisition time falls within an advertisement time slot in the schedule data SD.
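The send-or-stop behavior of the transmission control unit 35 can be illustrated with a small Python gate. The class name `TransmissionGate`, its counters, and the injected `send` callable are hypothetical; in a real system `send` would wrap whatever interface the transcription unit exposes, which the description does not specify.

```python
from typing import Callable


class TransmissionGate:
    """Hypothetical sketch: forward audio only during the program."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send          # stand-in for the link to the transcription unit
        self.sent_chunks = 0
        self.stopped_chunks = 0

    def handle(self, audio_chunk: bytes, in_program: bool) -> bool:
        """Send the chunk if it belongs to the program; otherwise withhold it."""
        if in_program:
            self._send(audio_chunk)
            self.sent_chunks += 1
            return True
        # Outside the broadcast time slot (for example during a CM),
        # nothing is passed to the transcription unit.
        self.stopped_chunks += 1
        return False


received = []                      # plays the role of the transcription unit
gate = TransmissionGate(received.append)
gate.handle(b"studio talk", in_program=True)
gate.handle(b"commercial jingle", in_program=False)
```

Because CM audio never reaches the transcription unit, no transcription of it arrives at the proofreading screen, which is how the scheme reduces the proofreader's workload.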

The caption information generation processing according to this embodiment is described below. FIG. 5 is a diagram showing an example of the procedure of the caption information generation processing.

(Caption information generation processing)
(Step S501)
The acquisition unit 31 acquires audio data and the schedule data SD. The acquisition unit 31 may acquire the schedule data SD before acquiring the audio data.

(Step S502)
The determination unit 33 determines, based on the schedule data SD, whether or not the acquired audio data belongs to the program. If the acquisition time of the acquired audio data is not included in the broadcast time slot of the program in the schedule data SD (No in step S502), the processing of step S503 is executed. In other words, if the acquisition time of the audio data falls within an advertisement broadcast time slot in the schedule data SD, the processing of step S503 is executed.

If the acquisition time of the acquired audio data is included in the broadcast time slot of the program in the schedule data SD (Yes in step S502), the processing of step S504 is executed. In other words, if the acquisition time of the audio data does not fall within an advertisement broadcast time slot in the schedule data SD, the processing of step S504 is executed.

(Step S503)
The transmission control unit 35 stops transmission of the audio data to the first external device 7 having the transcription unit. When the transcription unit is installed in the information processing device 3, the transmission control unit 35 stops outputting the acquired audio data to the transcription unit.

(Step S504)
The transmission control unit 35 transmits the acquired audio data to the first external device 7 having the transcription unit. When the transcription unit is installed in the information processing device 3, the transmission control unit 35 outputs the acquired audio data to the transcription unit.

(Step S505)
The acquisition unit 31 acquires the transcription data, which is the result of the transcription, from the first external device 7. The transmission control unit 35 transmits the acquired transcription data to the terminal device 6. The terminal device 6 then displays a caption proofreading screen that includes the character information from the transcription data. The person in charge of caption production proofreads the character information using the displayed proofreading screen. When the proofreading is complete, the terminal device 6 transmits the proofread character information to the information processing device 3.

(Step S506)
The acquisition unit 31 acquires the character information from the terminal device 6. A caption information generation unit (not shown) in the information processing device 3 encodes the character information received from the terminal device 6 and generates caption information that includes the encoded character information and supplementary information. The transmission control unit 35 outputs (transmits) the image/sound data and the generated caption information to the inserter 5. Using the image/sound data and caption information received from the information processing device 3, the inserter 5 performs insert processing, which inserts the caption information into the image/sound data, and outputs the processed image/sound data to the sending device 8 provided at the transmitting station.

(Step S507)
The determination unit 33 determines whether or not the program has ended based on the current time and the schedule data SD. Specifically, the determination unit 33 makes this determination by comparing the current time with the schedule data SD. If the program has not ended (No in step S507), the process of step S508 is executed. If the program has ended (Yes in step S507), the caption information generation process ends.

(Step S508)
The acquisition unit 31 acquires audio data. The process of step S502 is then executed again.
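As a rough illustration of how steps S502 to S508 chain together, the following Python sketch runs the loop over a pre-recorded sequence of chunks. All function and parameter names are hypothetical. Two simplifications are assumptions of the sketch rather than statements from the source: the acquisition time stands in for the "current time" of step S507, and the flow is assumed to rejoin the end-of-program check after step S503.

```python
def run_caption_loop(chunks, program_end, is_program_time,
                     transcribe, proofread, insert):
    """Drive the caption loop over (acquired_at, audio) pairs.

    transcribe, proofread and insert are stand-ins for the first external
    device 7, the terminal device 6 and the inserter 5 respectively.
    """
    for acquired_at, audio in chunks:        # S508: acquire the next audio data
        if is_program_time(acquired_at):     # S502: inside the program slot?
            text = transcribe(audio)         # S504 + S505: send, get transcript
            insert(proofread(text))          # S505 + S506: proofread, insert
        # S503: otherwise nothing is sent to the transcription unit
        if acquired_at >= program_end:       # S507: has the program ended?
            break
```

A short usage example with times expressed as plain numbers:

```python
captions = []
run_caption_loop(
    chunks=[(0, "hello"), (5, "jingle"), (10, "weather")],
    program_end=10,
    is_program_time=lambda t: not (5 <= t < 10),  # 5..10 is an advertisement
    transcribe=str.upper,
    proofread=str.strip,
    insert=captions.append,
)
```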

As a modification of the present embodiment, the transmission control unit 35 may stop transmitting audio data to the first external device 7, which has the transcription unit, in response to user input on a graphical user interface displayed on the display of the information processing device 3.

FIG. 6 is a diagram showing an example of the graphical user interface DP displayed on the display of the information processing device 3. In FIG. 6, the button LB labeled "lid" is highlighted when, in the process of step S502, the acquisition time of the acquired audio data is not included in the broadcast time slot of the program in the schedule data SD (No in step S502). At this time, a lid is placed on the transmission of audio data to the first external device 7, which has the transcription unit. That is, the transmission control unit 35 stops transmitting audio data to the first external device 7. The button LB labeled "lid" is also highlighted when the user presses it. In this case as well, the transmission control unit 35 stops transmitting audio data to the first external device 7.
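The two ways of closing the "lid" described for FIG. 6 (automatically from the schedule, or manually via the button LB) reduce to a simple OR condition. The sketch below is illustrative only; the function names are hypothetical, and how the button is released again is not described in this section, so it is not modeled.

```python
def lid_closed(in_program_slot: bool, lid_button_pressed: bool) -> bool:
    """True when transmission to the transcription unit must be stopped.

    The lid is closed either because the schedule says the audio lies
    outside the program slot (No in step S502) or because the user
    pressed the "lid" button.
    """
    return lid_button_pressed or not in_program_slot


def forward_if_open(chunk, in_program_slot, lid_button_pressed, send) -> bool:
    """Send the chunk only while the lid is open; return True if sent."""
    if lid_closed(in_program_slot, lid_button_pressed):
        return False
    send(chunk)
    return True
```

In this sketch the same predicate could also drive the highlight state of the button, since the text highlights the button in exactly the two cases in which transmission is stopped.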

As described above, the information processing device 3 according to the present embodiment acquires audio data related to a broadcast and schedule data SD of a program related to the broadcast, and determines, based on the schedule data SD, whether or not the acquisition time of the audio data is included in the broadcast time slot of the program. If the acquisition time is determined to be included in the broadcast time slot, the device transmits the audio data to a transcription unit that transcribes the audio data; if the acquisition time is determined not to be included in the broadcast time slot, the device stops transmitting the audio data to the transcription unit. In addition, the schedule data SD contains the temporal order of the program's segments ("corners") and the temporal order of the advertisements broadcast during the program, and the information processing device 3 stops transmitting the audio data to the transcription unit when the acquisition time falls within an advertisement time slot.

With this configuration, the information processing device 3 according to the present embodiment can restrict transcription-related operations to the broadcast time slot of the program by determining the start and end of the program and/or the commercials (CM). By acquiring the broadcast schedule data SD, the information processing device 3 can therefore realize transcription that follows the program. Consequently, when statements of performers appearing in a broadcast program are superimposed on the screen as captions, the information processing device 3 can reduce the workload on the caption creation staff when proofreading the transcription.

When the technical idea of the first embodiment is realized as an information processing method, the method acquires audio data related to a broadcast and schedule data of a program related to the broadcast; determines, based on the schedule data, whether or not the acquisition time of the audio data is included in the broadcast time slot of the program; transmits the audio data to a transcription unit that transcribes the audio data if the acquisition time is determined to be included in the broadcast time slot; and stops transmitting the audio data to the transcription unit if the acquisition time is determined not to be included in the broadcast time slot. The procedure and effects of the caption information generation process under this information processing method are the same as those described for the first embodiment, so their description is omitted.

When the technical idea of the first embodiment is realized as an information processing program, the program causes a computer to acquire audio data related to a broadcast and schedule data of a program related to the broadcast; to determine, based on the schedule data, whether or not the acquisition time of the audio data is included in the broadcast time slot of the program; to transmit the audio data to a transcription unit that transcribes the audio data if the acquisition time is determined to be included in the broadcast time slot; and to stop transmitting the audio data to the transcription unit if the acquisition time is determined not to be included in the broadcast time slot.

For example, the caption information generation process can also be realized by installing the information processing program on a server device located in the master room, or on a server device in the cloud, and loading the program into memory. The program that causes a computer to execute the caption information generation process can also be stored and distributed on a storage medium such as a magnetic disk (hard disk, etc.), an optical disc (CD-ROM, DVD, etc.), or a semiconductor memory. The procedure and effects of the caption information generation process performed by the information processing program are the same as in the first embodiment, so their description is omitted.

[Second embodiment]
In the second embodiment, the configuration of the information processing system and the hardware configuration of the information processing device are the same as in the first embodiment, so their description is omitted. The present embodiment differs from the first embodiment in that a speech recognition dictionary to be applied to the transcription of the audio data is specified based on the program information of the broadcast, and information for applying the specified speech recognition dictionary to the transcription of the audio data (hereinafter referred to as application information) is transmitted to the transcription unit. The application information is stored in the main storage device 303 or the auxiliary storage device 307 in association with the type of program segment ("corner") in the program information. In other words, the speech recognition dictionary is a dictionary that is associated with the type of program segment in the program information and is applied to the transcription of the audio data. The transcription unit uses the speech recognition dictionary, for example, to convert the hiragana produced by transcribing the audio data into kanji according to morphemes. The speech recognition dictionaries are stored, for example, in the memory of the first external device 7, which has the transcription unit, according to the type of program segment. If the transcription unit is installed in the information processing device 3, the speech recognition dictionaries are stored, for example, in the main storage device 303 or the auxiliary storage device 307 according to the type of program segment.

Through the schedule data SD, the application information is associated with the type of the program segment to which the audio data belongs (for example, a news, weather, or sports segment). This association is stored in the main storage device 303 or the auxiliary storage device 307, for example, in the form of a lookup table. The units that differ from the first embodiment are described below.

FIG. 7 is a diagram showing an example of the functional blocks of the processor 301 of the information processing device 3 according to the second embodiment. As shown in FIG. 7, the processor 301 has an acquisition unit 31, a specifying unit 34, and a transmission control unit 35. The acquisition function, the function executed by the specifying unit 34 (hereinafter referred to as the specifying function), and the transmission control function are stored in the main storage device 303 or the auxiliary storage device 307, for example, in the form of computer-executable programs. That is, the information processing programs that execute the acquisition function, the specifying function, and the transmission control function are stored in the main storage device 303 or the auxiliary storage device 307. The information processing program may also be stored in the first external device 7.

The acquisition unit 31 acquires audio data related to the broadcast and the program information of the broadcast. As shown in FIG. 4, the program information is included, for example, in the schedule data SD and is a code or other information indicating the type of program segment. Specifically, therefore, the acquisition unit 31 may acquire the audio data related to the broadcast together with the schedule data SD related to the broadcast.

The specifying unit 34 specifies, based on the program information, the speech recognition dictionary to be applied to the transcription of the audio data. Specifically, the specifying unit 34 specifies the speech recognition dictionary by matching the program name in the schedule data SD against the lookup table. For example, if the program name is "news" as shown in FIG. 4, the specifying unit 34 specifies the speech recognition dictionary associated with news. The specifying unit 34 may also specify, based on the program information, the speech recognition dictionary that was applied to past audio data of the same segment type as the audio data. For example, once a speech recognition dictionary has been specified for a morning news program, the specifying unit 34 specifies that same dictionary for transcription in news programs at other times on the same day and on other days. In this case, the application information includes an instruction to use the speech recognition dictionary used in the morning news program for transcription in news programs at other times on the same day and on other days.

The transmission control unit 35 transmits the audio data, together with the application information for applying the specified speech recognition dictionary to transcription, to the transcription unit that transcribes the audio data. That is, the transmission control unit 35 transmits the application information, which causes the specified speech recognition dictionary to be applied when the audio data is transcribed, along with the audio data to the first external device 7, which has the transcription unit.
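One way to picture audio data and application information travelling together is a small request record, sketched below in Python. The source does not define the format of the application information or the interface to the first external device 7, so `ApplicationInfo`, `TranscriptionRequest`, `dictionary_id`, and the `transport` callback are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationInfo:
    """Tells the transcription unit which dictionary to apply (assumed shape)."""
    dictionary_id: str


@dataclass(frozen=True)
class TranscriptionRequest:
    """Audio data bundled with its application information (assumed shape)."""
    audio: bytes
    application_info: ApplicationInfo


def send_for_transcription(audio, dictionary_id, transport):
    """Bundle the audio with the dictionary choice and hand it to `transport`.

    `transport` stands in for whatever link reaches the transcription unit.
    """
    request = TranscriptionRequest(audio, ApplicationInfo(dictionary_id))
    transport(request)
    return request
```

Keeping the dictionary choice in the same message as the audio means the transcription side never has to guess which dictionary a given chunk should use.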

The caption information generation process according to the present embodiment is described below. FIG. 8 is a diagram showing an example of the procedure of the caption information generation process according to the present embodiment.

(Caption information generation process)
(Step S801)
The acquisition unit 31 acquires the audio data and the program information. Specifically, the acquisition unit 31 acquires the audio data and the schedule data SD that contains the program information. The acquisition unit 31 may acquire the program information before acquiring the audio data.

(Step S802)
The specifying unit 34 specifies the speech recognition dictionary based on the program information. For example, the specifying unit 34 uses the time schedule in the schedule data SD and the time of the audio data to identify the program (program segment) to which the audio data belongs. The specifying unit 34 then specifies the speech recognition dictionary associated with that program segment. At this time, the specifying unit 34 may register an identifier for the specified speech recognition dictionary in the main storage device 303 or the auxiliary storage device 307.
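Step S802 is a two-stage lookup: time to program segment, then segment to dictionary. The Python sketch below illustrates this under assumed data; the schedule entries, segment labels, and dictionary identifiers are invented for illustration and do not come from the schedule data SD shown in FIG. 4.

```python
from datetime import time

# Hypothetical time schedule: (start, end, segment type).
SCHEDULE = [
    (time(7, 0), time(7, 20), "news"),
    (time(7, 20), time(7, 25), "weather"),
    (time(7, 25), time(7, 40), "sports"),
]

# Hypothetical lookup table: segment type -> dictionary identifier.
DICTIONARIES = {
    "news": "dict-news",
    "weather": "dict-weather",
    "sports": "dict-sports",
}


def segment_at(audio_time, schedule):
    """Return the segment type whose slot contains audio_time, else None."""
    for start, end, segment in schedule:
        if start <= audio_time < end:
            return segment
    return None


def specify_dictionary(audio_time, schedule, table):
    """Time -> segment -> dictionary identifier; None if nothing matches."""
    segment = segment_at(audio_time, schedule)
    if segment is None:
        return None
    return table.get(segment)
```

What should happen when no segment or no dictionary matches is not covered in this section; returning `None` here simply leaves that decision to the caller.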

If an identifier for a previously specified speech recognition dictionary is stored in the main storage device 303 or the auxiliary storage device 307, the specifying unit 34 may, based on the program information, look up in the main storage device 303 or the auxiliary storage device 307 the speech recognition dictionary that was applied to past audio data of the same segment type as the current audio data. For example, the specifying unit 34 searches the main storage device 303 or the auxiliary storage device 307 for the identifier of the type of program segment to which the audio data belongs. The specifying unit 34 then specifies the speech recognition dictionary associated with the matching identifier as the speech recognition dictionary for the program to which the audio data belongs.
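The reuse of a previously specified dictionary behaves like a small cache keyed by segment type. The following Python sketch shows one possible shape; `DictionaryRegistry`, `specify_with_history`, and the `fallback` callback are hypothetical, and an in-memory dict stands in for the main storage device 303 or auxiliary storage device 307.

```python
class DictionaryRegistry:
    """Remembers which dictionary was applied to each segment type."""

    def __init__(self):
        self._by_segment = {}

    def register(self, segment_type, dictionary_id):
        """Record the dictionary specified for this segment type."""
        self._by_segment[segment_type] = dictionary_id

    def lookup(self, segment_type):
        """Return the previously registered dictionary, or None."""
        return self._by_segment.get(segment_type)


def specify_with_history(segment_type, registry, fallback):
    """Reuse a past choice when one exists; otherwise specify and register.

    `fallback` stands in for the table-based specification of step S802.
    """
    cached = registry.lookup(segment_type)
    if cached is not None:
        return cached
    chosen = fallback(segment_type)
    registry.register(segment_type, chosen)
    return chosen
```

With this arrangement, a dictionary chosen for a morning news segment would be returned again for a later news segment without consulting the table a second time.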

(Step S803)
The transmission control unit 35 transmits the acquired audio data and the application information to the first external device 7, which has the transcription unit. If the transcription unit is installed in the information processing device 3, the transmission control unit 35 instead outputs the acquired audio data and the application information to the transcription unit. The transcription unit then transcribes the audio data using the speech recognition dictionary designated in the application information.

(Step S804)
The acquisition unit 31 acquires the transcription data, which is the result of the transcription, from the first external device 7. The transmission control unit 35 transmits the acquired transcription data to the terminal device 6. The processing in this step is the same as in step S505, so its description is omitted.

(Step S805)
The acquisition unit 31 acquires the character information from the terminal device 6. The caption information generation unit generates caption information based on the character information. The transmission control unit 35 outputs (transmits) the image/sound data and the generated caption information to the inserter 5. The rest of the processing is the same as in step S506, so its description is omitted.

(Step S806)
If the program has not ended (No in step S806), the process of step S807 is executed. If the program has ended (Yes in step S806), the caption information generation process ends.

(Step S807)
The acquisition unit 31 acquires audio data. The processing from step S802 onward is then executed again.

As described above, the information processing device 3 according to the present embodiment acquires audio data related to a broadcast and the program information of the broadcast; specifies, based on the program information, the speech recognition dictionary that is associated with the type of program segment in the program information and is applied to the transcription of the audio data; and transmits the audio data, together with application information for applying the specified speech recognition dictionary, to the transcription unit that transcribes the audio data. The information processing device 3 also specifies, based on the program information, the speech recognition dictionary that was applied to past audio data of the same segment type as the current audio data.

With this configuration, the information processing device 3 according to the present embodiment can automatically select the speech recognition dictionary suited to a program segment by receiving program information (program segment information) from equipment in the master room. By transmitting to the first external device 7 the application information for applying the selected speech recognition dictionary to transcription, the information processing device 3 can designate the speech recognition dictionary used for transcription according to the program segment. The information processing device 3 can therefore obtain transcription data produced with an appropriate (specialized) speech recognition dictionary for the program segment to which the audio data belongs. Consequently, when statements of performers appearing in a broadcast program are superimposed on the screen as captions, the workload on the caption creation staff when proofreading the transcription can be reduced.

When the technical idea of the second embodiment is realized as an information processing method, the method acquires audio data related to a broadcast and the program information of the broadcast; specifies, based on the program information, the speech recognition dictionary that is associated with the type of program segment in the program information and is applied to the transcription of the audio data; and transmits the audio data, together with application information for applying the specified speech recognition dictionary, to the transcription unit that transcribes the audio data. The procedure and effects of the caption information generation process under this information processing method are the same as those described for the second embodiment, so their description is omitted.

When the technical idea of the second embodiment is realized as an information processing program, the program causes a computer to acquire audio data related to a broadcast and the program information of the broadcast; to specify, based on the program information, the speech recognition dictionary that is associated with the type of program segment in the program information and is applied to the transcription of the audio data; and to transmit the audio data, together with application information for applying the specified speech recognition dictionary, to the transcription unit that transcribes the audio data. The procedure and effects of the caption information generation process performed by the information processing program are the same as in the second embodiment, so their description is omitted.

Although several embodiments of the present invention have been described above, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and modifications can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

REFERENCE SIGNS LIST
1 ... switcher
3 ... information processing device
5 ... inserter
6 ... terminal device
7 ... first external device
8 ... sending device
9 ... second external device
31 ... acquisition unit
33 ... determination unit
34 ... specifying unit
35 ... transmission control unit
301 ... processor
303 ... main storage device
305 ... device interface
307 ... auxiliary storage device
309 ... network interface

Claims

An information processing device comprising:
an acquisition unit that acquires audio data related to a program and schedule data of the program;
a determination unit that determines, based on the schedule data, whether or not the acquisition time of the audio data is included in the time slot of the program; and
a transmission control unit that transmits the audio data to a transcription unit that transcribes the audio data when it is determined that the acquisition time is included in the time slot of the program, and that stops transmission of the audio data to the transcription unit when it is determined that the acquisition time is not included in the time slot of the program.

The information processing device according to claim 1, wherein
the schedule data includes a temporal order of segments (corners) of the program and a temporal order of advertisements broadcast during the program, and
the transmission control unit stops transmitting the audio data to the transcription unit when the acquisition time is included in a time slot of an advertisement.

An information processing device comprising:
an acquisition unit that acquires audio data related to a program and program information;
a specifying unit that specifies, based on the program information, a speech recognition dictionary that is associated with the type of segment (corner) of the program in the program information and that is applied to transcription of the audio data; and
a transmission control unit that transmits the audio data, together with application information for applying the specified speech recognition dictionary, to a transcription unit that transcribes the audio data.

The information processing device according to claim 3, wherein
the specifying unit specifies, based on the program information, a speech recognition dictionary that was applied to past audio data of the same segment type as the segment of the program to which the audio data relates.

An information processing method comprising:
acquiring audio data related to a program and schedule data of the program;
determining, based on the schedule data, whether or not the acquisition time of the audio data is included in the time slot of the program;
transmitting the audio data to a transcription unit that transcribes the audio data when it is determined that the acquisition time is included in the time slot of the program; and
stopping transmission of the audio data to the transcription unit when it is determined that the acquisition time is not included in the time slot of the program.

An information processing program that causes a computer to execute:
acquiring audio data related to a program and schedule data of the program;
determining, based on the schedule data, whether or not the acquisition time of the audio data is included in the time slot of the program;
transmitting the audio data to a transcription unit that transcribes the audio data when it is determined that the acquisition time is included in the time slot of the program; and
stopping transmission of the audio data to the transcription unit when it is determined that the acquisition time is not included in the time slot of the program.

An information processing method comprising:
acquiring audio data related to a program and program information;
specifying, based on the program information, a speech recognition dictionary that is associated with the type of segment of the program in the program information and that is applied to transcription of the audio data; and
transmitting the audio data, together with application information for applying the specified speech recognition dictionary, to a transcription unit that transcribes the audio data.

An information processing program that causes a computer to execute:
acquiring audio data related to a program and program information;
specifying, based on the program information, a speech recognition dictionary that is associated with the type of segment of the program in the program information and that is applied to transcription of the audio data; and
transmitting the audio data, together with application information for applying the specified speech recognition dictionary, to a transcription unit that transcribes the audio data.