JP2023170822A - Explanation voice production device and program

Info

Publication number: JP2023170822A
Application number: JP2022082878A
Authority: JP
Inventors: 麻乃一木; Manon Ichiki; 徹都木; Toru Tsugi
Original assignee: Nhk Foundation; Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Nhk Foundation; Japan Broadcasting Corp
Priority date: 2022-05-20
Filing date: 2022-05-20
Publication date: 2023-12-01

Abstract

PROBLEM TO BE SOLVED: To provide explanatory audio with high expandability and versatility in real time while using data from a plurality of information sources. SOLUTION: An analysis unit 11 of an explanatory audio production device 1 extracts text elements by analyzing data input from a plurality of information sources 2, and a storage unit 12 assigns a label to each text element and stores it in an information management table 13. For each utterance, a reading unit 16 reads from the information management table 13 the updated text elements bearing the labels defined in a template 14. A format conversion unit 17 converts the labels and text elements of each utterance into a Json file. A text generation unit 18 generates text for the explanatory audio from the Json file. An order discard control unit 19 determines the order of the utterances based on the priority of presentation timing contained in the labels of the Json files, and discards the Json file of the head utterance once output processing for that utterance is complete. SELECTED DRAWING: Figure 2

Description

The present invention relates to an explanatory audio production device and a program that generate text for explanatory audio.

Conventionally, explanatory audio services are known that broadcast a sports program and also provide viewers with explanatory audio for that program (see, for example, Patent Document 1).

FIG. 15 is a diagram illustrating an overview of a system that provides an explanatory audio service. This system includes a broadcast transmitting device 101, a broadcast receiving device 102, an explanatory audio production and distribution device 103, an application server 104, and a mobile terminal 105.

The broadcast transmitting device 101, the explanatory audio production and distribution device 103, and the application server 104 are installed, for example, at a broadcasting station, and the broadcast receiving device 102 is installed, for example, at the home of a viewer 100. The mobile terminal 105 is used by the viewer 100, who watches the broadcast program at home.

With the explanatory audio service of this system, the viewer 100 can receive explanatory audio together with the broadcast program, whose audio and video describe the match situation through the announcer's play-by-play and the commentator's commentary.

The broadcast transmitting device 101 transmits broadcast program content to the broadcast receiving device 102 via digital terrestrial broadcast waves. The broadcast receiving device 102 is, for example, a television receiver; it receives the broadcast program content transmitted from the broadcast transmitting device 101 via digital terrestrial broadcast waves and plays back the received content.

The explanatory audio production and distribution device 103 produces explanatory audio for the broadcast program content being transmitted by the broadcast transmitting device 101 and transmits the explanatory audio to the mobile terminal 105. The application server 104 stores an app that runs on the mobile terminal 105 and transmits the app to the mobile terminal 105 in response to a request from the mobile terminal 105. "App" is short for application; here it is a program that receives and plays back the explanatory audio.

The mobile terminal 105 is, for example, a smartphone or a PDA (Personal Digital Assistant), and plays back the explanatory audio for the broadcast program content in synchronization with the broadcast program content received by the broadcast receiving device 102. When playing back the explanatory audio, the mobile terminal 105 changes the playback speed and other settings according to operations by the viewer 100.

For example, if the broadcast program is a baseball broadcast, the viewer 100 can receive, together with the video and audio of the baseball game, explanatory audio that describes the game situation at that moment in detail, and can thus grasp the content of the game in detail. Explanatory audio for baseball covers, for example, information on the pitcher, the pitcher's motion, pitch type, pitch speed, pitch location, information on the batter, the batter's motion, and the score, according to the game situation.

As an example of the explanatory audio production and distribution device 103 that realizes such an explanatory audio service, a system is known that receives data conforming to the ODF (Olympic Data Feed) specification and uses that data to produce and distribute explanatory audio (see, for example, Non-Patent Document 1).

The explanatory audio production and distribution device 103 described in Non-Patent Document 1 sequentially receives data on the current match situation, such as scores and fouls, from a single information source that provides Olympic data. The device 103 then generates play-by-play text suited to the match situation by, for example, filling variables into preset templates, converts the text to speech using a speech synthesizer, and transmits the resulting explanatory audio file to the mobile terminal 105.
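The template-filling step described above can be pictured with a minimal sketch. The template wording and the field names (`team`, `home`, `away`) are hypothetical; the actual ODF field names and the templates used in Non-Patent Document 1 are not given in this document.

```python
from string import Template

# Hypothetical preset template; the real templates are not given in this document.
SCORE_TEMPLATE = Template("$team scores. It is now $home to $away.")


def render_commentary(event: dict) -> str:
    """Fill the preset template with variables taken from one received event."""
    return SCORE_TEMPLATE.substitute(event)


# One event as it might arrive from the single information source (assumed shape).
text = render_commentary({"team": "Japan", "home": 2, "away": 1})
```

The resulting string would then be passed to a speech synthesizer; that step is omitted here.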

Patent Document 1: JP 2017-203827 A

Non-Patent Document 1: Tadashi Kumano, “Audio guide generation technology for explaining sports programs”, NHK Giken R&D, No. 154, pp. 12-20, 2017

As described above, the technology of Non-Patent Document 1 can be used only for a specific Olympic Games; it generates text from data in a restricted format conforming to that Games' ODF specification and generates the explanatory audio file from that text.

For this reason, the technology of Non-Patent Document 1 cannot be used as is for other tournaments, and has the problem of low expandability and versatility.

In addition, the technology of Non-Patent Document 1 uses only one information source to generate explanatory audio, so even when there is information to be conveyed to the viewer 100 as explanatory audio, that information source does not necessarily contain it. For this reason, a technology capable of using multiple information sources has been desired.

Furthermore, with the technology of Non-Patent Document 1, there is no guarantee that the information source will deliver data at the required timing. For data that requires real-time delivery, the explanatory audio service therefore cannot function unless the data arrives from the information source at the time it should be conveyed to the viewer 100.

Thus, the technology of Non-Patent Document 1, which generates text and explanatory audio from data distributed by a single information source, is insufficient as an explanatory audio service and cannot fully satisfy the demands of the viewer 100.

The present invention has been made to solve the above problems, and its object is to provide an explanatory audio production device and a program that use data from multiple information sources and can provide highly expandable and versatile explanatory audio in real time.

To solve the above problems, the explanatory audio production device of claim 1 is an explanatory audio production device that generates, for each utterance, text for the explanatory audio of a sports program being distributed live, the device comprising:
a template in which utterance definition data is defined for each utterance, the utterance definition data including one or more labels corresponding to the one or more text elements that make up the text;
an information management table in which the text elements are stored;
an analysis unit that receives data corresponding to the match situation of the sports program from each of a plurality of information sources and extracts the text elements from the data by analyzing the data according to the preset data format of the information source;
a storage unit that assigns, to each text element extracted by the analysis unit, a label including a priority for the timing at which the utterance of the text element is presented, and stores the labeled text element in the information management table;
an update monitoring unit that monitors whether a text element stored in the information management table has been updated and, when it determines that one has been updated, outputs the label assigned to that text element;
a reading unit that, for the utterance whose utterance definition data includes the label output by the update monitoring unit, reads from the information management table the one or more text elements bearing the one or more labels included in that utterance definition data, and outputs the one or more labels of the utterance and the corresponding one or more text elements;
a format conversion unit that converts the one or more labels of the utterance and the corresponding one or more text elements output by the reading unit into a file that includes a predetermined playback time;
a text generation unit that extracts the one or more text elements from the file produced by the format conversion unit, and generates and outputs the text; and
an order discard control unit that receives the per-utterance files produced by the format conversion unit, determines the order of the utterances based on the priority included in the labels of the per-utterance files, resets the playback time included in each per-utterance file according to that order, outputs the playback time included in the file of the head utterance in the determined order, and discards the file of the head utterance.

The explanatory audio production device of claim 2 is the device according to claim 1, wherein the plurality of information sources include an information source that transmits real-time data corresponding to the match situation of the sports program, and further include at least one of: an information source that transmits data on the sports program according to an operator's input operations; an information source that transmits data obtained by analyzing images of the match situation of the sports program; and an information source that transmits data obtained by recognizing the audio of the match situation of the sports program.

The explanatory audio production device of claim 3 is the device according to claim 1, wherein, for each utterance, the template defines one of the one or more labels as a trigger label in addition to defining the one or more labels, and the update monitoring unit monitors whether the text element bearing the trigger label has been updated in the information management table and, when it determines that the text element has been updated, outputs the trigger label.
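As a rough illustration of trigger-label monitoring, the sketch below polls a table and reports the trigger labels whose text element has changed. It rests on several assumptions not stated in the claim: labels are shown as plain strings rather than numeric labels, the table is a dictionary, and an "update" is detected by comparing the stored value with the value seen at the previous poll.

```python
from typing import Dict, List, Optional


class UpdateMonitor:
    """Hypothetical monitor that watches only the trigger labels of a template."""

    def __init__(self, trigger_labels: List[str]) -> None:
        self._triggers = set(trigger_labels)
        self._last_seen: Dict[str, Optional[str]] = {}

    def poll(self, table: Dict[str, str]) -> List[str]:
        """Return the trigger labels whose text element changed since the last poll."""
        changed = []
        for label in sorted(self._triggers):
            value = table.get(label)
            if value is not None and self._last_seen.get(label) != value:
                changed.append(label)
            self._last_seen[label] = value
        return changed


table = {"score": "2-1", "pitcher": "Tanaka"}
monitor = UpdateMonitor(["score"])
first = monitor.poll(table)    # the initial store of "score" counts as an update
table["pitcher"] = "Sato"      # not a trigger label, so it is ignored
second = monitor.poll(table)
table["score"] = "3-1"         # the trigger label's text element is updated
third = monitor.poll(table)
```

Watching only the trigger label means that an utterance is produced when its key element changes, not every time any of its constituent elements changes.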

The explanatory audio production device of claim 4 is the device according to claim 1, wherein the label consists of numerical values respectively indicating the type of the information source, the sport of the sports program, the priority, the group to which the text element belongs, and the item within the group; the label used when reading the corresponding one or more text elements bearing the one or more labels included in the utterance definition data is referred to as a read target label; the read target label, together with any label that has the same group and the same item within the group as the read target label but a different information source type, is referred to as a same-kind label; and when the information management table stores a plurality of text elements bearing same-kind labels, the reading unit reads from the information management table the text element that was stored earliest among them.

The explanatory audio production device of claim 5 is the device according to claim 4, wherein, when the information management table does not store a text element bearing the read target label but does store a text element bearing a same-kind label other than the read target label, the reading unit reads from the information management table the text element bearing that same-kind label.
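The reading rules of claims 4 and 5 can be sketched as follows. The five label fields come from claim 4; everything else is an assumption made for illustration: the numeric codes, the `Entry` record, the `stored_at` timestamp used to decide which element was stored earliest, and the function names.

```python
from dataclasses import dataclass
from typing import List, NamedTuple, Optional


class Label(NamedTuple):
    source: int    # type of information source
    sport: int     # sport of the program
    priority: int  # presentation-timing priority
    group: int     # group the text element belongs to
    item: int      # item within the group


@dataclass
class Entry:
    label: Label
    text: str
    stored_at: float  # hypothetical storage timestamp


def same_kind(a: Label, b: Label) -> bool:
    """Same group and item; the information source type may differ."""
    return (a.group, a.item) == (b.group, b.item)


def read_element(table: List[Entry], target: Label) -> Optional[str]:
    """Read the earliest-stored element among the same-kind labels.

    Because the read target label is itself a same-kind label, this also covers
    the fallback case where only another source's element is stored.
    """
    candidates = [e for e in table if same_kind(e.label, target)]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.stored_at).text


target = Label(source=1, sport=1, priority=0, group=3, item=2)
other_source = target._replace(source=2)
table = [
    Entry(target, "150 km/h (source 1)", stored_at=5.0),
    Entry(other_source, "149 km/h (source 2)", stored_at=3.0),
]
earliest = read_element(table, target)
fallback = read_element([Entry(other_source, "149 km/h (source 2)", 3.0)], target)
```

Reading the earliest-stored element favors whichever source delivered first, which fits the real-time aim of the device.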

The explanatory audio production device of claim 6 is the device according to claim 1, wherein the label includes, in addition to the priority, the sport of the sports program, and the reading unit modifies a text element read from the information management table according to the sport included in the label assigned to that text element, and outputs the one or more labels and the one or more text elements of the utterance (using the modified text element where one has been modified).

The explanatory audio production device of claim 7 is the device according to claim 1, wherein the one or more labels included in the utterance definition data include a label corresponding to a predetermined particle or word; the reading unit reads from the information management table one or more text elements including the predetermined particle or word; and the text generation unit extracts from the file the one or more text elements including the predetermined particle or word, and generates and outputs text that includes the predetermined particle or word.

The explanatory audio production device of claim 8 is the device according to claim 1, wherein the priority included in the label is information indicating one of immediate, semi-immediate, periodic, and other, with immediate being the highest priority, semi-immediate the next highest, and other the lowest; and the order discard control unit treats utterances whose labels include the immediate or semi-immediate priority as first utterances and utterances whose labels include the periodic priority, which are placed at a predetermined time interval, as second utterances, determines the order of the utterances so that a first utterance with a higher priority is placed nearer the head, and determines the order so that the second utterances are placed at the predetermined time interval after the first utterances.
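A minimal sketch of this priority-based ordering follows. The four priority names come from claim 8; the numeric ranks, the tuple representation of an utterance, and the choice to place "other" utterances last are assumptions. The predetermined time interval for periodic utterances is not modeled; only their position after the first utterances is.

```python
from enum import IntEnum
from typing import List, Tuple


class Priority(IntEnum):
    # Lower value means higher priority (assumed encoding).
    IMMEDIATE = 0
    SEMI_IMMEDIATE = 1
    PERIODIC = 2
    OTHER = 3


def order_utterances(queue: List[Tuple[Priority, str]]) -> List[str]:
    """Place immediate and semi-immediate utterances first, then periodic ones."""
    first = [u for u in queue if u[0] in (Priority.IMMEDIATE, Priority.SEMI_IMMEDIATE)]
    periodic = [u for u in queue if u[0] == Priority.PERIODIC]
    other = [u for u in queue if u[0] == Priority.OTHER]
    # list.sort is stable, so utterances of equal priority keep their arrival order.
    first.sort(key=lambda u: u[0])
    return [text for _, text in first + periodic + other]


ordered = order_utterances([
    (Priority.OTHER, "weather"),
    (Priority.SEMI_IMMEDIATE, "pitch result"),
    (Priority.PERIODIC, "score recap"),
    (Priority.IMMEDIATE, "home run"),
])
```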

The explanatory audio production device of claim 9 is the device according to claim 1, wherein the label consists of numerical values indicating the type of the information source, the sport of the sports program, the priority, the group to which the text element belongs, and the item within the group; the label used when reading the corresponding one or more text elements bearing the one or more labels included in the utterance definition data is referred to as a read target label; the read target label, together with any label that has the same group and the same item within the group as the read target label but a different information source type, is referred to as a same-kind label; and when the order discard control unit receives a new file including a same-kind label following an update determination by the update monitoring unit and determines the utterance order for the per-utterance files, if a plurality of files including same-kind labels exist, it discards those files other than the new file.

The explanatory audio production device of claim 10 is the device according to claim 1, wherein the order discard control unit discards, from among the per-utterance files, any file for which a preset time has elapsed.
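The time-based discard can be sketched in a few lines. The claim does not state the reference point from which the preset time is measured; the sketch assumes it is the moment the per-utterance file was created, and the `created_at` key and the function name are hypothetical.

```python
from typing import Dict, List


def discard_expired(files: List[Dict], now: float, max_age: float) -> List[Dict]:
    """Keep only the per-utterance files that are not older than max_age seconds."""
    return [f for f in files if now - f["created_at"] <= max_age]


files = [
    {"id": 1, "created_at": 0.0},  # 10 s old at now=10.0, so it is discarded
    {"id": 2, "created_at": 8.0},  # 2 s old, so it is kept
]
kept = discard_expired(files, now=10.0, max_age=5.0)
```

Discarding stale files keeps the queue from presenting information that no longer matches the live match situation.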

Furthermore, the program of claim 11 causes a computer constituting an explanatory audio production device that generates, for each utterance, text for the explanatory audio of a sports program being distributed live to function as:
a template in which utterance definition data is defined for each utterance, the utterance definition data including one or more labels corresponding to the one or more text elements that make up the text;
an information management table in which the text elements are stored;
an analysis unit that receives data corresponding to the match situation of the sports program from each of a plurality of information sources and extracts the text elements from the data by analyzing the data according to the preset data format of the information source;
a storage unit that assigns, to each text element extracted by the analysis unit, a label including a priority for the timing at which the utterance of the text element is presented, and stores the labeled text element in the information management table;
an update monitoring unit that monitors whether a text element stored in the information management table has been updated and, when it determines that one has been updated, outputs the label assigned to that text element;
a reading unit that, for the utterance whose utterance definition data includes the label output by the update monitoring unit, reads from the information management table the one or more text elements bearing the one or more labels included in that utterance definition data, and outputs the one or more labels of the utterance and the corresponding one or more text elements;
a format conversion unit that converts the one or more labels of the utterance and the corresponding one or more text elements output by the reading unit into a file that includes a predetermined playback time;
a text generation unit that extracts the one or more text elements from the file produced by the format conversion unit, and generates and outputs the text; and
an order discard control unit that receives the per-utterance files produced by the format conversion unit, determines the order of the utterances based on the priority included in the labels of the per-utterance files, resets the playback time included in each per-utterance file according to that order, outputs the playback time included in the file of the head utterance in the determined order, and discards the file of the head utterance.

As described above, the present invention makes it possible to use data from multiple information sources and to provide highly expandable and versatile explanatory audio in real time.

FIG. 1 is a schematic diagram illustrating an example of the overall configuration of an explanatory audio production and distribution system including an explanatory audio production device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the explanatory audio production device according to the embodiment of the present invention.
FIG. 3 is a flowchart illustrating an example of processing by the analysis unit and the storage unit.
FIG. 4 is a diagram explaining a label.
FIG. 5 is a diagram showing an example of a template.
FIG. 6 is a diagram illustrating examples of utterance definition data defined in a template, labels and text elements stored in the information management table, a Json file, and explanatory audio text.
FIG. 7 is a flowchart illustrating an example of processing by the update monitoring unit.
FIG. 8 is a diagram illustrating an example of processing by the update monitoring unit.
FIG. 9 is a flowchart illustrating an example of processing by the reading unit.
FIG. 10 is a diagram illustrating an example of reading out text elements (step S905).
FIG. 11 is a flowchart illustrating an example of processing by the format conversion unit.
FIG. 12 is a flowchart illustrating an example of processing by the text generation unit.
FIG. 13 is a flowchart illustrating an example of processing by the order discard control unit.
FIG. 14 is a diagram illustrating the transition of utterance data in the array in the order discard control unit.
FIG. 15 is a diagram illustrating an overview of a system that provides an explanatory audio service.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings.
[Explanatory audio production and distribution system]
First, the explanatory audio production and distribution system that realizes the explanatory audio service will be described. FIG. 1 is a schematic diagram illustrating an example of the overall configuration of an explanatory audio production and distribution system including an explanatory audio production device according to an embodiment of the present invention.

This explanatory audio production and distribution system 10 includes an explanatory audio production device 1, a plurality of information sources 2, a speech synthesis device 3, a distribution device 4, and a mobile terminal 5. The explanatory audio production and distribution system 10 corresponds to the explanatory audio production and distribution device 103 and the mobile terminal 105 of the system that provides the explanatory audio service shown in FIG. 15.

The explanatory audio production device 1 generates, for each utterance, the explanatory audio text used to produce explanatory audio for a sports program being distributed live. The explanatory audio production device 1 receives, from a plurality of information sources 2, real-time data corresponding to the match situation of the live sports program. It then extracts text elements by analyzing the data according to the data format specific to the information source 2 from which the data came, assigns a label to each text element, and stores it in an information management table 13 described later.
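The analyze, label, and store flow described above might look like the sketch below, with one parser per information source feeding a shared table. Both input formats (a JSON message and a `key=value;key=value` string), the source names `A` and `B`, the field names, and the string label `pitch_speed` are invented for illustration; they are not the actual ODF, BIS, BIP, or SIGN formats, and the numeric labels of the embodiment are simplified to strings.

```python
import json
from typing import Callable, Dict, List, Tuple

Parsed = List[Tuple[str, str]]  # (label, text element) pairs


def parse_source_a(raw: str) -> Parsed:
    """Hypothetical source A sends JSON."""
    data = json.loads(raw)
    return [("pitch_speed", f'{data["speed_kmh"]} km/h')]


def parse_source_b(raw: str) -> Parsed:
    """Hypothetical source B sends 'key=value;key=value' strings."""
    fields = dict(pair.split("=", 1) for pair in raw.split(";"))
    return [("pitch_speed", f'{fields["spd"]} km/h')]


# One parser per information source, selected by the source the data came from.
PARSERS: Dict[str, Callable[[str], Parsed]] = {"A": parse_source_a, "B": parse_source_b}


def analyze_and_store(source: str, raw: str, table: Dict[str, str]) -> None:
    """Extract text elements with the source-specific parser and store them by label."""
    for label, text in PARSERS[source](raw):
        table[label] = text


table: Dict[str, str] = {}
analyze_and_store("A", '{"speed_kmh": 150}', table)
after_a = dict(table)
analyze_and_store("B", "spd=148;type=fastball", table)
after_b = dict(table)
```

Supporting a new information source then amounts to registering one more parser, which is one way to read the expandability the device aims for.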

Here, text elements are the one or more elements that make up the explanatory audio text to be generated (the text of what is to be uttered). A label is information for identifying the content of a text element. Details are described later.

To generate the explanatory audio text for one utterance, the explanatory audio production device 1 follows the utterance definition data defined in a template 14 described later: it reads the updated text elements from the information management table 13, generates a Json file that includes a playback time, assigns an utterance ID, and generates the explanatory audio text. The explanatory audio production device 1 also determines the order of the utterances and resets the playback times. The playback time is the time at which the mobile terminal 5 plays back the audio file of the explanatory audio text.
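The per-utterance steps above can be sketched as follows. The JSON layout (`utterance_id`, `play_time`, `elements`), the string labels, the counter-based utterance ID, and the space-joined English text are all assumptions; the actual Json file structure is the one the embodiment illustrates in FIG. 6.

```python
import json
from itertools import count
from typing import Dict, List

_utterance_ids = count(1)  # hypothetical ID scheme: a simple running counter


def build_utterance_file(labels: List[str], table: Dict[str, str], play_time: str) -> str:
    """Read the labeled text elements and serialize one utterance as JSON."""
    record = {
        "utterance_id": next(_utterance_ids),
        "play_time": play_time,
        "elements": [{"label": label, "text": table[label]} for label in labels],
    }
    return json.dumps(record, ensure_ascii=False)


def generate_text(utterance_json: str) -> str:
    """Extract the text elements from the JSON and join them into one sentence."""
    record = json.loads(utterance_json)
    return " ".join(element["text"] for element in record["elements"])


table = {"batter": "Suzuki", "verb": "hits", "result": "a double"}
utterance_file = build_utterance_file(["batter", "verb", "result"], table, "12:00:05")
utterance_text = generate_text(utterance_file)
```

In this sketch the utterance ID stored in the file is what would tie the text sent to the speech synthesis device 3 to the playback time sent to the distribution device 4.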

For each utterance, the explanatory audio production device 1 outputs the utterance ID and the explanatory audio text to the speech synthesis device 3, and outputs the utterance ID and the playback time to the distribution device 4.

The information sources 2 comprise, for example, a plurality of information sources for each sport. As shown in FIG. 1, the information sources 2 for baseball include, for example, an information source 2-1 that distributes Olympic-related data conforming to the ODF specification, an information source 2-2 that distributes professional baseball data conforming to the BIS specification, an information source 2-3 that distributes professional baseball data conforming to the BIP specification, and an information source 2-4 that distributes high school baseball data conforming to the SIGN specification.

The information sources 2 for baseball also include an information source 2-5 that transmits baseball-related data conforming to a predetermined specification in response to input by an operator watching the broadcast program, an information source 2-6 that generates baseball-related data by analyzing images of the game and transmits it in conformance with a predetermined specification, and an information source 2-7 that generates baseball-related data by recognizing audio of the game and transmits it in conformance with a predetermined specification. There are also, among others, a plurality of information sources 2 that distribute tennis data.

Thus, in addition to the information sources 2-1, ..., 2-4, which transmit real-time data reflecting the state of play of the sports program, the system uses the operator-input information source 2-5, the image-analysis information source 2-6, and the speech-recognition information source 2-7, or at least one of the information sources 2-5, 2-6, and 2-7. This makes up for data missing from what the information sources 2-1, ..., 2-4 distribute and broadens the range of commentary audio that can be presented.

Using data from a plurality of information sources 2 also makes it possible to realize a highly versatile commentary audio production and distribution system 10 that does not depend on a particular program, tournament, or sport. Furthermore, because the data needed for commentary audio can be obtained from a plurality of information sources 2, commentary audio can be presented reliably and in real time.

The speech synthesis device 3 receives the utterance ID and the commentary audio text from the commentary audio production device 1 and, using existing technology, generates an audio file by synthesizing speech from the commentary audio text. It then outputs the utterance ID and the audio file to the distribution device 4.

The distribution device 4 receives the utterance ID and playback time from the commentary audio production device 1, and the utterance ID and audio file from the speech synthesis device 3. It then changes the playback time so that the commentary utterance does not overlap an utterance in the main audio of the broadcast. Specifically, the distribution device 4 predicts the end time of the main-audio utterance and, if the period from the start time to the end time of the main audio overlaps the period during which the commentary audio file would be played, changes the playback time to a time after the end time of the main audio.
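The overlap rule just described can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the distribution device's actual logic: times are integer milliseconds, the predicted end of the main audio is supplied by the caller (the prediction method is not described here), and the function name `adjust_playback_time` and the `margin_ms` parameter are hypothetical.

```python
def adjust_playback_time(playback_ms, clip_ms, main_start_ms, main_end_ms, margin_ms=100):
    """Return a playback time that avoids the main-audio utterance.

    main_end_ms is the *predicted* end of the main audio; how it is
    predicted is outside this sketch. margin_ms is an assumed gap.
    """
    clip_end_ms = playback_ms + clip_ms
    # Two half-open intervals overlap when each starts before the other ends.
    overlaps = playback_ms < main_end_ms and main_start_ms < clip_end_ms
    if overlaps:
        # Move the commentary to a time after the main audio ends.
        return main_end_ms + margin_ms
    return playback_ms
```

For instance, a 2-second clip scheduled at 10,000 ms against main audio running from 9,000 ms to a predicted 11,000 ms would be moved to 11,100 ms, whereas a clip scheduled at 12,000 ms would be left unchanged.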

The distribution device 4 distributes the audio file and the playback time that share the same utterance ID to the mobile terminal 5.
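Because the playback time and the audio file arrive from different devices, they have to be matched by utterance ID before delivery. The following Python sketch shows one way this pairing might be done; the class name `UtterancePairer` and its methods are hypothetical, and the order of arrival is assumed to be arbitrary.

```python
class UtterancePairer:
    """Pairs a playback time and an audio file that share an utterance ID."""

    def __init__(self):
        self._playback_times = {}  # utterance ID -> playback time
        self._audio_files = {}     # utterance ID -> audio bytes

    def on_playback_time(self, utterance_id, playback_time):
        self._playback_times[utterance_id] = playback_time
        return self._try_emit(utterance_id)

    def on_audio_file(self, utterance_id, audio):
        self._audio_files[utterance_id] = audio
        return self._try_emit(utterance_id)

    def _try_emit(self, utterance_id):
        # Deliver only once both halves for this ID have arrived.
        if utterance_id in self._playback_times and utterance_id in self._audio_files:
            return (utterance_id,
                    self._audio_files.pop(utterance_id),
                    self._playback_times.pop(utterance_id))
        return None
```

Each call returns `None` until both the playback time and the audio file for an ID are present, at which point it returns the tuple to send to the mobile terminal.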

The mobile terminal 5 receives the audio file and playback time distributed from the distribution device 4 and plays the audio file at the playback time.

With audio that progresses in real time, as in a live broadcast, there may be situations in which overlap between multiple audio streams cannot be avoided. In addition, elderly or hearing-impaired viewers may find female voices hard to hear, or may miss audio in fast-moving sports. In such cases, the mobile terminal 5 selects the speaker and playback speed of the played audio according to the operation of the viewer 100. This makes the audio easier to hear in each viewer's circumstances.

[Commentary audio production device 1]
Next, the commentary audio production device 1 shown in FIG. 1 is described in detail. FIG. 2 is a block diagram showing a configuration example of the commentary audio production device 1 according to an embodiment of the present invention.

The commentary audio production device 1 includes an analysis unit 11, a storage unit 12, an information management table 13, a template 14, an update monitoring unit 15, a reading unit 16, a format conversion unit 17, a text generation unit 18, and an order discard control unit 19.

<Analysis unit 11>
FIG. 3 is a flowchart showing a processing example of the analysis unit 11 and the storage unit 12. The analysis unit 11 receives, from the plurality of information sources 2, data reflecting the state of play of the sports program being distributed live (step S301). The input data is defined in various formats, such as fixed-length records, CSV, XML, and JSON.

The analysis unit 11 identifies the type of the information source 2 from which the data came and generates identification information. It also analyzes the input data according to the preset data format of that information source 2 and thereby extracts text elements from the data (step S302). When extracting a text element, the analysis unit 11 generates an analysis result indicating what kind of information the text element is (its type, content, and so on), and outputs the text element, the analysis result, and the identification information to the storage unit 12.

For example, the analysis unit 11 receives from the information source 2-2 professional baseball data conforming to the BIS specification ("Pitcher Suzuki" (ピッチャー鈴木), "set" (かまえた), and so on), generates identification information indicating that the type of the information source 2 is "BIS", and analyzes the data according to the data format of the information source 2-2, thereby extracting the text elements "Pitcher Suzuki" and "set". As the analysis result, the analysis unit 11 generates a result indicating, among other things, that the sport is "baseball", that "Pitcher Suzuki" is the pitcher's name, and that "set" is a pitcher's action.
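The per-source parsing step can be sketched in Python as a dispatch table from source type to parser. This is only an illustration: the source types `csv_feed` and `json_feed`, their payload layouts, and all function names are hypothetical, and the real ODF, BIS, BIP, and SIGN formats are not reproduced here. The analysis result (type and content of each element) is omitted for brevity.

```python
import csv
import io
import json

def parse_csv_feed(payload):
    """Hypothetical CSV feed: every non-empty cell is one text element."""
    rows = csv.reader(io.StringIO(payload))
    return [cell.strip() for row in rows for cell in row if cell.strip()]

def parse_json_feed(payload):
    """Hypothetical JSON feed: {"elements": ["...", ...]}."""
    return [str(value) for value in json.loads(payload)["elements"]]

# Source type -> parser for that source's own data format (names assumed).
PARSERS = {"csv_feed": parse_csv_feed, "json_feed": parse_json_feed}

def analyze(source_type, payload):
    """Return identification information plus the extracted text elements."""
    text_elements = PARSERS[source_type](payload)
    return {"source_type": source_type, "text_elements": text_elements}
```

Supporting a new information source then amounts to registering one more parser, which is in keeping with the extensibility the description aims for.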

<Storage unit 12>
The storage unit 12 receives the text element, the analysis result, and the identification information from the analysis unit 11 and assigns the text element a label corresponding to the analysis result and the identification information (step S303). It then stores the labeled text element, together with a timestamp, in the information management table 13, placing it at the position corresponding to its label (step S304). The timestamp is information on the time at which the text element is stored in the information management table 13.

As noted above, a label is information that identifies the content of a text element. As shown in FIG. 4, described later, a label consists of five numerical values, in the first through fifth columns. The first column indicates the type of the information source 2 from which the text element was obtained, the second column indicates the sport to which the text element relates, and the third column indicates the priority of the presentation timing used when the text element is presented as an utterance. The fourth and fifth columns indicate the group and the item into which the text element is classified according to its content.

For example, when the storage unit 12 receives from the analysis unit 11 the text element "Pitcher Suzuki" (ピッチャー鈴木), the analysis result (indicating, among other things, that the sport is "baseball" and that "Pitcher Suzuki" is the pitcher's name), and the identification information (indicating that the type of the information source 2 is "BIS"), it assigns the text element "Pitcher Suzuki" the label "2-1-3-9-1" in accordance with the analysis result and the identification information.

As shown in FIG. 4, described later, the "2" in the first column indicates that the type of the information source 2 is "BIS", the "1" in the second column indicates that the sport is "baseball", and the "3" in the third column indicates that the presentation timing is "regular". The "9-1" in the fourth and fifth columns indicates that the group is "pitcher information" and the item is "name", that is, the pitcher's name.

Likewise, when the storage unit 12 receives from the analysis unit 11 the text element "set" (かまえた), the analysis result (indicating, among other things, that the information source 2 is "BIS", that the sport is "baseball", and that "set" is a pitcher's action), and the identification information (indicating that the type of the information source 2 is "BIS"), it assigns the text element "set" the label "2-1-1-11-1" in accordance with the analysis result and the identification information.

As shown in FIG. 4, described later, the "2" in the first column indicates that the type of the information source 2 is "BIS", the "1" in the second column indicates that the sport is "baseball", and the "1" in the third column indicates that the presentation timing is "immediate". The "11-1" in the fourth and fifth columns indicates that the group is "pitcher's action" and the item is "set".
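The five-column label lends itself to a small record type. The Python sketch below parses the two example labels; the names `Label`, `parse_label`, and `format_label` are hypothetical, and the lookup tables contain only the subset of FIG. 4 values quoted in this description.

```python
from typing import NamedTuple

class Label(NamedTuple):
    source: int   # column 1: type of information source
    sport: int    # column 2: sport
    timing: int   # column 3: presentation-timing priority
    group: int    # column 4: group
    item: int     # column 5: item within the group

def parse_label(text):
    """Parse a label such as '2-1-3-9-1' into its five columns."""
    parts = text.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 columns, got {len(parts)}: {text!r}")
    return Label(*(int(p) for p in parts))

def format_label(label):
    """Render a Label back into its hyphen-separated form."""
    return "-".join(str(value) for value in label)

# Subsets of the value tables described for FIG. 4.
SOURCES = {1: "ODF", 2: "BIS", 3: "BIP"}
TIMINGS = {1: "immediate", 2: "semi-immediate", 3: "regular", 4: "other"}
```

With these definitions, `parse_label("2-1-3-9-1")` yields source 2 ("BIS"), sport 1, timing 3 ("regular"), group 9, and item 1, matching the reading given in the text.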

Because text elements are thus given labels in a common scheme, text elements extracted from data in the inherently different data formats of the information sources 2 can be managed centrally in the information management table 13. Data for different sports can be managed centrally in the same way.

Here, in the storage unit 12, the first column of the label is given a value corresponding to the identification information, the second, fourth, and fifth columns are given values corresponding to the analysis result, and the third column (the presentation timing) is given the value in the third column of a label defined in the template 14 described later. Specifically, the storage unit 12 first determines the values of the first, second, fourth, and fifth columns from the analysis result and the identification information. Then, for the third column, it identifies, among the labels defined in the template 14, the label whose first, second, fourth, and fifth columns match the values just determined, extracts the third-column value of that label, and uses it as the third-column value of the label to be assigned.

Alternatively, the template 14 may define the third-column presentation timing according to the fourth and fifth columns of the label. In that case, the template 14 holds a third-column presentation-timing value for each combination of fourth and fifth columns. After determining the first, second, fourth, and fifth columns from the analysis result and the identification information, the storage unit 12 reads from the template 14 the presentation-timing value corresponding to the fourth and fifth columns and uses it as the third-column value of the label to be assigned.
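Both ways of determining the third column can be sketched in a few lines of Python. Labels are represented as 5-tuples of integers; the function names are hypothetical, and returning `None` when the template has no matching label is an assumption, since the description does not say what happens in that case.

```python
def timing_from_template(source, sport, group, item, template_labels):
    """Find the template label matching columns 1, 2, 4, 5; return column 3."""
    for label in template_labels:
        t_source, t_sport, t_timing, t_group, t_item = label
        if (t_source, t_sport, t_group, t_item) == (source, sport, group, item):
            return t_timing
    return None  # assumed behavior: no matching label is defined

def timing_from_group_item(group, item, timing_by_group_item):
    """Variant: the template maps (group, item) directly to a timing value."""
    return timing_by_group_item.get((group, item))
```

The second variant ignores the information source and the sport, so one entry per group and item covers every source that can supply that item.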

<Information management table 13>
The information management table 13 stores labeled text elements together with timestamps. That is, the information management table 13 consists of labels, text elements, and timestamps. A text element is the smallest unit from which commentary audio text is composed.
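As a rough Python sketch, the table can be modeled as a mapping from label to the latest text element and its timestamp. The class name `InformationTable`, the injectable `clock`, and the `updated_since` helper are assumptions made for illustration; in particular, detecting updates by comparing timestamps is only one plausible approach and is not stated in the description.

```python
import time

class InformationTable:
    """Maps a label string to its latest text element and timestamp."""

    def __init__(self, clock=time.time):
        self._rows = {}
        self._clock = clock  # injectable so tests can use a fixed time

    def store(self, label, text_element):
        # The timestamp records when the element is stored in the table.
        self._rows[label] = (text_element, self._clock())

    def get(self, label):
        return self._rows.get(label)

    def updated_since(self, since):
        """Labels whose timestamp is newer than `since`."""
        return [label for label, (_, ts) in self._rows.items() if ts > since]
```

Keying rows by label corresponds to placing each text element "at the position corresponding to its label", so a later value for the same label replaces the earlier one.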

FIG. 4 is a diagram explaining the label. As shown in FIG. 4(1), a label is assigned to one text element and consists of five numerical values, in the first through fifth columns.

As shown in FIG. 4(2), the first column of the label indicates the type of the information source 2 from which the text element originates, such as its distributor. The type of the information source 2 is information for distinguishing the origin of the data. The value "1" indicates "ODF", "2" indicates "BIS", "3" indicates "BIP", "4" indicates "image analysis tool", "5" indicates "input tool", and so on.

As shown in FIG. 4(3), the second column of the label indicates the sport that the content of the text element describes. The sport is used when a sport requires its own phrasing of utterances or its own conditions. The value "1" indicates "baseball", "2" indicates "tennis", "3" indicates "table tennis", "4" indicates "badminton", "5" indicates "basketball", and so on.

As shown in FIG. 4(4), the third column of the label indicates the priority of the presentation timing used when the text element is presented as an utterance. The presentation timing is information for controlling when commentary audio is presented. The value "1" indicates "immediate", "2" indicates "semi-immediate", "3" indicates "regular", and "4" indicates "other".

Commentary utterances must always be presented one at a time. Particularly when they are presented alongside a broadcast, the presentation timing is set in advance for each text element from the standpoint of whether the broadcast audio may overlap the commentary audio.

"Immediate" is for cases in which synchronization with the video is essential. Overlap with the broadcast audio is not considered at all, and the application on the mobile terminal 5 plays the commentary audio file as soon as it is delivered from the distribution device 4. It therefore has the highest priority. For example, commentary such as "the pitcher is set" (ピッチャーかまえた) or "threw" (投げた) is meaningless unless it is played in sync with the video.

"Semi-immediate" means that the application on the mobile terminal 5 plays the commentary audio file within a predetermined time while taking overlap with the broadcast audio into account. For example, when a point is decided in a table tennis match and the commentary is "Suzuki versus Yamada, 10 to 6", the application plays it at a moment within, say, 2 seconds at which it does not overlap the broadcast audio, or plays it immediately once the 2 seconds have elapsed.
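The "semi-immediate" rule can be approximated by the Python sketch below. It is a simplification under several assumptions: times are integer milliseconds, the busy intervals of the broadcast audio are known in advance (a real application would have to detect them as they occur), and the function name and parameters are hypothetical. The 2-second window is the example value from the text.

```python
def semi_immediate_start(arrival_ms, clip_ms, busy, window_ms=2000):
    """Pick a start time for a semi-immediate commentary clip.

    busy is a list of (start_ms, end_ms) intervals of broadcast speech.
    The clip starts at the earliest gap inside the window; if no gap is
    found, it starts when the window expires, overlap or not.
    """
    deadline = arrival_ms + window_ms
    # Candidate starts: right now, or right after a busy interval ends.
    candidates = sorted({arrival_ms} |
                        {end for _, end in busy if arrival_ms <= end <= deadline})
    for start in candidates:
        free = all(start + clip_ms <= b_start or b_end <= start
                   for b_start, b_end in busy)
        if free:
            return start
    return deadline
```

If the broadcast is speaking until 1,500 ms and an 800 ms clip arrives at 1,000 ms, the sketch waits until 1,500 ms. If the broadcast keeps speaking past the window, the clip starts at 3,000 ms regardless.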

"Regular" is set for commentary that is not time-critical and is better uttered periodically, such as the match title, the matchup, or the current score. The application on the mobile terminal 5 plays the commentary audio file at predetermined intervals or under predetermined conditions.

In this case, for example, the distribution device 4 shown in FIG. 1 receives from the commentary audio production device 1, along with the utterance ID and playback time, the label corresponding to the utterance. When it determines that the presentation timing in the label is "regular", it changes the playback time so that the commentary utterance does not overlap an utterance in the main audio of the broadcast.

As shown in FIG. 4(5), the fourth column of the label indicates the group into which the text element is classified according to its content. The group is information indicating the category of the text element. The value "1" indicates "match information", "2" indicates "match type", ..., "9" indicates "pitcher information", "10" indicates "batter information", "11" indicates "pitcher's action", and so on.

Because text elements are managed by group, the items in the fifth column, which sit beneath the group, can be controlled together.

As shown in FIG. 4(5), the fifth column of the label indicates the item within the group. The item classifies the category of the text element more finely and is the most specific information in the label.

For example, when the group in the fourth column is "1" ("match information"), the item value "1" indicates "tournament name", "2" indicates "match name (e.g., X vs. Y)", "3" indicates "venue (e.g., Z Stadium)", and so on. When the group is "9" ("pitcher information"), the item value "1" indicates "name (e.g., Suzuki)", "2" indicates "season record (e.g., 5 wins and 2 losses this season)", and "3" indicates "today's performance (e.g., today's ERA of 0.50)". When the group is "11" ("pitcher's action"), the item value "1" indicates "set" (かまえた), "2" indicates "threw" (投げた), and "3" indicates "pickoff" (牽制).

In the "group" column (the fourth column of the label), the values "1" to "5" and "18" are information common to all sports. They are not used if text elements of that kind cannot be obtained from the information sources 2. The values "6", "7", and "15" to "17" are information common to racket sports and are used, for example, when the sport is "table tennis", "badminton", or "tennis". These sports share many items, which is why such common information is used.

The values "8" to "14" in the "group" column are information for the sport "baseball". Because "softball" shares items with baseball, they may also be treated as information common to "baseball" and "softball".

Using a plurality of information sources 2 in this way allows text elements to be stored in many items of the information management table 13, so that many kinds of commentary audio text can be generated and the range of commentary that can be expressed is broadened.

When the group in the fourth column of the label is "40" ("particle"), the item value "1" indicates 「は」 (wa), "2" indicates 「の」 (no), "3" indicates 「へ」 (e), and "4" indicates 「が」 (ga). When the group is "41" ("word (position)"), the item value "1" indicates 「方向へ」 ("toward"), "2" indicates 「奥へ」 ("to the back"), and "3" indicates 「手前へ」 ("to the front").

When the group in the fourth column of the label is "40" ("particle") or "41" ("word (position)"), the text elements are not obtained from the information sources 2 but are stored in the information management table 13 in advance as fixed character strings.

Using such "particle" or "word (position)" text elements, that is, fixed text elements that are not obtained from the information sources 2 and are stored in the information management table 13 in advance, makes it possible to generate commentary audio text with flexible phrasing. The application on the mobile terminal 5 can then play commentary audio files that sound close to human speech, and the viewer 100 can easily follow the commentary.

<Template 14>
FIG. 5 is a diagram showing an example of the template 14. The template 14 defines, for each commentary audio text generated by the commentary audio production device 1 (that is, for each utterance), utterance definition data consisting of an utterance number, utterance content, a label combination, and a trigger label. To add a new kind of commentary, it suffices to add the utterance definition data for the new utterance (its utterance number, utterance content, label combination, and trigger label) to the template 14.

The utterance definition data in the template 14 is set by key input from the user operating the commentary audio production device 1. The configuration of the template 14 shown in FIG. 5 is only an example, and other configurations may be used.

The utterance number identifies the utterance definition data for each utterance. The utterance content is what is to be uttered and consists of the "items" (the fifth column of the label) of one or more text elements. The label combination consists of the one or more labels corresponding to the utterance content. The trigger label is the label of a text element whose updates are monitored by the update monitoring unit 15 described later.

In the example of FIG. 5, utterance number 1 is defined with the utterance content "pitcher information: name" and "pitcher's action (set)", the label combination "4-1-3-9-1" and "5-1-1-11-1", and the trigger label "5-1-1-11-1".

This means that when the text element for "11-1" (the fourth and fifth columns of the trigger label "5-1-1-11-1") stored in the information management table 13 is updated, commentary audio text is generated from the text elements stored in the information management table 13 for "9-1" (the fourth and fifth columns of the label "4-1-3-9-1") and "11-1" (the fourth and fifth columns of the label "5-1-1-11-1"), namely "pitcher information: name" and "pitcher's action (set)".
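The behavior defined for utterance number 1 can be sketched in Python as follows. This is a simplified illustration: it goes straight from the table to the text, skipping the JSON file that the device generates in between, and the names `UtteranceDefinition` and `texts_for_update` are hypothetical. The table is a plain dictionary from label to text element, and joining the elements without a separator is an assumption that suits Japanese text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UtteranceDefinition:
    number: int
    labels: tuple          # label combination, in utterance order
    trigger_labels: tuple  # labels whose update fires this utterance

def texts_for_update(updated_label, definitions, table):
    """Build the commentary text for every definition the update triggers."""
    texts = []
    for definition in definitions:
        if updated_label in definition.trigger_labels:
            # Concatenate the stored text elements in the defined order.
            texts.append("".join(table[label] for label in definition.labels))
    return texts
```

With the table holding ピッチャー鈴木 under "4-1-3-9-1" and かまえた under "5-1-1-11-1", an update of "5-1-1-11-1" produces the single text ピッチャー鈴木かまえた, while an update of the pitcher's name alone produces nothing.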

Utterance number 2 is defined with the utterance content "tournament name", "match name", "country name 1", and "country name 2", and with both the label combination and the trigger labels being "1-1-3-1-1", "1-1-3-1-2", "1-1-3-3-1", and "1-1-3-3-2".

This means that when all (or at least one) of the text elements stored in the information management table 13 for the trigger labels are updated, namely "1-1" (the fourth and fifth columns of "1-1-3-1-1"), "1-2" (of "1-1-3-1-2"), "3-1" (of "1-1-3-3-1"), and "3-2" (of "1-1-3-3-2"), commentary audio text is generated from these text elements stored in the information management table 13: "tournament name", "match name", "country name 1", and "country name 2".

また、発話番号３として、発話内容が「球種（変化球）」、ラベルの組合せが「５－１－１－１２－１」、トリガーラベルが「５－１－１－１２」の各情報が定義されている。さらに、発話番号４として、発話内容が「球種（ストレート）」、ラベルの組合せが「５－１－１－１２－２」、トリガーラベルが「５－１－１－１２」の各情報が定義されている。 In addition, as utterance number 3, the utterance content is "Pitch type (curving ball)", the label combination is "5-1-1-12-1", and the trigger label is "5-1-1-12". is defined. Furthermore, as utterance number 4, the following information is displayed: the utterance content is "pitch type (straight)", the label combination is "5-1-1-12-2", and the trigger label is "5-1-1-12". Defined.

これは、情報管理テーブル１３に格納されたトリガーラベル「５－１－１－１２」の「グループ」である「球種」について、これに属するラベル「５－１－１－１２－１」の「項目」のテキスト要素「変化球」またはラベル「５－１－１－１２－２」の「項目」のテキスト要素「ストレート」が更新されたときに、情報管理テーブル１３に格納されているラベル「５－１－１－１２－１」のテキスト要素「変化球」またはラベル「５－１－１－１２－２」のテキスト要素「ストレート」からなる解説音声用テキストを生成することを示している。 This applies to the "pitch type" which is the "group" of the trigger label "5-1-1-12" stored in the information management table 13, and the label "5-1-1-12-1" that belongs to this. The label stored in the information management table 13 when the text element "Curving ball" of the "Item" or the text element "Straight" of the "Item" of the label "5-1-1-12-2" is updated. Indicates that an explanatory audio text consisting of the text element "Curving Ball" with the label "5-1-1-12-1" or the text element "Straight" with the label "5-1-1-12-2" is to be generated. There is.

Note that the trigger label consists of the label "5-1-1-12" (columns 1 to 4) rather than the labels "5-1-1-12-1" and "5-1-1-12-2" (columns 1 to 5). This is because only whichever of the text element "breaking ball" of the label "5-1-1-12-1" and the text element "straight" of the label "5-1-1-12-2" has been updated is read from the information management table 13, and the two text elements are never read at the same time.

In addition, utterance number 5 is defined with the utterance contents "fielding position", "word (toward)", and "batting result (hit)", the label combination "4-1-3-20-1", "5-1-4-41-1", and "5-1-1-21-1", and the trigger label "5-1-1-21-1".

This indicates that, when the text element "batting result (hit)" of the trigger label "5-1-1-21-1" stored in the information management table 13 is updated (that is, when "batting result (hit)" is stored), an explanatory audio text is generated that consists of the text elements "fielding position", "word (toward)", and "batting result (hit)" of the labels "4-1-3-20-1", "5-1-4-41-1", and "5-1-1-21-1" stored in the information management table 13.

In the example of utterance number 5, in addition to the text elements "fielding position" and "batting result (hit)" acquired from the information sources 2, a fixed character string (in this example, "word (toward)"), which is a text element not acquired from the information sources 2, is inserted between them. As a result, an explanatory audio text such as "hit toward left" (レフト方向へヒット) is generated. The text is easy for the viewer 100 to understand, and the viewer 100 does not find the explanatory audio unnatural when listening to it.
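As a non-limiting illustration, the utterance definition data of utterance number 5 and the assembly of its text can be sketched in Python as follows. The class `UtteranceDef`, the dictionary `TABLE`, and the function `build_text` are hypothetical names introduced only for this sketch, and the table is reduced to a simple mapping from label to text element; the stored values are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class UtteranceDef:
    """One row of utterance definition data (hypothetical structure)."""
    number: int
    labels: Tuple[str, ...]  # label combination, in speaking order
    trigger: str             # trigger label


# Utterance number 5 as described above.
UTTERANCE_5 = UtteranceDef(
    number=5,
    labels=("4-1-3-20-1", "5-1-4-41-1", "5-1-1-21-1"),
    trigger="5-1-1-21-1",
)

# Simplified stand-in for the information management table 13: label -> text element.
# The entry for "5-1-4-41-1" is the fixed string that no information source supplies.
TABLE: Dict[str, str] = {
    "4-1-3-20-1": "レフト",  # fielding position (left)
    "5-1-4-41-1": "方向へ",  # fixed word (toward)
    "5-1-1-21-1": "ヒット",  # batting result (hit)
}


def build_text(definition: UtteranceDef, table: Dict[str, str]) -> str:
    """Concatenate the text elements in the order of the label combination."""
    return "".join(table[label] for label in definition.labels)


print(build_text(UTTERANCE_5, TABLE))
```

Keeping the fixed string under its own label means the same assembly step handles acquired and fixed text elements alike.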

FIG. 6 is a diagram illustrating examples of the utterance definition data defined in the template 14, the labels and text elements stored in the information management table 13, the Json file, and the explanatory audio text.

As shown in FIG. 6(1), the utterance definition data of utterance number 1 in the template 14 defines the utterance contents "pitcher information name" and "pitcher's action (set)", the label combination "4-1-3-9-1" and "5-1-1-11-1", and the trigger label "5-1-1-11-1". This is the same as the utterance definition data of utterance number 1 shown in FIG. 5.

Suppose that, in the information management table 13, "set" (かまえた) is newly stored as "pitcher's action (set)", which is the text element of the label "5-1-1-11-1", so that this data is updated.

At the time of this update, as shown in FIG. 6(2), the information management table 13 is assumed to hold "Pitcher Suzuki" as the text element "pitcher information name" with the label "4-1-3-9-1". The information management table 13 is also assumed to hold "set" as the text element "pitcher's action (set)" with the label "5-1-1-11-1".

In this case, in accordance with the utterance definition data of utterance number 1 defined in the template 14, the update of the text element with the trigger label "5-1-1-11-1" is detected in the information management table 13, and the text element "Pitcher Suzuki" with the label "4-1-3-9-1" and the text element "set" with the label "5-1-1-11-1" are read.

Then, the Json file and the explanatory audio text of FIGS. 6(3) and 6(4), described later, are generated, and the mobile terminal 5 plays "Pitcher Suzuki set" (ピッチャー鈴木かまえた) as the audio file of the explanatory audio.

As described above, the utterance definition data of utterance number 1 in the template 14 shown in FIG. 5 defines the utterance contents "pitcher information name" and "pitcher's action (set)", the label combination "4-1-3-9-1" and "5-1-1-11-1", and the trigger label "5-1-1-11-1". With this utterance definition data, an explanatory audio text consisting of the text elements "pitcher information name" and "pitcher's action (set)" is generated.

Alternatively, additional information may be defined in the utterance definition data of utterance number 1 so that, for example, one out of every five updates generates an explanatory audio text consisting of the text elements "pitcher information name" and "pitcher's action (set)", and the other four generate an explanatory audio text consisting only of the text element "pitcher's action (set)".

<Update monitoring unit 15>
FIG. 7 is a flowchart showing a processing example of the update monitoring unit 15. The update monitoring unit 15 receives a trigger label from the reading unit 16 and monitors whether the text element with that trigger label in the information management table 13 has been updated (step S701). The trigger label is read from the template 14 by the reading unit 16 and output from the reading unit 16 to the update monitoring unit 15.

If the update monitoring unit 15 determines in step S701 that the text element of the trigger label has not been updated (step S701: not updated), it continues the processing of step S701.

If, on the other hand, the update monitoring unit 15 determines in step S701 that the text element of the trigger label has been updated (step S701: updated), it outputs an update notification and the trigger label to the reading unit 16 (step S702). The processing of FIG. 7 is performed for each utterance (each item of utterance definition data) defined in the template 14.

FIG. 8 is a diagram illustrating a processing example of the update monitoring unit 15. FIG. 8(1) shows a processing example for utterance number 1 of the template 14 shown in the examples of FIGS. 5 and 6. The update monitoring unit 15 receives the trigger label "5-1-1-11-1" from the reading unit 16. Suppose that, in the information management table 13, the text element "pitcher's action (set)" with the trigger label "5-1-1-11-1" is updated and "set" is stored (see α in FIG. 8).

The update monitoring unit 15 then determines that "pitcher's action (set)" has been updated by checking whether the timestamp of the text element with the trigger label "5-1-1-11-1" in the information management table 13 has been updated, and outputs an update notification and the trigger label "5-1-1-11-1" to the reading unit 16. Alternatively, the update monitoring unit 15 may determine that "pitcher's action (set)" has been updated by checking for an update of the text element itself, that is, a change from a state in which nothing is stored in the "pitcher's action (set)" area to a state in which "set" is stored.

As a result, the text elements "Pitcher Suzuki" and "set" are obtained for the utterance contents "pitcher information name" and "pitcher's action (set)" defined in utterance number 1 of the template 14.

FIG. 8(2) shows a processing example for utterance number 4 of the template 14 shown in the example of FIG. 5. The update monitoring unit 15 receives the trigger label "5-1-1-12" from the reading unit 16. Suppose that, in the information management table 13, the text elements "pitch type (breaking ball)" and "pitch type (straight)", which carry the labels "5-1-1-12-1" and "5-1-1-12-2" corresponding to the trigger label "5-1-1-12", are updated so that nothing is stored in the "pitch type (breaking ball)" area and "straight" is stored in the "pitch type (straight)" area (see β in FIG. 8). Alternatively, suppose that, of the labels "5-1-1-12-1" and "5-1-1-12-2" corresponding to the trigger label "5-1-1-12", the timestamp of the text element "pitch type (straight)" with the label "5-1-1-12-2" is updated and "straight" is newly stored in the "pitch type (straight)" area.

The update monitoring unit 15 then determines that an update has occurred by checking for an update of the text elements "pitch type (breaking ball)" and "pitch type (straight)" with the labels "5-1-1-12-1" and "5-1-1-12-2" corresponding to the trigger label "5-1-1-12" in the information management table 13, or by checking for an update of the timestamp of the text element "pitch type (straight)" with the label "5-1-1-12-2". It outputs an update notification and the trigger label "5-1-1-12" to the reading unit 16.

As a result, the text element "straight" is obtained for the utterance content "pitch type (straight)" defined in utterance number 4 of the template 14. Because the update is determined from the timestamp, an update can be detected each time even when the text element "straight" is stored several times in succession.
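As a non-limiting illustration, timestamp-based update detection, including a group-level trigger label such as "5-1-1-12", can be sketched in Python as follows. The class `UpdateMonitor`, its `poll` method, and the table layout (label mapped to a text and timestamp pair) are assumptions made for this sketch and are not part of the embodiment.

```python
from typing import Dict, Tuple

# Simplified table: label -> (text element, timestamp of storage).
Table = Dict[str, Tuple[str, float]]


class UpdateMonitor:
    """Watches the entries covered by one trigger label (hypothetical helper)."""

    def __init__(self, trigger: str) -> None:
        self.trigger = trigger
        self._last_seen: Dict[str, float] = {}

    def _covers(self, label: str) -> bool:
        # A 4-column trigger such as "5-1-1-12" covers "5-1-1-12-1" and "5-1-1-12-2".
        return label == self.trigger or label.startswith(self.trigger + "-")

    def poll(self, table: Table) -> bool:
        """Return True if any covered entry has a newer timestamp than last seen."""
        updated = False
        for label, (_text, stamp) in table.items():
            if not self._covers(label):
                continue
            if stamp > self._last_seen.get(label, float("-inf")):
                updated = True
                self._last_seen[label] = stamp
        return updated


monitor = UpdateMonitor("5-1-1-12")
table: Table = {"5-1-1-12-2": ("ストレート", 1.0)}
print(monitor.poll(table))  # first storage counts as an update
table["5-1-1-12-2"] = ("ストレート", 2.0)  # same text, new timestamp
print(monitor.poll(table))
```

Comparing timestamps rather than text is what lets the second "straight" in a row count as a fresh update.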

<Reading unit 16>
FIG. 9 is a flowchart showing a processing example of the reading unit 16. The reading unit 16 reads the utterance definition data for each utterance (utterance number, utterance content, label combination, and trigger label) from the template 14 (step S901). The reading unit 16 then outputs the trigger label of each utterance to the update monitoring unit 15 (step S902).

The reading unit 16 determines whether it has received an update notification from the update monitoring unit 15 (step S903). If the reading unit 16 determines in step S903 that no update notification has been received (step S903: N), it continues the processing of step S903.

If, on the other hand, the reading unit 16 determines in step S903 that it has received an update notification from the update monitoring unit 15 (step S903: Y), it identifies the utterance corresponding to the trigger label received together with the notification. Then, for each label in the label combination of that utterance, the reading unit 16 reads the label and the text element with that label from the information management table 13 (step S904).

Here, when the information management table 13 stores a plurality of text elements whose labels are of the same type as the label in question (the read target label), that is, labels with the same values in the 4th and 5th columns, the reading unit 16 reads the one label and text element that was stored earliest among them (step S905). The same-type labels are the read target label itself and any label that has the same values in the 4th and 5th columns (group and item) as the read target label and a different value in the 1st column (type of information source 2). A specific example is described later with reference to FIG. 10.

In addition, when the information management table 13 does not store a text element for the label defined in the template 14 (the read target label) and stores only a text element for a same-type label other than the read target label, the reading unit 16 reads that same-type label and its text element.

As a result, even when the explanatory audio production device 1 cannot acquire data (data related to a label defined in the template 14) from the main information source 2, it can still generate an explanatory audio text once it acquires data from another information source 2 (data related to a same-type label, which differs from the label defined in the template 14 only in the type of information source 2 in the 1st column). In that case, the same-type label and its text element are read from the information management table 13 and reflected in the explanatory audio text. In other words, the mobile terminal 5 can play an audio file of explanatory audio that reflects data acquired from an information source 2 other than the main information source 2.

FIG. 10 is a diagram illustrating an example of reading text elements (step S905). Assume that the label "5-1-1-11-1" is defined in the label combination of the template 14 and that the information management table 13 stores two text elements whose labels have "11-1" in the 4th and 5th columns.

One is the text element "set" TA with the label "5-1-1-11-1" and the timestamp t1, and the other is the text element "set" TB with the label "4-1-1-11-1" and the timestamp t2. Here t1 < t2, so t2 is closer to the current time. The timestamp indicates the time at which the label and text element were stored in the information management table 13.

These labels have the same 4th and 5th columns, "11-1". The label "5-1-1-11-1" is the read target label, and the labels "5-1-1-11-1" and "4-1-1-11-1" are same-type labels. The two labels are the same in that the utterance content corresponding to "11-1" in the 4th and 5th columns is "set". They differ in that the type of information source 2 indicated by the 1st column of columns 1 to 3, "5-1-1" and "4-1-1", is "input tool" and "image analysis tool", respectively.

In this case, for the same-type labels "5-1-1-11-1" and "4-1-1-11-1" of the label defined in the label combination of the template 14 (the read target label "5-1-1-11-1"), the reading unit 16 compares the timestamps t1 and t2 and identifies the earliest (oldest) timestamp, t1. The reading unit 16 then reads the label "5-1-1-11-1" with the earliest timestamp t1 and its text element "set" TA.

As a result, the mobile terminal 5 can play the audio file of the explanatory audio corresponding to the text element that was stored earliest, which achieves real-time performance matched to the video.
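As a non-limiting illustration, the selection in step S905, together with the fallback to a same-type label from another information source, can be sketched in Python as follows. The function names `is_same_type` and `read_earliest` and the table layout (label mapped to a text and timestamp pair) are assumptions made for this sketch.

```python
from typing import Dict, Optional, Tuple

# Simplified table: label -> (text element, timestamp of storage).
Table = Dict[str, Tuple[str, float]]


def is_same_type(candidate: str, target: str) -> bool:
    """True for the target itself, or a label with equal columns 4-5 and a different column 1."""
    if candidate == target:
        return True
    c, t = candidate.split("-"), target.split("-")
    return c[3:5] == t[3:5] and c[0] != t[0]


def read_earliest(table: Table, target: str) -> Optional[Tuple[str, str]]:
    """Return (label, text) of the earliest-stored same-type entry, or None if there is none."""
    candidates = [
        (stamp, label, text)
        for label, (text, stamp) in table.items()
        if is_same_type(label, target)
    ]
    if not candidates:
        return None
    _stamp, label, text = min(candidates, key=lambda entry: entry[0])
    return label, text


table: Table = {
    "5-1-1-11-1": ("かまえた", 1.0),  # input tool, timestamp t1
    "4-1-1-11-1": ("かまえた", 2.0),  # image analysis tool, timestamp t2
}
print(read_earliest(table, "5-1-1-11-1"))
```

Because the target label is itself one of the candidates, the same function also covers the case where only the other information source has stored a value.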

Returning to FIG. 9, the reading unit 16 modifies the text elements read from the information management table 13 in accordance with preset rules that depend on the "sport" indicated by the value in the 2nd column of the label (step S906).

For example, suppose that the utterance definition data of the template 14 defines the utterance content "score 1, word (to), score 2" and the corresponding label combination. Suppose also that the reading unit 16 reads from the information management table 13 "15" as the text element "score 1" of the label "4-2-1-5-1" (the value "2" in the 2nd column indicates that the "sport" is "tennis") and "15" as the text element "score 2" of the label "4-2-1-5-2".

For this utterance, the reading unit 16 determines in accordance with the preset rules that the "sport" is "tennis", and modifies the text elements "15", "to", and "15" that make up the utterance content (explanatory audio text "15 to 15") into the text elements "fifteen" and "all" (explanatory audio text "fifteen all").

Suppose next that the reading unit 16 reads from the information management table 13 "15" as the text element "score 1" of the label "4-4-1-5-1" (the value "4" in the 2nd column indicates that the "sport" is "badminton") and "15" as the text element "score 2" of the label "4-4-1-5-2".

In this case, the reading unit 16 determines in accordance with the preset rules that the "sport" is "badminton", and modifies the text elements "15", "to", and "15" that make up the utterance content (explanatory audio text "15 to 15") into the text elements "15", "to", "15", and "tied" (explanatory audio text "15 to 15, tied").

As a result, when the score is the same in several sports but the spoken phrasing differs, the phrasing can be distinguished for each sport, and an audio file of explanatory audio suited to the sport can be played.
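As a non-limiting illustration, the sport-dependent modification of step S906 for a tied score can be sketched in Python as follows. The sport codes "2" (tennis) and "4" (badminton) and the wordings are taken from the examples above; the function `apply_sport_rules` and the table `TENNIS_WORDS` are hypothetical, and only the two score readings mentioned in this description are included.

```python
from typing import Dict, List

# Tennis readings mentioned in the description (other scores are omitted on purpose).
TENNIS_WORDS: Dict[str, str] = {"15": "フィフティーン", "30": "サーティ"}


def sport_code(label: str) -> str:
    """Return the 2nd column of a label, which indicates the sport."""
    return label.split("-")[1]


def apply_sport_rules(label: str, elements: List[str]) -> List[str]:
    """Rewrite a tied 'score, 対, score' utterance according to the sport."""
    is_tied_score = (
        len(elements) == 3 and elements[1] == "対" and elements[0] == elements[2]
    )
    if not is_tied_score:
        return list(elements)
    sport = sport_code(label)
    if sport == "2" and elements[0] in TENNIS_WORDS:  # tennis: "fifteen all"
        return [TENNIS_WORDS[elements[0]], "オール"]
    if sport == "4":  # badminton: "15 to 15, tied"
        return list(elements) + ["同点"]
    return list(elements)


print("".join(apply_sport_rules("4-2-1-5-1", ["15", "対", "15"])))
print("".join(apply_sport_rules("4-4-1-5-1", ["15", "対", "15"])))
```

Keying the rule on the 2nd label column lets one utterance definition serve several sports.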

As another example, the reading unit 16 may read from the information management table 13 the labels "5-2-1-16-1" to "5-2-1-16-4" and their text elements "deciding shot (smash)" and so on, together with the labels "5-2-1-17-1" and "5-2-1-17-2" and their text elements "result (success)" and "result (failure)". It may then determine in accordance with the preset rules that the "sport" is "tennis" and automatically add up the score from combinations of these elements. When the score is 30 to 15, for example, the reading unit 16 may generate new labels and the text elements "Suzuki", "versus", "Tanaka", "thirty", and "fifteen" (explanatory audio text "Suzuki versus Tanaka, thirty fifteen").

In this way, when the text element of the serving player (for example, "Suzuki") has been identified, the reading unit 16 can modify text elements or generate new ones in accordance with its preset rules, producing for example the text elements "Suzuki" and "double fault", "Suzuki" and "service ace", or "Suzuki" and "return ace".

Returning to FIG. 9, the reading unit 16 outputs, for each utterance, one or more labels and the corresponding one or more text elements to the format conversion unit 17 (step S907).

<Format conversion unit 17>
FIG. 11 is a flowchart showing a processing example of the format conversion unit 17. The format conversion unit 17 receives, for each utterance, one or more labels and the corresponding one or more text elements from the reading unit 16 (step S1101).

The format conversion unit 17 assigns to each utterance an ID for identifying the Json file described later (step S1102). The format conversion unit 17 then sets a playback time, using as a reference the time at which the Json file is generated in step S1104 (described later) and allowing for delays such as the time required for speech synthesis by the speech synthesis device 3 (step S1103). The playback time is the time at which the mobile terminal 5 plays the audio file of the explanatory audio, and it may be reset by the downstream order discard control unit 19 and by the distribution device 4 shown in FIG. 1.

In accordance with a preset data format, the format conversion unit 17 generates, for each utterance, a Json file containing the ID, the playback time, the one or more labels, and the corresponding one or more text elements. The format conversion unit 17 then outputs the Json file of each utterance to the text generation unit 18 and the order discard control unit 19 (step S1104).

Referring to FIGS. 6(1) to 6(3), the reading unit 16 reads from the information management table 13 the label "4-1-3-9-1" with its text element "Pitcher Suzuki" and the label "5-1-1-11-1" with its text element "set", and the format conversion unit 17 receives these labels and text elements for the utterance. The format conversion unit 17 then assigns an ID to the utterance and sets the playback time.

As shown in FIG. 6(3), the format conversion unit 17 generates a Json file containing the ID "000...2724", the playback time "2021-03-23...2233705Z", the first label "4-1-3-9-1" with the text element "Pitcher Suzuki", and the second label "5-1-1-11-1" with the text element "set".

As a result, data acquired from the plurality of information sources 2 in different data formats can be unified, and a Json file that does not reflect the origin of the individual information sources 2 is generated.
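As a non-limiting illustration, steps S1102 to S1104 can be sketched in Python as follows. The key names `id`, `playback_time`, and `elements`, the use of a random UUID as the ID, and the one-second default delay are all assumptions made for this sketch; the actual data format of FIG. 6(3) is not reproduced here.

```python
import json
import uuid
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def to_json(
    pairs: List[Tuple[str, str]],
    generated_at: datetime,
    synthesis_delay: timedelta = timedelta(seconds=1),  # assumed allowance
) -> str:
    """Build one Json document per utterance from (label, text element) pairs."""
    playback_time = generated_at + synthesis_delay
    document = {
        "id": uuid.uuid4().hex,  # hypothetical ID scheme
        "playback_time": playback_time.isoformat().replace("+00:00", "Z"),
        "elements": [{"label": label, "text": text} for label, text in pairs],
    }
    # ensure_ascii=False keeps Japanese text readable in the output.
    return json.dumps(document, ensure_ascii=False)


now = datetime(2021, 3, 23, 12, 0, 0, tzinfo=timezone.utc)
print(to_json([("4-1-3-9-1", "ピッチャー鈴木"), ("5-1-1-11-1", "かまえた")], now))
```

Storing the elements as an ordered list preserves the speaking order defined by the label combination.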

<Text generation unit 18>
FIG. 12 is a flowchart showing a processing example of the text generation unit 18. The text generation unit 18 receives the Json file of each utterance from the format conversion unit 17 (step S1201).

The text generation unit 18 extracts the one or more text elements from the Json file in order (step S1202) and generates an explanatory audio text consisting of those text elements (step S1203).

In the example of FIG. 6, the text elements "Pitcher Suzuki" and "set" are extracted from the Json file shown in FIG. 6(3), and the explanatory audio text "Pitcher Suzuki set" is generated as shown in FIG. 6(4).

Returning to FIG. 12, the text generation unit 18 assigns to the Json file (that is, to the utterance) a unique utterance ID for identifying it (step S1204). The text generation unit 18 then counts the characters in the explanatory audio text generated in step S1203 and, from the character count, calculates by predetermined processing the duration of the audio file (a wav (Waveform Audio File Format) file) of the explanatory audio text (step S1205). The processing for calculating the duration of an audio file from a character count is known, and its description is omitted here.
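As a non-limiting illustration, steps S1202, S1203, and S1205 can be sketched in Python as follows. The description leaves the duration calculation unspecified, so the constant speaking rate `CHARS_PER_SECOND` is purely an assumption, as are the function names and the Json key names `elements` and `text`.

```python
import json

CHARS_PER_SECOND = 8.0  # assumed speaking rate; not taken from the description


def generate_text(json_string: str) -> str:
    """Extract the text elements in order and join them into one explanatory audio text."""
    document = json.loads(json_string)
    return "".join(element["text"] for element in document["elements"])


def estimate_duration(text: str, chars_per_second: float = CHARS_PER_SECOND) -> float:
    """Estimate the audio duration in seconds from the character count."""
    return len(text) / chars_per_second


sample = json.dumps(
    {
        "elements": [
            {"label": "4-1-3-9-1", "text": "ピッチャー鈴木"},
            {"label": "5-1-1-11-1", "text": "かまえた"},
        ]
    },
    ensure_ascii=False,
)
text = generate_text(sample)
print(text, estimate_duration(text))
```

A linear estimate of this kind is only a rough stand-in; an actual system would presumably calibrate the rate against the speech synthesis device in use.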

The text generation unit 18 outputs the utterance ID and the duration to the order discard control unit 19 (step S1206), and outputs the utterance ID and the explanatory audio text to the speech synthesis device 3 (step S1207).

As a result, the speech synthesis device 3 can manage the audio files of the explanatory audio texts by utterance ID. In addition, the downstream order discard control unit 19 can use the duration to reset the playback time of each utterance.

Although the text generation unit 18 has been described as generating the explanatory audio text from the Json file received from the format conversion unit 17, it may instead receive the one or more text elements of each utterance from the reading unit 16 and generate the explanatory audio text from them.

<Order discard control unit 19>
FIG. 13 is a flowchart showing a processing example of the order discard control unit 19. The order discard control unit 19 determines whether there is an input from the format conversion unit 17, that is, whether it is an input timing (step S1301). If the order discard control unit 19 determines in step S1301 that there is an input (step S1301: Y), it proceeds to step S1302. If it determines in step S1301 that there is no input (step S1301: N), it proceeds to step S1308.

Having proceeded from step S1301 (Y), the order discard control unit 19 receives the Json file from the format conversion unit 17 and receives the utterance ID and duration corresponding to that Json file from the text generation unit 18 (step S1302).

The order discard control unit 19 extracts the playback time and the one or more labels from the received Json file, generates utterance data consisting of the utterance ID, the playback time, the one or more labels, and the duration, and appends it to the end of an array (step S1303). In this way, utterance data corresponding to the text element updated in the information management table 13 is added to the array. The utterance data may instead consist of the utterance ID and the Json file.

Here, the array consists of one or more items of utterance data, one per utterance; that is, one or more items of utterance data corresponding to the one or more audio files that are about to be uttered as explanatory audio. When there is no explanatory audio to be uttered, the array contains no utterance data. Each time a Json file is input from the format conversion unit 17, the utterance data corresponding to that Json file is added to the array. In addition, utterance data in the array is discarded by processing such as step S1306, described later.

For the plurality of items of utterance data in the array, the order discard control unit 19 determines their order and rearranges them, based on the "presentation timing" in the third column of the label, so that the priority is "immediate" > "semi-immediate" > "regular" > "other" (step S1304).

Here, when the utterance data (the utterance) includes a plurality of labels and the third column of at least one of them is the numerical value "3" for "regular", the order discard control unit 19 treats the "presentation timing" of the utterance data as "regular". When none of the labels has the value "3" for "regular" and at least one has the value "1" for "immediate", it treats the "presentation timing" of the utterance data as "immediate". When none of the labels has the value "3" for "regular" and at least one has the value "2" for "semi-immediate", it treats the "presentation timing" of the utterance data as "semi-immediate".
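The rule for utterance data with several labels, together with the ordering of step S1304, might look like the following Python sketch. The representation of a label as a tuple of column values and the dictionary layout of utterance data are assumptions. Keeping arrival order among items of equal priority (a stable sort) is also an assumption, although it agrees with the example of FIG. 14.

```python
IMMEDIATE, SEMI_IMMEDIATE, REGULAR, OTHER = 1, 2, 3, 4


def effective_timing(labels):
    """Presentation timing of utterance data that may carry several labels."""
    timings = {label[2] for label in labels}  # third column of each label
    # "regular" wins if present, then "immediate", then "semi-immediate".
    for timing in (REGULAR, IMMEDIATE, SEMI_IMMEDIATE):
        if timing in timings:
            return timing
    return OTHER


def sort_by_priority(queue):
    """Order: immediate > semi-immediate > regular > other (step S1304)."""
    # The numeric codes 1..4 already increase as priority decreases,
    # and sorted() is stable, so ties keep their arrival order.
    return sorted(queue, key=lambda u: effective_timing(u["labels"]))
```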

Video and images can convey multiple pieces of information at the same time, but when multiple pieces of information are presented simultaneously as audio, it is difficult for the viewer 100 to take in and understand the content. In particular, in an explanatory audio service that uses the explanatory audio production device 1, even when there are several pieces of information to convey at the same time, they must always be conveyed one at a time, in order.

For this reason, the order of the utterance data is rearranged according to priority in step S1304, and utterance data is discarded in processing such as step S1306, described later, so that an array consisting of one or more items of utterance data is constructed.

The order discard control unit 19 resets the playback times of the plurality of items of utterance data in the array (step S1305). Specifically, it resets the playback time of each item of utterance data based on the order in the array and on the playback times and time lengths of the utterance data in the array.

For example, suppose that the utterance data added in step S1303 is placed second in the array in step S1304. In this case, the order discard control unit 19 adds the time length of the first item of utterance data after the rearrangement to the playback time of the first item of utterance data before the rearrangement, and resets the resulting time as the playback time of the utterance data placed second after the rearrangement.
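The arithmetic of this example can be sketched in Python. The description gives only the case of the second item; chaining the same addition through the rest of the array, so that each item starts when the previous one ends, is an assumption made here for illustration, as are the function name and the tuple layout.

```python
def reset_playback_times(ordered, start_time):
    """Reassign playback times so that each utterance starts when the previous one ends.

    ordered:    list of (utterance_id, duration) pairs in their new order
    start_time: playback time of the head of the array before the rearrangement
    """
    result = []
    current = start_time
    for utterance_id, duration in ordered:
        result.append((utterance_id, current))
        current += duration  # the next item starts after this one finishes
    return result
```

With a head that starts at 100.0 and lasts 2.0, the second item is reset to 102.0, which matches the addition described above.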

Note that, although the order discard control unit 19 uses the time length calculated by the text generation unit 18, it may instead reset the playback time of the utterance data using the time length calculated by the speech synthesis device 3. Because the speech synthesis device 3 calculates the time length when it generates the audio file, it can calculate the time length more accurately than the text generation unit 18. By using the time length calculated by the speech synthesis device 3, the order discard control unit 19 can therefore reset the playback time with high accuracy.

When the array contains a plurality of items of utterance data with same-kind labels, that is, labels whose fourth and fifth columns are the same, the order discard control unit 19 discards the utterance data with the older playback time (step S1306).

Specifically, the order discard control unit 19 identifies, from the plurality of items of utterance data in the array, the utterance data that includes labels whose fourth and fifth columns have the same numerical values (utterance data with same-kind labels). Among the identified items, it keeps the utterance data with the latest playback time and discards the one or more other items with older playback times (earlier than the latest playback time). In this case, it resets the playback times of the utterance data remaining in the array after the discard, using the same processing as in step S1305.
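A minimal Python sketch of the discard in step S1306 follows. How "same kind" is decided for utterance data carrying several labels is not spelled out above; here it is assumed that two items are of the same kind when their sets of (fourth column, fifth column) pairs are equal. The dictionary layout is likewise hypothetical.

```python
def discard_older_same_kind(queue):
    """Keep only the latest utterance data among items with same-kind labels."""
    latest = {}
    for u in queue:
        # Assumed key: the (4th, 5th) column pairs of all labels of the item.
        key = frozenset((label[3], label[4]) for label in u["labels"])
        if key not in latest or u["playback_time"] > latest[key]["playback_time"]:
            latest[key] = u
    keep = {id(u) for u in latest.values()}
    # Preserve the existing order of the surviving items.
    return [u for u in queue if id(u) in keep]
```

After this step the playback times of the remaining items would be reset again, as the paragraph above states.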

As a result, if a text element with a same-kind label is updated in the information management table 13 and the corresponding utterance data is added to the array during the wait before the utterance ID and playback time of the first item of utterance data in the array are output in step S1309, described later, the utterance data with the older playback time is discarded. The audio file of the explanatory audio corresponding to the utterance data with the latest playback time, that is, the added utterance data, is then played back on the mobile terminal 5. Therefore, when the audio file of the explanatory audio is played back in time with the video, explanatory audio reflecting the latest state of the game can be presented to the viewer 100, achieving real-time performance matched to the video.

The order discard control unit 19 discards utterance data in the array for which a certain time (a preset time) has elapsed since it was added to the array (step S1307), and proceeds to step S1308. In this case as well, it resets the playback times of the utterance data remaining in the array after the discard, using the same processing as in step S1305.
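The age-based discard of step S1307 could be sketched in Python as below. The `added_at` field, the parameter names, and the treatment of an item whose age exactly equals the preset time (discarded here) are assumptions for illustration.

```python
def discard_expired(queue, now, max_age):
    """Drop utterance data that has been in the array for max_age or longer."""
    # 'added_at' is a hypothetical field recording when the item joined the array.
    return [u for u in queue if now - u["added_at"] < max_age]
```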

Proceeding from step S1301 (N) or step S1307, the order discard control unit 19 determines whether there is an output from the order discard control unit 19, that is, whether it is an output timing (step S1308).

If the order discard control unit 19 determines in step S1308 that there is an output (step S1308: Y), the process proceeds to step S1309. If it determines in step S1308 that there is no output (step S1308: N), it ends the process and starts again from step S1301.

Proceeding from step S1308 (Y), the order discard control unit 19 extracts the utterance ID and playback time from the first item of utterance data in the array and outputs them to the distribution device 4 (step S1309). It then discards the first item of utterance data in the array (step S1310).

FIG. 14 is a diagram illustrating how the utterance data in the array changes in the order discard control unit 19. Assume that the array in (A) has been constructed. The head of this array is utterance data <a>, "pitcher name (name in the pitcher information) + got set (pitcher's action (got set))", and the "presentation timing" in the third column of its label is "immediate" ([1]). The array also contains utterance data <p>, "game name + inning + score (score 1, score 2)", and the "presentation timing" in the third column of its label is "regular" ([3]). The explanatory audio text of utterance data <p> is "X vs. Y, bottom of the 6th, 7 to 0".

As shown in (B), the order discard control unit 19 outputs the utterance ID and playback time of utterance data <a> at the head of the array in step S1309, and discards utterance data <a> in step S1310. As a result, utterance data <a> is removed from the array in (A), and the array in (C) is constructed.

Then, as shown in (D), in step S1302 the order discard control unit 19 inputs the Json file of utterance data <b>, "batter name (name in the batter information) + hit the ball (batter's action)", the Json file of utterance data <c>, "ball speed", and the Json file of utterance data <d>, "fielding position + toward (word (toward)) + base hit (batting result (base hit))". The "presentation timing" in the third column of the label included in utterance data <b> is "immediate" ([1]), and the "presentation timing" in the third column of the label included in utterance data <c> is "semi-immediate" ([2]). The "presentation timing" in the third column of the label included in utterance data <d> is "immediate" ([1]).

As shown in (E), in step S1304 the order discard control unit 19 determines the order of the plurality of items of utterance data in the array and rearranges them, based on the "presentation timing" in the third column of the label, so that the priority is "immediate" > "semi-immediate" > "regular" > "other". In step S1305 it resets the playback times of the utterance data, and the array in (F) is constructed.

The head of the array in (F) is utterance data <b>, and the "presentation timing" in the third column of its label is "immediate" ([1]). The second item is utterance data <d>, and the "presentation timing" in the third column of its label is "immediate" ([1]). The third item is utterance data <c>, and the "presentation timing" in the third column of its label is "semi-immediate" ([2]). For the fourth and subsequent items of utterance data, the "presentation timing" in the third column of the label is assumed to be "semi-immediate" ([2]), "regular" ([3]), or "other" ([4]).

Then, as shown in (G), the order discard control unit 19 outputs the utterance ID and playback time of utterance data <b> at the head of the array in step S1309, and discards utterance data <b> in step S1310. As a result, utterance data <b> is removed from the array in (F). The order discard control unit 19 then outputs the utterance ID and playback time of utterance data <d>, now at the head of the array, in step S1309, and discards utterance data <d> in step S1310. As a result, utterance data <d> is removed from the array.

Then, as shown in (H), in step S1302 the order discard control unit 19 inputs the Json file of utterance data <e>, "throw from left field toward third base", and the Json file of utterance data <f>, "game name + inning + score". The "presentation timing" in the third column of the label included in utterance data <e> is "immediate" ([1]), and the "presentation timing" in the third column of the label included in utterance data <f> is "regular" ([3]).

Here, the labels of utterance data <f> and utterance data <p> are same-kind labels. Whereas the explanatory audio text of the aforementioned utterance data <p> is "X vs. Y, bottom of the 6th, 7 to 0", the explanatory audio text of the updated utterance data <f> is "X vs. Y, bottom of the 6th, 7 to 1".

As shown in (I), in step S1304 the order discard control unit 19 determines the order of the plurality of items of utterance data in the array and rearranges them, based on the "presentation timing" in the third column of the label, so that the priority is "immediate" > "semi-immediate" > "regular" > "other", and in step S1305 it resets the playback times of the utterance data.

In this case, for the updated utterance data <f> whose "presentation timing" is "regular", the order discard control unit 19 sets the "presentation timing" to "immediate" when determining the order and rearranging the utterance data. Then, in the range from the position following the utterance data <f> whose "presentation timing" was set to "immediate" to the last item of utterance data, the order discard control unit 19 inserts utterance data <f> with its "presentation timing" left as "regular", so that it recurs at a preset time interval.

As a result, the explanatory audio text "X vs. Y, bottom of the 6th, 7 to 1" of utterance data <f> is generated immediately and is thereafter generated periodically.
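One possible reading of this "immediately once, then periodically" behaviour is sketched below in Python. The function name, the dictionary layout, and in particular the `repeats` parameter are assumptions, since the description does not state how many periodic copies are inserted. The primed ID follows the <f'> notation of FIG. 14.

```python
IMMEDIATE, REGULAR = 1, 3


def expand_updated_regular(utterance_id, first_time, interval, repeats):
    """Return one 'immediate' entry followed by periodic 'regular' copies."""
    entries = [{"id": utterance_id, "timing": IMMEDIATE, "playback_time": first_time}]
    for k in range(1, repeats + 1):
        entries.append({
            "id": utterance_id + "'",  # periodic copy
            "timing": REGULAR,
            "playback_time": first_time + k * interval,  # preset time interval
        })
    return entries
```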

Then, as shown in (J), in step S1306, because the array contains utterance data <f> and <p> with same-kind labels and utterance data <p> has been updated to utterance data <f>, the order discard control unit 19 discards utterance data <p>, that is, the utterance data other than the newly added utterance data <f>. As a result, utterance data <p>, corresponding to the explanatory audio text "X vs. Y, bottom of the 6th, 7 to 0" with the older playback time, is discarded from the array, and utterance data <f>, corresponding to the explanatory audio text "X vs. Y, bottom of the 6th, 7 to 1" with the latest playback time, remains. In addition, in step S1307 the order discard control unit 19 discards from the array utterance data <c>, for which the certain time has elapsed. As a result, the array in (K) is constructed.

The head of the array in (K) is utterance data <e>, and the "presentation timing" in the third column of its label is "immediate" ([1]). The second item is utterance data <f>, and the "presentation timing" in the third column of its label is "immediate" ([1]). Because the "presentation timing" of this utterance data <f> was "regular" ([3]) when its Json file was input, it is also inserted into the array as utterance data <f'> so that it recurs at a preset time interval.

As a result, the explanatory audio text "X vs. Y, bottom of the 6th, 7 to 1" of utterance data <f> is generated immediately and also periodically, so the viewer 100 can hear this explanatory audio at the time of the update and, unless it is updated again, periodically thereafter.

Here, when utterance data including a plurality of labels includes a label whose "presentation timing" is "regular" and also includes a label whose "presentation timing" is "immediate" or "semi-immediate", the "presentation timing" of that utterance data is treated as "regular".

In this case, as described in (I) above, when constructing a new array, the order discard control unit 19 handles updated utterance data whose "presentation timing" is "regular" (the utterance data of a newly input Json file) as follows. For the first instance of that utterance data, it sets the "presentation timing" to "immediate" or "semi-immediate", determines the order, and rearranges the utterance data in the array. It sets "immediate" if the other utterance data in the array includes an "immediate" label and no "semi-immediate" label, and "semi-immediate" if the other utterance data includes a "semi-immediate" label and no "immediate" label. For subsequent instances of that utterance data, it leaves the "presentation timing" as "regular" and determines the order and rearranges the utterance data in the array so that the instances recur at a preset time interval.

As described above, according to the explanatory audio production device 1 of the embodiment of the present invention, the analysis unit 11 inputs data from the plurality of information sources 2 and extracts text elements by analyzing the input data according to the preset data format of each information source 2.

The storage unit 12 assigns a label to each text element and stores the labeled text element, together with a time stamp, in the information management table 13.

The update monitoring unit 15 monitors whether the text element of the trigger label included in the utterance definition data defined in the template 14 has been updated in the information management table 13.

For the utterance whose trigger label belongs to the updated text element, the reading unit 16 reads from the information management table 13 the one or more text elements of the label combination (one or more labels) included in the utterance definition data.

The format conversion unit 17 converts the one or more labels and the corresponding one or more text elements into a Json file that includes the playback time.
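As a rough Python illustration of this conversion, the sketch below serializes the labels, text elements, and playback time with the standard `json` module. The field names (`playback_time`, `elements`, `label`, `text`) and the function name are assumptions; the description states only that the Json file includes the playback time.

```python
import json


def to_json_file(labels, text_elements, playback_time):
    """Serialize the labels and text elements of one utterance as a JSON string."""
    elements = [
        {"label": list(label), "text": text}
        for label, text in zip(labels, text_elements)
    ]
    # ensure_ascii=False keeps Japanese text readable in the output.
    return json.dumps(
        {"playback_time": playback_time, "elements": elements},
        ensure_ascii=False,
    )
```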

The text generation unit 18 extracts the one or more text elements from the Json file, generates the explanatory audio text, assigns an utterance ID to the Json file, and calculates the time length of the corresponding audio file based on the number of characters in the explanatory audio text.
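The estimate of time length from the character count might be sketched in Python as follows. Simple concatenation of the text elements, the sequential integer utterance IDs, and the speaking rate `chars_per_second` (with its default value) are all assumptions for illustration, not values taken from the description.

```python
import itertools

_next_id = itertools.count(1)  # hypothetical source of utterance IDs


def generate_text(text_elements, chars_per_second=7.0):
    """Return (utterance_id, text, estimated time length in seconds)."""
    text = "".join(text_elements)
    utterance_id = next(_next_id)
    # Time length is estimated from the number of characters alone.
    duration = len(text) / chars_per_second
    return utterance_id, text, duration
```

A time length reported by the speech synthesis device 3 could replace this estimate where higher accuracy is needed, as noted in the description of step S1305.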

When a Json file is input from the format conversion unit 17, the order discard control unit 19 extracts the playback time and one or more labels from the Json file, generates utterance data consisting of the utterance ID, the playback time, the one or more labels, and the time length, and determines the order of the utterance data based on the priority of the presentation timing included in the labels to construct the array. The order discard control unit 19 also resets the playback times of the utterance data based on the order of the constructed array and on the playback times and time lengths of the utterance data.

The order discard control unit 19 outputs the utterance ID and playback time of the first item of utterance data in the array and discards that utterance data. In addition, when the array contains a plurality of items of utterance data with same-kind labels, the order discard control unit 19 discards the utterance data with the older playback time, and it also discards utterance data for which a certain time has elapsed.

As a result, the explanatory audio text to be uttered is generated according to the utterance definition data defined for each utterance in the template 14, the order of the utterances is determined based on the priority of the "presentation timing" in the third column of the label, and the playback times of the utterances are reset.

By playing back the audio file of the explanatory audio text generated in this way at its playback time, explanatory audio can be provided to the viewer 100 in real time, in step with the live broadcast of a sports program being distributed live.

In addition, data is acquired from the plurality of information sources 2, and the text elements extracted from the data are stored in the information management table 13 together with labels that unify the data formats of the plurality of information sources 2, so the information management table 13 can be centrally managed. In other words, a highly versatile explanatory audio production and distribution system 10 can be realized.

In this way, central management of the information management table 13 using the plurality of information sources 2 makes it possible to generate explanatory audio text for each utterance without overlapping text elements, while also ensuring real-time performance.

To increase the number of utterances of this explanatory audio text, it suffices to add the utterance definition data of the new utterance to the template 14, so a highly extensible explanatory audio production and distribution system 10 can be realized.

Therefore, explanatory audio with high extensibility and versatility can be provided in real time while using data from the plurality of information sources 2.

Although the present invention has been described above with reference to an embodiment, the present invention is not limited to that embodiment and can be modified in various ways without departing from its technical concept.

For example, although the format conversion unit 17 generates a Json file, it may generate a file other than a Json file. The present invention does not limit the files input by the text generation unit 18 and the order discard control unit 19 to Json files; they may be files in another data format.

In the explanatory audio production and distribution system 10 shown in FIG. 1, the speech synthesis device 3 generates an audio file by speech synthesis processing, and the distribution device 4 distributes the audio file and the playback time to the mobile terminal 5.

Alternatively, instead of the audio file and the playback time, the distribution device 4 may distribute the explanatory audio text generated by the explanatory audio production device 1 and the playback time to the mobile terminal 5. In this case, the explanatory audio production and distribution system 10 does not need to include the speech synthesis device 3; the mobile terminal 5 receives the explanatory audio text, generates an audio file by performing speech synthesis processing, and plays back the audio file at the playback time.

As shown in FIG. 5, the template 14 defines the utterance content, the label combinations, and the like. Information such as telops (on-screen captions) may also be defined in the template 14. In this case, a highly versatile item such as a telop is prepared as the fifth column of the label, and the storage unit 12 stores the text elements of information given such a label in the information management table 13. The text generation unit 18 then generates explanatory audio text that includes text elements such as telops. As a result, even when it is not known what kind of information should be presented as explanatory audio, explanatory audio text including text elements such as telops can be generated, and the proportion of programs provided with explanatory audio can be improved.

For example, the text elements of data input from the input-tool information source 2-5 are stored in the information management table 13 together with a label such as telop. As a result, the character information entered by the operator of the input-tool information source 2-5 is reflected in the explanatory audio text, and the audio file of the explanatory audio reflecting that character information is played back.

Note that an ordinary computer can be used as the hardware configuration of the explanatory audio production device 1 according to the embodiment of the present invention. The explanatory audio production device 1 is constituted by a computer equipped with a CPU, a volatile storage medium such as a RAM, a nonvolatile storage medium such as a ROM, an interface, and the like.

The functions of the analysis unit 11, storage unit 12, information management table 13, template 14, update monitoring unit 15, reading unit 16, format conversion unit 17, text generation unit 18, and order discard control unit 19 provided in the explanatory audio production device 1 are each realized by causing the CPU to execute a program describing those functions.

These programs are stored in the storage medium and are read and executed by the CPU. These programs can also be stored and distributed on storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROM, DVD, etc.), and semiconductor memories, and can be transmitted and received via a network.

1 Explanatory audio production device
2 Information source
3 Speech synthesis device
4 Distribution device
5, 105 Mobile terminal
10 Explanatory audio production and distribution system
11 Analysis unit
12 Storage unit
13 Information management table
14 Template
15 Update monitoring unit
16 Reading unit
17 Format conversion unit
18 Text generation unit
19 Order discard control unit
100 Viewer
101 Broadcast transmission device
102 Broadcast reception device
103 Explanatory audio production and distribution device
104 Application server

Claims

An explanatory audio production device that generates text for explanatory audio for each utterance of a live sports program, the device comprising:
a template in which utterance definition data is defined for each utterance, the utterance definition data including, where the text is composed of one or more text elements, one or more labels corresponding to the one or more text elements;
an information management table in which the text elements are stored;
an analysis unit that receives data corresponding to the match situation of the sports program from each of a plurality of information sources, and extracts the text elements from the data by analyzing the data according to a preset data format of the information source;
a storage unit that assigns, to each text element extracted by the analysis unit, a label including a priority of the timing of presenting the utterance of the text element, and stores the text element to which the label has been assigned in the information management table;
an update monitoring unit that monitors whether or not a text element stored in the information management table has been updated, and outputs the label assigned to the text element when it determines that the text element has been updated;
a reading unit that, for the utterance whose utterance definition data includes the label output by the update monitoring unit, reads from the information management table the one or more text elements to which the one or more labels included in the utterance definition data are attached, and outputs the one or more labels of the utterance and the one or more corresponding text elements;
a format conversion unit that converts the one or more labels of the utterance output by the reading unit and the one or more corresponding text elements into a file including a predetermined playback time;
a text generation unit that extracts the one or more text elements from the file converted by the format conversion unit, and generates and outputs the text; and
an order discard control unit that receives the file for each utterance converted by the format conversion unit, determines the order of the utterances based on the priority included in the label of the file for each utterance, resets the playback time included in the file for each utterance according to the order, outputs the playback time included in the file of the first utterance in the determined order, and discards the file of the first utterance.

The explanatory audio production device according to claim 1,
wherein the plurality of information sources include an information source that transmits real-time data according to the match situation of the sports program, and further include at least one of: an information source that transmits data of the sports program according to an input operation by an operator; an information source that transmits data obtained by analyzing an image of the match situation of the sports program; and an information source that transmits data obtained by recognizing audio of the match situation of the sports program.

The explanatory audio production device according to claim 1,
wherein, in the template, for each utterance, one of the one or more labels is additionally defined as a trigger label, and
the update monitoring unit monitors whether or not the text element to which the trigger label is attached has been updated in the information management table, and outputs the trigger label when it determines that the text element has been updated.
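
The trigger-label monitoring described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the class name `UpdateMonitor`, the dictionary-based table, the polling approach, and the string label names are all assumptions made for illustration (the labels in the device are numeric).

```python
from dataclasses import dataclass, field


@dataclass
class UpdateMonitor:
    """Hypothetical monitor: remembers the last value seen for each trigger label."""
    trigger_labels: set
    _last_seen: dict = field(default_factory=dict)

    def poll(self, table: dict) -> list:
        """Return the trigger labels whose text element changed since the last poll."""
        updated = []
        for label in sorted(self.trigger_labels):
            value = table.get(label)
            if value is not None and self._last_seen.get(label) != value:
                updated.append(label)
                self._last_seen[label] = value
        return updated


# Hypothetical template: each utterance lists its labels and names one as the trigger.
template = {
    "score_update": {"labels": ["team", "score"], "trigger": "score"},
}

# Hypothetical information management table: label -> text element.
table = {"team": "Japan", "score": "1-0"}

monitor = UpdateMonitor({u["trigger"] for u in template.values()})
first = monitor.poll(table)       # the trigger label "score" is new, so it is reported
table["team"] = "Japan (home)"    # only a non-trigger label changes
second = monitor.poll(table)      # nothing is reported
table["score"] = "2-0"            # the trigger label's text element changes
third = monitor.poll(table)       # "score" is reported again
```

Watching only the trigger label means an utterance is regenerated when its key element changes, not whenever any element it mentions is rewritten.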

The explanatory audio production device according to claim 1,
wherein the label is composed of respective numerical values indicating the type of information source, the competition type of the sports program, the priority, the group to which the text element belongs, and the item within the group,
when the one or more text elements to which the one or more labels included in the utterance definition data are attached are read, each such label is taken as a read target label, and the read target label, together with any label whose group and item within the group are the same as those of the read target label but whose type of information source is different, is regarded as a homogeneous label, and
the reading unit, when the information management table stores a plurality of text elements to which homogeneous labels are attached, reads from the information management table the text element that was stored first among the plurality of text elements to which the homogeneous labels are attached.

The explanatory audio production device according to claim 4,
wherein the reading unit, when the information management table does not store a text element to which the read target label is attached but stores a text element to which a homogeneous label other than the read target label is attached, reads from the information management table the text element to which the homogeneous label other than the read target label is attached.
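
The read rules for homogeneous labels described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation in the specification: the `Label` field names, the list-based table kept in storage order, and the sample values are hypothetical. Homogeneity is tested on the group and item only, as the claim wording suggests.

```python
from typing import NamedTuple, Optional


class Label(NamedTuple):
    source: int    # type of information source
    sport: int     # competition type of the sports program
    priority: int  # priority of the presentation timing
    group: int     # group to which the text element belongs
    item: int      # item within the group


def is_homogeneous(target: Label, other: Label) -> bool:
    """True for the read target label itself and for labels that share its group and item."""
    return (target.group, target.item) == (other.group, other.item)


def read_element(table: list, target: Label) -> Optional[str]:
    """Return the earliest-stored text element carrying a label homogeneous to the target.

    `table` is a list of (Label, text) pairs in the order they were stored (an assumption).
    """
    for label, text in table:
        if is_homogeneous(target, label):
            return text
    return None


target = Label(source=1, sport=1, priority=1, group=3, item=1)

# Two sources reported the same item; the operator's entry was stored first.
table = [
    (Label(2, 1, 1, 3, 1), "goal (operator input)"),
    (Label(1, 1, 1, 3, 1), "goal (real-time feed)"),
]
earliest = read_element(table, target)

# Only a homogeneous label from another source is stored: it is read as a fallback.
fallback = read_element([(Label(2, 1, 1, 3, 1), "goal (operator input)")], target)

# No label with the same group and item is stored.
missing = read_element(table, Label(1, 1, 1, 4, 1))
```

Because the lookup matches any homogeneous label, the fallback case needs no separate branch: when the read target label itself is absent, the same loop returns the element stored under a homogeneous label from another source.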

The explanatory audio production device according to claim 1,
wherein the label includes, in addition to the priority, the competition type of the sports program, and
the reading unit corrects the text element read from the information management table according to the competition type included in the label attached to the text element, and outputs the one or more labels of the utterance and the one or more text elements (the corrected text element, where a text element has been corrected).

The explanatory audio production device according to claim 1,
wherein the one or more labels included in the utterance definition data include a label corresponding to a predetermined particle or word,
the reading unit reads one or more text elements including the predetermined particle or word from the information management table, and
the text generation unit extracts the one or more text elements including the predetermined particle or word from the file, and generates and outputs a text including the predetermined particle or word.

The explanatory audio production device according to claim 1,
wherein the priority included in the label is any one of information indicating immediate, semi-immediate, regular, and other, the immediate priority being the highest, the semi-immediate priority the next highest, and the other priority the lowest, and
the order discard control unit:
treats an utterance of a label including the immediate or semi-immediate priority as a first utterance, and treats an utterance of a label including the regular priority, which is arranged at predetermined time intervals, as a second utterance;
determines the order of the first utterances such that the higher the priority, the closer to the beginning the utterance is placed; and
determines the order of the utterances such that the second utterance is placed after the first utterance at the predetermined time interval.
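
One possible reading of the ordering rule above is sketched below in Python. The dictionary-based utterance files, the `slot` length given to each first utterance, the placement of "other" utterances last, and the parameter names are assumptions made for illustration; the claim itself fixes only the relative order and the use of a predetermined interval.

```python
# Rank of each priority; a lower rank is spoken earlier.
RANK = {"immediate": 0, "semi_immediate": 1, "regular": 2, "other": 3}
FIRST = ("immediate", "semi_immediate")  # priorities of "first utterances"


def order_utterances(files, start, slot, interval):
    """Order utterance files by priority and reset their playback times.

    First utterances are spaced `slot` seconds apart (an assumption); every
    later utterance starts `interval` seconds after the one before it.
    sorted() is stable, so files with equal priority keep their input order.
    """
    ranked = sorted(files, key=lambda f: RANK[f["priority"]])
    ordered = []
    t = start
    for i, f in enumerate(ranked):
        if i > 0:
            t += slot if f["priority"] in FIRST else interval
        ordered.append({**f, "playback_time": t})
    return ordered


# Hypothetical utterance files in arrival order.
files = [
    {"id": "A", "priority": "regular"},
    {"id": "B", "priority": "semi_immediate"},
    {"id": "C", "priority": "immediate"},
]
ordered = order_utterances(files, start=0, slot=5, interval=30)
ids = [f["id"] for f in ordered]
times = [f["playback_time"] for f in ordered]
```

In this sketch the immediate utterance moves to the head of the queue, the semi-immediate one follows it, and the regular utterance is pushed back by the fixed interval.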

The explanatory audio production device according to claim 1,
wherein the label is composed of respective numerical values indicating the type of information source, the competition type of the sports program, the priority, the group to which the text element belongs, and the item within the group,
when the one or more text elements to which the one or more labels included in the utterance definition data are attached are read, each such label is taken as a read target label, and the read target label, together with any label whose group and item within the group are the same as those of the read target label but whose type of information source is different, is regarded as a homogeneous label, and
the order discard control unit, when a new file including a homogeneous label is input following an update determination by the update monitoring unit and, in determining the order of the utterances for the files for each utterance, it determines that a plurality of files including homogeneous labels exist, discards the files other than the new file among the plurality of files including the homogeneous labels.
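
The discard rule above can be sketched in Python as follows. The tuple layout of the label, the dictionary-based files, and the function names are hypothetical; the sketch only shows older files with a homogeneous label giving way to the newly input file.

```python
def homogeneous_key(label):
    """Labels are homogeneous when their group and item match (source may differ)."""
    source, sport, priority, group, item = label
    return (group, item)


def enqueue(queue, new_file):
    """Return a new queue in which files homogeneous to `new_file` are discarded."""
    key = homogeneous_key(new_file["label"])
    kept = [f for f in queue if homogeneous_key(f["label"]) != key]
    kept.append(new_file)
    return kept


# Hypothetical pending utterance files; label = (source, sport, priority, group, item).
queue = [
    {"label": (1, 1, 1, 3, 1), "text": "score 1-0"},
    {"label": (1, 1, 2, 4, 2), "text": "possession 60%"},
]
new_file = {"label": (2, 1, 1, 3, 1), "text": "score 2-0"}

queue = enqueue(queue, new_file)
texts = [f["text"] for f in queue]
```

Dropping the stale file keeps the commentary from announcing a value that a newer update has already replaced.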

The explanatory audio production device according to claim 1,
wherein the order discard control unit discards, among the files for each utterance, any file for which a preset time has elapsed.
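
A minimal Python sketch of the time-based discard follows. The claim does not state the instant from which the preset time is measured; this sketch assumes it runs from a hypothetical `created_at` timestamp recorded when the file was generated, and the parameter names are likewise assumptions.

```python
def discard_expired(queue, now, max_age):
    """Keep only files younger than `max_age` seconds (age measured from `created_at`)."""
    return [f for f in queue if now - f["created_at"] < max_age]


# Hypothetical pending utterance files with creation timestamps in seconds.
queue = [
    {"id": "old", "created_at": 85},
    {"id": "fresh", "created_at": 95},
]
remaining = discard_expired(queue, now=100, max_age=10)
remaining_ids = [f["id"] for f in remaining]
```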

A program for causing a computer constituting an explanatory audio production device that generates text for explanatory audio for each utterance of a live sports program to function as:
a template in which utterance definition data is defined for each utterance, the utterance definition data including, where the text is composed of one or more text elements, one or more labels corresponding to the one or more text elements;
an information management table in which the text elements are stored;
an analysis unit that receives data corresponding to the match situation of the sports program from each of a plurality of information sources, and extracts the text elements from the data by analyzing the data according to a preset data format of the information source;
a storage unit that assigns, to each text element extracted by the analysis unit, a label including a priority of the timing of presenting the utterance of the text element, and stores the text element to which the label has been assigned in the information management table;
an update monitoring unit that monitors whether or not a text element stored in the information management table has been updated, and outputs the label assigned to the text element when it determines that the text element has been updated;
a reading unit that, for the utterance whose utterance definition data includes the label output by the update monitoring unit, reads from the information management table the one or more text elements to which the one or more labels included in the utterance definition data are attached, and outputs the one or more labels of the utterance and the one or more corresponding text elements;
a format conversion unit that converts the one or more labels of the utterance output by the reading unit and the one or more corresponding text elements into a file including a predetermined playback time;
a text generation unit that extracts the one or more text elements from the file converted by the format conversion unit, and generates and outputs the text; and
an order discard control unit that receives the file for each utterance converted by the format conversion unit, determines the order of the utterances based on the priority included in the label of the file for each utterance, resets the playback time included in the file for each utterance according to the order, outputs the playback time included in the file of the first utterance in the determined order, and discards the file of the first utterance.